changeset 2:9ec94203d895 draft
Uploaded v0.0.7 with automatic installation of the C++ binary.
author | peterjc |
---|---|
date | Tue, 23 Apr 2013 11:59:14 -0400 |
parents | f93ad4882338 |
children | b2e648e55ed7 |
files | tools/nlstradamus/nlstradamus.txt tools/nlstradamus/nlstradamus.xml tools/nlstradamus/tool_dependencies.xml tools/protein_analysis/nlstradamus.txt tools/protein_analysis/nlstradamus.xml |
diffstat | 5 files changed, 223 insertions(+), 185 deletions(-) |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/nlstradamus/nlstradamus.txt Tue Apr 23 11:59:14 2013 -0400
@@ -0,0 +1,116 @@
+Galaxy wrapper for NLStradamus v1.7 or v1.8 (C++ version)
+=========================================================
+
+This wrapper is copyright 2011-2013 by Peter Cock, The James Hutton Institute
+(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
+See the licence text below.
+
+NLStradamus is a command line tools for predicting nuclear localization
+signals (NLSs) in a FASTA file of proteins using a Hidden Markov Model (HMM).
+
+A. N. Nguyen Ba, A. Pogoutse, N. Provart, A. M. Moses.
+NLStradamus: a simple Hidden Markov Model for nuclear localization signal prediction.
+BMC Bioinformatics. 2009 Jun 29;10(1):202.
+http://dx.doi.org/10.1186/1471-2105-10-202
+
+http://www.moseslab.csb.utoronto.ca/NLStradamus
+
+Early versions of NLStradamus did not have a native tabular output format, this
+was added in version 1.7. Additionally a fast C++ implementation was added at
+this point (early versions of NLStradamus came as a perl script only).
+
+Version 1.8 fixed a C++ compilation issue on modern compilers, but is otherwise
+unchanged.
+
+
+Automated Installation
+======================
+
+This should be straightforward, Galaxy should automatically download and install
+the C++ implementation of NLStradamus v1.8, and run the unit tests.
+
+
+Manual Installation
+===================
+This wrapper expects the compiled C++ binary "NLStradamus" to be on the system
+PATH.
+
+To install the wrapper copy or move the following files under the Galaxy tools
+folder, e.g. in a tools/protein_analysis folder:
+
+* nlstradamus.xml (the Galaxy tool definition)
+* nlstradamus.txt (this README file)
+
+You will also need to modify the tools_conf.xml file to tell Galaxy to offer the
+tool. If you are using other protein analysis tools like TMHMM or SignalP, put
+it next to them. Just add the line (matching the chosen install path):
+
+<tool file="protein_analysis/nlstradamus.xml" />
+
+If you wish to run the unit tests, also add this to tools_conf.xml.sample
+and move/copy the test-data files under Galaxy's test-data folder. Then:
+
+$ ./run_functional_tests.sh -id nlstradamus
+
+That's it.
+
+
+History
+=======
+
+v0.0.3 - Initial public release
+v0.0.4 - Adding DOI link to reference
+         (Documentation change only)
+v0.0.5 - Assume non-zero return codes are errors
+v0.0.6 - Show output help text using a table
+       - Added unit tests
+v0.0.7 - Automatic installation of the NLStradamus binary when installed
+         via the Galaxy Tool Shed
+
+
+Developers
+==========
+
+This script and related tools are being developed on the following hg branch:
+http://bitbucket.org/peterjc/galaxy-central/src/tools
+
+For making the "Galaxy Tool Shed" http://toolshed.g2.bx.psu.edu/ tarball use
+the following command from the Galaxy root folder:
+
+$ tar -czf nlstradmus.tar.gz tools/nlstradamus/nlstradamus.xml tools/nlstradamus/nlstradamus.txt tools/nlstradamus/tool_dependencies.xml test-data/four_human_proteins.fasta test-data/four_human_proteins.nlstradamus.tabular test-data/empty.fasta test-data/empty_nlstradamus.tabular
+
+Check this worked:
+
+$ tar -tzf nlstradmus.tar.gz
+tools/nlstradamus/nlstradamus.xml
+tools/nlstradamus/nlstradamus.txt
+tools/nlstradamus/tool_dependencies.xml
+test-data/four_human_proteins.fasta
+test-data/four_human_proteins.nlstradamus.tabular
+test-data/empty.fasta
+test-data/empty_nlstradamus.tabular
+
+
+Licence (MIT/BSD style)
+=======================
+
+Permission to use, copy, modify, and distribute this software and its
+documentation with or without modifications and for any purpose and
+without fee is hereby granted, provided that any copyright notices
+appear in all copies and that both those copyright notices and this
+permission notice appear in supporting documentation, and that the
+names of the contributors or copyright holders not be used in
+advertising or publicity pertaining to distribution of the software
+without specific prior permission.
+
+THE CONTRIBUTORS AND COPYRIGHT HOLDERS OF THIS SOFTWARE DISCLAIM ALL
+WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE
+CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT
+OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
+OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE
+OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE
+OR PERFORMANCE OF THIS SOFTWARE.
+
+NOTE: This is the licence for the Galaxy Wrapper only. NLStradamus
+is is available and licenced separately (under the GPL v3 or later).
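Reviewer note: the manual installation route in the README above depends on the compiled "NLStradamus" binary being on the system PATH. A minimal Python sketch of a pre-flight check is given here. The function name find_nlstradamus is hypothetical and is not part of the wrapper; only the binary name comes from the README.

```python
import shutil


def find_nlstradamus(binary_name="NLStradamus"):
    """Return the full path of the binary if it is on the PATH, else None.

    The default name is the one the README expects; the helper itself is
    an illustration, not something shipped with the Galaxy wrapper.
    """
    return shutil.which(binary_name)


if __name__ == "__main__":
    path = find_nlstradamus()
    if path is None:
        print("NLStradamus not found on PATH; see the manual installation notes")
    else:
        print("Found NLStradamus at", path)
```

Running such a check before adding the tool to tools_conf.xml would catch a missing or misnamed binary before Galaxy tries to run a job.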
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/nlstradamus/nlstradamus.xml Tue Apr 23 11:59:14 2013 -0400
@@ -0,0 +1,82 @@
+<tool id="nlstradamus" name="NLStradamus" version="0.0.7">
+    <description>Find nuclear localization signals (NLSs) in protein sequences</description>
+    <command>
+        NLStradamus -i $fasta_file -t $threshold -m $model -a $algorithm -tab > $tabular_file
+    </command>
+    <stdio>
+        <!-- Assume anything other than zero is an error -->
+        <exit_code range="1:" />
+        <exit_code range=":-1" />
+    </stdio>
+    <inputs>
+        <param name="fasta_file" type="data" format="fasta" label="FASTA file of protein sequences"/>
+        <param name="model" type="select" display="radio" label="Model">
+            <option value="1" selected="True">Two state</option>
+            <option value="2">Four state</option>
+        </param>
+        <param name="algorithm" type="select" display="radio" label="Algorithm">
+            <option value="0">Viterbi</option>
+            <option value="1" selected="True">Posterior with threshold</option>
+            <option value="2">Both</option>
+        </param>
+        <param name="threshold" type="float" label="Posterior theshold" value="0.6">
+            <validator type="in_range" min="0" max="1" message="Threshold value should be between 0 and 1."/>
+        </param>
+    </inputs>
+    <outputs>
+        <data name="tabular_file" format="tabular" label="NLStradamus results" />
+    </outputs>
+    <requirements>
+        <requirement type="binary">NLStradamus</requirement>
+        <requirement type="package" version="1.8">NLStradamus</requirement>
+    </requirements>
+    <tests>
+        <test>
+            <param name="fasta_file" value="four_human_proteins.fasta" ftype="fasta" />
+            <param name="model" value="1" />
+            <param name="algorithm" value="1" />
+            <param name="threshold" value="0.6" />
+            <output name="tabular_file" file="four_human_proteins.nlstradamus.tabular" ftype="tabular" />
+        </test>
+        <test>
+            <param name="fasta_file" value="empty.fasta" ftype="fasta" />
+            <param name="model" value="2" />
+            <param name="algorithm" value="2" />
+            <param name="threshold" value="0.125"/>
+            <output name="tabular_file" file="empty_nlstradamus.tabular" ftype="tabular" />
+        </test>
+    </tests>
+    <help>
+
+**What it does**
+
+This calls the NLStradamus tool for prediction of nuclear localization
+signals (NLSs), which uses a Hidden Markov Model (HMM).
+
+The input is a FASTA file of protein sequences, and the output is tabular
+with six columns (one row per NLS):
+
+====== ===================================================================
+Column Description
+------ -------------------------------------------------------------------
+    c1 Sequence identifier
+    c2 Algorithm (posterior or Viterbi)
+    c3 Score (probability between threshold and 1 for posterior algorithm)
+    c4 Start
+    c5 End
+    c6 Sequence of NLS
+====== ===================================================================
+
+-----
+
+**References**
+
+A. N. Nguyen Ba, A. Pogoutse, N. Provart, A. M. Moses.
+NLStradamus: a simple Hidden Markov Model for nuclear localization signal prediction.
+BMC Bioinformatics. 2009 Jun 29;10(1):202.
+http://dx.doi.org/10.1186/1471-2105-10-202
+
+http://www.moseslab.csb.utoronto.ca/NLStradamus
+
+    </help>
+</tool>
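Reviewer note: the help text in the tool definition above describes a six-column tabular output with one row per NLS. A downstream script could read it roughly as sketched below. This is an illustration under assumptions, not code from the wrapper: the names NLSHit and parse_nlstradamus_tabular are hypothetical, the columns are assumed to be tab-separated, any line starting with "#" is assumed to be a header or comment, and the score column is assumed to be numeric for every algorithm.

```python
import io
from typing import NamedTuple


class NLSHit(NamedTuple):
    """One row of the six-column table described in the tool help."""
    seq_id: str      # c1: sequence identifier
    algorithm: str   # c2: posterior or Viterbi
    score: float     # c3: score (assumed numeric)
    start: int       # c4: start
    end: int         # c5: end
    sequence: str    # c6: sequence of the NLS


def parse_nlstradamus_tabular(handle):
    """Yield one NLSHit per data row of tab-separated output."""
    for line in handle:
        line = line.rstrip("\r\n")
        # Assumption: blank lines and '#' lines carry no data.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 6:
            raise ValueError("expected 6 columns, got %d" % len(fields))
        seq_id, algorithm, score, start, end, sequence = fields
        yield NLSHit(seq_id, algorithm, float(score), int(start), int(end), sequence)


if __name__ == "__main__":
    # Invented example row, not real NLStradamus output.
    example = io.StringIO("seq1\tposterior\t0.75\t10\t20\tKKRKRKRKRKK\n")
    for hit in parse_nlstradamus_tabular(example):
        print(hit.seq_id, hit.start, hit.end, hit.sequence)
```

An empty input file, as in the second unit test of the tool definition, simply yields no hits.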
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/nlstradamus/tool_dependencies.xml Tue Apr 23 11:59:14 2013 -0400
@@ -0,0 +1,25 @@
+<?xml version="1.0"?>
+<tool_dependency>
+    <package name="NLStradamus" version="1.8">
+        <install version="1.0">
+            <actions>
+                <action type="download_by_url">http://www.moseslab.csb.utoronto.ca/NLStradamus/NLStradamus/NLStradamus.1.8.tar.gz</action>
+                <!-- Although v1.7 used a subfolder in the tar-ball, v1.8 did not -->
+                <action type="shell_command">g++ NLStradamus.cpp -o NLStradamus -O3</action>
+                <action type="move_file"><source>NLStradamus</source><destination>$INSTALL_DIR/</destination></action>
+                <action type="set_environment">
+                    <environment_variable name="PATH" action="prepend_to">$INSTALL_DIR</environment_variable>
+                </action>
+            </actions>
+        </install>
+        <readme>
+This downloads NLStradamus v1.8 from this folder,
+http://www.moseslab.csb.utoronto.ca/NLStradamus/NLStradamus/
+
+The C++ tool is compiled as described in the README_C.txt file, using g++, and included in the $PATH.
+
+The older slower Perl implementation is not installed.
+        </readme>
+    </package>
+</tool_dependency>
+
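Reviewer note: the dependency recipe above is a sequence of four actions (download, compile, move, set PATH). To inspect such a recipe outside Galaxy, one could list the actions with the Python standard library, as in the rough sketch below. The helper name list_actions is hypothetical, the embedded XML is a copy of the actions from this changeset with the readme omitted, and the code only lists actions in document order. It does not reproduce how Galaxy executes them.

```python
import xml.etree.ElementTree as ET

# Copy of the install actions from tools/nlstradamus/tool_dependencies.xml
# (readme element omitted for brevity).
TOOL_DEPENDENCIES = """<?xml version="1.0"?>
<tool_dependency>
    <package name="NLStradamus" version="1.8">
        <install version="1.0">
            <actions>
                <action type="download_by_url">http://www.moseslab.csb.utoronto.ca/NLStradamus/NLStradamus/NLStradamus.1.8.tar.gz</action>
                <action type="shell_command">g++ NLStradamus.cpp -o NLStradamus -O3</action>
                <action type="move_file"><source>NLStradamus</source><destination>$INSTALL_DIR/</destination></action>
                <action type="set_environment">
                    <environment_variable name="PATH" action="prepend_to">$INSTALL_DIR</environment_variable>
                </action>
            </actions>
        </install>
    </package>
</tool_dependency>
"""


def list_actions(xml_text):
    """Return (type, text) pairs for each <action>, in document order.

    Actions whose payload lives in child elements (move_file,
    set_environment) are reported with an empty text string.
    """
    root = ET.fromstring(xml_text)
    return [(a.get("type"), (a.text or "").strip()) for a in root.iter("action")]


if __name__ == "__main__":
    for action_type, text in list_actions(TOOL_DEPENDENCIES):
        print(action_type, text)
```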
--- a/tools/protein_analysis/nlstradamus.txt Wed Apr 17 08:26:25 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,104 +0,0 @@
-Galaxy wrapper for NLStradamus v1.7 or v1.8 (C++ version)
-=========================================================
-
-This wrapper is copyright 2011-2013 by Peter Cock, The James Hutton Institute
-(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
-See the licence text below.
-
-NLStradamus is a command line tools for predicting nuclear localization
-signals (NLSs) in a FASTA file of proteins using a Hidden Markov Model (HMM).
-
-A. N. Nguyen Ba, A. Pogoutse, N. Provart, A. M. Moses.
-NLStradamus: a simple Hidden Markov Model for nuclear localization signal prediction.
-BMC Bioinformatics. 2009 Jun 29;10(1):202.
-http://dx.doi.org/10.1186/1471-2105-10-202
-
-http://www.moseslab.csb.utoronto.ca/NLStradamus
-
-Early versions of NLStradamus did not have a native tabular output format, this
-was added in version 1.7. Additionally a fast C++ implementation was added at
-this point (early versions of NLStradamus came as a perl script only).
-
-Version 1.8 fixed a C++ compilation issue on modern compilers, but is otherwise
-unchanged.
-
-
-Installation
-============
-This wrapper expects the compiled C++ binary "NLStradamus" to be on the system
-PATH.
-
-To install the wrapper copy or move the following files under the Galaxy tools
-folder, e.g. in a tools/protein_analysis folder:
-
-* nlstradamus.xml (the Galaxy tool definition)
-* nlstradamus.txt (this README file)
-
-You will also need to modify the tools_conf.xml file to tell Galaxy to offer the
-tool. If you are using other protein analysis tools like TMHMM or SignalP, put
-it next to them. Just add the line:
-
-<tool file="protein_analysis/nlstradamus.xml" />
-
-If you wish to run the unit tests, also add this to tools_conf.xml.sample
-and move/copy the test-data files under Galaxy's test-data folder.
-
-That's it.
-
-
-History
-=======
-
-v0.0.3 - Initial public release
-v0.0.4 - Adding DOI link to reference
-         (Documentation change only)
-v0.0.5 - Assume non-zero return codes are errors
-v0.0.6 - Show output help text using a table
-       - Added unit tests
-
-
-Developers
-==========
-
-This script and related tools are being developed on the following hg branch:
-http://bitbucket.org/peterjc/galaxy-central/src/tools
-
-For making the "Galaxy Tool Shed" http://toolshed.g2.bx.psu.edu/ tarball use
-the following command from the Galaxy root folder:
-
-$ tar -czf nlstradmus.tar.gz tools/protein_analysis/nlstradamus.xml tools/protein_analysis/nlstradamus.txt test-data/four_human_proteins.fasta test-data/four_human_proteins.nlstradamus.tabular test-data/empty.fasta test-data/empty_nlstradamus.tabular
-
-Check this worked:
-
-$ tar -tzf nlstradmus.tar.gz
-tools/protein_analysis/nlstradamus.xml
-tools/protein_analysis/nlstradamus.txt
-test-data/four_human_proteins.fasta
-test-data/four_human_proteins.nlstradamus.tabular
-test-data/empty.fasta
-test-data/empty_nlstradamus.tabular
-
-
-Licence (MIT/BSD style)
-=======================
-
-Permission to use, copy, modify, and distribute this software and its
-documentation with or without modifications and for any purpose and
-without fee is hereby granted, provided that any copyright notices
-appear in all copies and that both those copyright notices and this
-permission notice appear in supporting documentation, and that the
-names of the contributors or copyright holders not be used in
-advertising or publicity pertaining to distribution of the software
-without specific prior permission.
-
-THE CONTRIBUTORS AND COPYRIGHT HOLDERS OF THIS SOFTWARE DISCLAIM ALL
-WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE
-CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT
-OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
-OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE
-OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE
-OR PERFORMANCE OF THIS SOFTWARE.
-
-NOTE: This is the licence for the Galaxy Wrapper only. NLStradamus
-is available and licenced separately.
--- a/tools/protein_analysis/nlstradamus.xml Wed Apr 17 08:26:25 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,81 +0,0 @@
-<tool id="nlstradamus" name="NLStradamus" version="0.0.6">
-    <description>Find nuclear localization signals (NLSs) in protein sequences</description>
-    <command>
-        NLStradamus -i $fasta_file -t $threshold -m $model -a $algorithm -tab > $tabular_file
-    </command>
-    <stdio>
-        <!-- Assume anything other than zero is an error -->
-        <exit_code range="1:" />
-        <exit_code range=":-1" />
-    </stdio>
-    <inputs>
-        <param name="fasta_file" type="data" format="fasta" label="FASTA file of protein sequences"/>
-        <param name="model" type="select" display="radio" label="Model">
-            <option value="1" selected="True">Two state</option>
-            <option value="2">Four state</option>
-        </param>
-        <param name="algorithm" type="select" display="radio" label="Algorithm">
-            <option value="0">Viterbi</option>
-            <option value="1" selected="True">Posterior with threshold</option>
-            <option value="2">Both</option>
-        </param>
-        <param name="threshold" type="float" label="Posterior theshold" value="0.6">
-            <validator type="in_range" min="0" max="1" message="Threshold value should be between 0 and 1."/>
-        </param>
-    </inputs>
-    <outputs>
-        <data name="tabular_file" format="tabular" label="NLStradamus results" />
-    </outputs>
-    <requirements>
-        <requirement type="binary">NLStradamus</requirement>
-    </requirements>
-    <tests>
-        <test>
-            <param name="fasta_file" value="four_human_proteins.fasta" ftype="fasta" />
-            <param name="model" value="1" />
-            <param name="algorithm" value="1" />
-            <param name="threshold" value="0.6" />
-            <output name="tabular_file" file="four_human_proteins.nlstradamus.tabular" ftype="tabular" />
-        </test>
-        <test>
-            <param name="fasta_file" value="empty.fasta" ftype="fasta" />
-            <param name="model" value="2" />
-            <param name="algorithm" value="2" />
-            <param name="threshold" value="0.125"/>
-            <output name="tabular_file" file="empty_nlstradamus.tabular" ftype="tabular" />
-        </test>
-    </tests>
-    <help>
-
-**What it does**
-
-This calls the NLStradamus tool for prediction of nuclear localization
-signals (NLSs), which uses a Hidden Markov Model (HMM).
-
-The input is a FASTA file of protein sequences, and the output is tabular
-with six columns (one row per NLS):
-
-====== ===================================================================
-Column Description
------- -------------------------------------------------------------------
-    c1 Sequence identifier
-    c2 Algorithm (posterior or Viterbi)
-    c3 Score (probability between threshold and 1 for posterior algorithm)
-    c4 Start
-    c5 End
-    c6 Sequence of NLS
-====== ===================================================================
-
------
-
-**References**
-
-A. N. Nguyen Ba, A. Pogoutse, N. Provart, A. M. Moses.
-NLStradamus: a simple Hidden Markov Model for nuclear localization signal prediction.
-BMC Bioinformatics. 2009 Jun 29;10(1):202.
-http://dx.doi.org/10.1186/1471-2105-10-202
-
-http://www.moseslab.csb.utoronto.ca/NLStradamus
-
-    </help>
-</tool>