# HG changeset patch # User peterjc # Date 1366201585 14400 # Node ID f93ad488233841c3753520de4df35132637d4e33 # Parent 0ad90e5eb39026a6d0585121df5c85f1705546ee Uploaded v0.0.6, adds unit tests and minor documentation changes. diff -r 0ad90e5eb390 -r f93ad4882338 test-data/empty.fasta --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/empty.fasta Wed Apr 17 08:26:25 2013 -0400 @@ -0,0 +1,2 @@ + + diff -r 0ad90e5eb390 -r f93ad4882338 test-data/empty_nlstradamus.tabular --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/empty_nlstradamus.tabular Wed Apr 17 08:26:25 2013 -0400 @@ -0,0 +1,1 @@ +#ID algorithm score start stop sequence diff -r 0ad90e5eb390 -r f93ad4882338 test-data/four_human_proteins.fasta --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/four_human_proteins.fasta Wed Apr 17 08:26:25 2013 -0400 @@ -0,0 +1,61 @@ +>sp|Q9BS26|ERP44_HUMAN Endoplasmic reticulum resident protein 44 OS=Homo sapiens GN=ERP44 PE=1 SV=1 +MHPAVFLSLPDLRCSLLLLVTWVFTPVTTEITSLDTENIDEILNNADVALVNFYADWCRF +SQMLHPIFEEASDVIKEEFPNENQVVFARVDCDQHSDIAQRYRISKYPTLKLFRNGMMMK +REYRGQRSVKALADYIRQQKSDPIQEIRDLAEITTLDRSKRNIIGYFEQKDSDNYRVFER +VANILHDDCAFLSAFGDVSKPERYSGDNIIYKPPGHSAPDMVYLGAMTNFDVTYNWIQDK +CVPLVREITFENGEELTEEGLPFLILFHMKEDTESLEIFQNEVARQLISEKGTINFLHAD +CDKFRHPLLHIQKTPADCPVIAIDSFRHMYVFGDFKDVLIPGKLKQFVFDLHSGKLHREF +HHGPDPTDTAPGEQAQDVASSPPESSFQKLAPSEYRYTLLRDRDEL +>sp|Q9NSY1|BMP2K_HUMAN BMP-2-inducible protein kinase OS=Homo sapiens GN=BMP2K PE=1 SV=2 +MKKFSRMPKSEGGSGGGAAGGGAGGAGAGAGCGSGGSSVGVRVFAVGRHQVTLEESLAEG +GFSTVFLVRTHGGIRCALKRMYVNNMPDLNVCKREITIMKELSGHKNIVGYLDCAVNSIS +DNVWEVLILMEYCRAGQVVNQMNKKLQTGFTEPEVLQIFCDTCEAVARLHQCKTPIIHRD +LKVENILLNDGGNYVLCDFGSATNKFLNPQKDGVNVVEEEIKKYTTLSYRAPEMINLYGG +KPITTKADIWALGCLLYKLCFFTLPFGESQVAICDGNFTIPDNSRYSRNIHCLIRFMLEP +DPEHRPDIFQVSYFAFKFAKKDCPVSNINNSSIPSALPEPMTASEAAARKSQIKARITDT +IGPTETSIAPRQRPKANSATTATPSVLTIQSSATPVKVLAPGEFGNHRPKGALRPGNGPE +ILLGQGPPQQPPQQHRVLQQLQQGDWRLQQLHLQHRHPHQQQQQQQQQQQQQQQQQQQQQ +QQQQQQHHHHHHHHLLQDAYMQQYQHATQQQQMLQQQFLMHSVYQPQPSASQYPTMMPQY +QQAFFQQQMLAQHQPSQQQASPEYLTSPQEFSPALVSYTSSLPAQVGTIMDSSYSANRSV +ADKEAIANFTNQKNISNPPDMSGWNPFGEDNFSKLTEEELLDREFDLLRSNRLEERASSD +KNVDSLSAPHNHPPEDPFGSVPFISHSGSPEKKAEHSSINQENGTANPIKNGKTSPASKD +QRTGKKTSVQGQVQKGNDESESDFESDPPSPKSSEEEEQDDEEVLQGEQGDFNDDDTEPE +NLGHRPLLMDSEDEEEEEKHSSDSDYEQAKAKYSDMSSVYRDRSGSGPTQDLNTILLTSA +QLSSDVAVETPKQEFDVFGAVPFFAVRAQQPQQEKNEKNLPQHRFPAAGLEQEEFDVFTK +APFSKKVNVQECHAVGPEAHTIPGYPKSVDVFGSTPFQPFLTSTSKSESNEDLFGLVPFD +EITGSQQQKVKQRSLQKLSSRQRRTKQDMSKSNGKRHHGTPTSTKKTLKPTYRTPERARR +HKKVGRRDSQSSNEFLTISDSKENISVALTDGKDRGNVLQPEESLLDPFGAKPFHSPDLS +WHPPHQGLSDIRADHNTVLPGRPRQNSLHGSFHSADVLKMDDFGAVPFTELVVQSITPHQ +SQQSQPVELDPFGAAPFPSKQ +>sp|P06213|INSR_HUMAN Insulin receptor OS=Homo sapiens GN=INSR PE=1 SV=4 +MATGGRRGAAAAPLLVAVAALLLGAAGHLYPGEVCPGMDIRNNLTRLHELENCSVIEGHL +QILLMFKTRPEDFRDLSFPKLIMITDYLLLFRVYGLESLKDLFPNLTVIRGSRLFFNYAL +VIFEMVHLKELGLYNLMNITRGSVRIEKNNELCYLATIDWSRILDSVEDNYIVLNKDDNE +ECGDICPGTAKGKTNCPATVINGQFVERCWTHSHCQKVCPTICKSHGCTAEGLCCHSECL +GNCSQPDDPTKCVACRNFYLDGRCVETCPPPYYHFQDWRCVNFSFCQDLHHKCKNSRRQG +CHQYVIHNNKCIPECPSGYTMNSSNLLCTPCLGPCPKVCHLLEGEKTIDSVTSAQELRGC +TVINGSLIINIRGGNNLAAELEANLGLIEEISGYLKIRRSYALVSLSFFRKLRLIRGETL +EIGNYSFYALDNQNLRQLWDWSKHNLTITQGKLFFHYNPKLCLSEIHKMEEVSGTKGRQE +RNDIALKTNGDQASCENELLKFSYIRTSFDKILLRWEPYWPPDFRDLLGFMLFYKEAPYQ +NVTEFDGQDACGSNSWTVVDIDPPLRSNDPKSQNHPGWLMRGLKPWTQYAIFVKTLVTFS +DERRTYGAKSDIIYVQTDATNPSVPLDPISVSNSSSQIILKWKPPSDPNGNITHYLVFWE +RQAEDSELFELDYCLKGLKLPSRTWSPPFESEDSQKHNQSEYEDSAGECCSCPKTDSQIL +KELEESSFRKTFEDYLHNVVFVPRKTSSGTGAEDPRPSRKRRSLGDVGNVTVAVPTVAAF +PNTSSTSVPTSPEEHRPFEKVVNKESLVISGLRHFTGYRIELQACNQDTPEERCSVAAYV +SARTMPEAKADDIVGPVTHEIFENNVVHLMWQEPKEPNGLIVLYEVSYRRYGDEELHLCV +SRKHFALERGCRLRGLSPGNYSVRIRATSLAGNGSWTEPTYFYVTDYLDVPSNIAKIIIG +PLIFVFLFSVVIGSIYLFLRKRQPDGPLGPLYASSNPEYLSASDVFPCSVYVPDEWEVSR +EKITLLRELGQGSFGMVYEGNARDIIKGEAETRVAVKTVNESASLRERIEFLNEASVMKG +FTCHHVVRLLGVVSKGQPTLVVMELMAHGDLKSYLRSLRPEAENNPGRPPPTLQEMIQMA +AEIADGMAYLNAKKFVHRDLAARNCMVAHDFTVKIGDFGMTRDIYETDYYRKGGKGLLPV +RWMAPESLKDGVFTTSSDMWSFGVVLWEITSLAEQPYQGLSNEQVLKFVMDGGYLDQPDN +CPERVTDLMRMCWQFNPKMRPTFLEIVNLLKDDLHPSFPEVSFFHSEENKAPESEELEME +FEDMENVPLDRSSHCQREEAGGRDGGSSLGFKRSYEEHIPYTHMNGGKKNGRILTLPRSN +PS +>sp|P08100|OPSD_HUMAN Rhodopsin OS=Homo sapiens GN=RHO PE=1 SV=1 +MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLY +VTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLG +GEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIP +EGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQES +ATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSAAI +YNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVSKTETSQVAPA diff -r 0ad90e5eb390 -r f93ad4882338 test-data/four_human_proteins.nlstradamus.tabular --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/four_human_proteins.nlstradamus.tabular Wed Apr 17 08:26:25 2013 -0400 @@ -0,0 +1,2 @@ +#ID algorithm score start stop sequence +sp|Q9NSY1|BMP2K_HUMAN posterior 0.945 983 1027 RRTKQDMSKSNGKRHHGTPTSTKKTLKPTYRTPERARRHKKVGRR diff -r 0ad90e5eb390 -r f93ad4882338 tools/protein_analysis/nlstradamus.txt --- a/tools/protein_analysis/nlstradamus.txt Tue Jun 07 17:39:58 2011 -0400 +++ b/tools/protein_analysis/nlstradamus.txt Wed Apr 17 08:26:25 2013 -0400 @@ -1,7 +1,7 @@ -Galaxy wrapper for NLStradamus v1.7 (C++ version) -================================================= +Galaxy wrapper for NLStradamus v1.7 or v1.8 (C++ version) +========================================================= -This wrapper is copyright 2011 by Peter Cock, The James Hutton Institute +This wrapper is copyright 2011-2013 by Peter Cock, The James Hutton Institute (formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved. See the licence text below. @@ -11,15 +11,24 @@ A. N. Nguyen Ba, A. Pogoutse, N. Provart, A. M. Moses. NLStradamus: a simple Hidden Markov Model for nuclear localization signal prediction. BMC Bioinformatics. 2009 Jun 29;10(1):202. +http://dx.doi.org/10.1186/1471-2105-10-202 http://www.moseslab.csb.utoronto.ca/NLStradamus Early versions of NLStradamus did not have a native tabular output format, this was added in version 1.7. Additionally a fast C++ implementation was added at -this point (early versions of NLStradamus came as a perl script only). This -wrapper expects the compiled C++ binary "NLStradamus" to be on the system PATH. +this point (early versions of NLStradamus came as a perl script only). + +Version 1.8 fixed a C++ compilation issue on modern compilers, but is otherwise +unchanged. + -To install the wrapper installed the following files under the Galaxy tools +Installation +============ +This wrapper expects the compiled C++ binary "NLStradamus" to be on the system +PATH. + +To install the wrapper copy or move the following files under the Galaxy tools folder, e.g. in a tools/protein_analysis folder: * nlstradamus.xml (the Galaxy tool definition) @@ -31,6 +40,9 @@ +If you wish to run the unit tests, also add this to tools_conf.xml.sample +and move/copy the test-data files under Galaxy's test-data folder. + That's it. @@ -38,6 +50,11 @@ ======= v0.0.3 - Initial public release +v0.0.4 - Adding DOI link to reference + (Documentation change only) +v0.0.5 - Assume non-zero return codes are errors +v0.0.6 - Show output help text using a table + - Added unit tests Developers @@ -46,17 +63,20 @@ This script and related tools are being developed on the following hg branch: http://bitbucket.org/peterjc/galaxy-central/src/tools -For making the "Galaxy Tool Shed" http://community.g2.bx.psu.edu/ tarball use +For making the "Galaxy Tool Shed" http://toolshed.g2.bx.psu.edu/ tarball use the following command from the Galaxy root folder: -tar -czf nlstradmus.tar.gz tools/protein_analysis/nlstradum.xml tools/protein_analysis/nlstradum.txt +$ tar -czf nlstradmus.tar.gz tools/protein_analysis/nlstradamus.xml tools/protein_analysis/nlstradamus.txt test-data/four_human_proteins.fasta test-data/four_human_proteins.nlstradamus.tabular test-data/empty.fasta test-data/empty_nlstradamus.tabular Check this worked: $ tar -tzf nlstradmus.tar.gz -filter/seq_filter_by_id.py -filter/seq_filter_by_id.txt -filter/seq_filter_by_id.xml +tools/protein_analysis/nlstradamus.xml +tools/protein_analysis/nlstradamus.txt +test-data/four_human_proteins.fasta +test-data/four_human_proteins.nlstradamus.tabular +test-data/empty.fasta +test-data/empty_nlstradamus.tabular Licence (MIT/BSD style) diff -r 0ad90e5eb390 -r f93ad4882338 tools/protein_analysis/nlstradamus.xml --- a/tools/protein_analysis/nlstradamus.xml Tue Jun 07 17:39:58 2011 -0400 +++ b/tools/protein_analysis/nlstradamus.xml Wed Apr 17 08:26:25 2013 -0400 @@ -1,8 +1,13 @@ - + Find nuclear localization signals (NLSs) in protein sequences NLStradamus -i $fasta_file -t $threshold -m $model -a $algorithm -tab > $tabular_file + + + + + @@ -25,6 +30,20 @@ NLStradamus + + + + + + + + + + + + + + @@ -36,12 +55,16 @@ The input is a FASTA file of protein sequences, and the output is tabular with six columns (one row per NLS): - * Sequence identifier - * Algorithm (posterior or Viterbi) - * Score (probability between threshold and 1 for posterior algorithm) - * Start - * End - * Sequence of NLS +====== =================================================================== +Column Description +------ ------------------------------------------------------------------- + c1 Sequence identifier + c2 Algorithm (posterior or Viterbi) + c3 Score (probability between threshold and 1 for posterior algorithm) + c4 Start + c5 End + c6 Sequence of NLS +====== =================================================================== ----- @@ -50,6 +73,7 @@ A. N. Nguyen Ba, A. Pogoutse, N. Provart, A. M. Moses. NLStradamus: a simple Hidden Markov Model for nuclear localization signal prediction. BMC Bioinformatics. 2009 Jun 29;10(1):202. +http://dx.doi.org/10.1186/1471-2105-10-202 http://www.moseslab.csb.utoronto.ca/NLStradamus