# HG changeset patch # User bgruening # Date 1373121433 14400 # Node ID d34f31cbc9ddba7c9bdbe9253ec644a4c62bd9e5 Uploaded diff -r 000000000000 -r d34f31cbc9dd aragorn.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/aragorn.xml Sat Jul 06 10:37:13 2013 -0400 @@ -0,0 +1,180 @@ + + prediction (Aragorn) + + aragorn + + + aragorn + $input + -gc$genbank_gencode + $tmRNA + $tRNA + $mtRNA + $mam_mtRNA + $topology + -o $output + $secondary_structure + $introns + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +Aragorn_ predicts tRNA (and tmRNA) in nucleotide sequences. + +.. _Aragorn: http://mbio-serv2.mbioekol.lu.se/ARAGORN/ + +----- + +**Example** + +Suppose you have the following nucleotide sequences:: + + >SQ Sequence 8667507 BP; 1203558 A; 3121252 C; 3129638 G; 1213059 T; 0 other; + cccgcggagcgggtaccacatcgctgcgcgatgtgcgagcgaacacccgggctgcgcccg + ggtgttgcgctcccgctccgcgggagcgctggcgggacgctgcgcgtcccgctcaccaag + cccgcttcgcgggcttggtgacgctccgtccgctgcgcttccggagttgcggggcttcgc + cccgctaaccctgggcctcgcttcgctccgccttgggcctgcggcgggtccgctgcgctc + ccccgcctcaagggcccttccggctgcgcctccaggacccaaccgcttgcgcgggcctgg + .... + +Running this tool can produce a FASTA file with all detected RNAs or a more detailed text file like the following:: + + c + c + a + g-c + g-c + g-c + c-g + g-c + a-t + t-a ca + t tgacc a + ga a !!!!! g + t ctcg actgg c + g !!!! 
c tt + g gagc t + aa g g + c-gag + t-a + t-a + c-g + g-c + t c + t a + cac + + tRNA-Val(cac) + 74 bases, %GC = 58.1 + Sequence [6669703,6669776] + + + tRNA Anticodon Frequency + AAA Phe GAA Phe 1 CAA Leu 1 TAA Leu 1 + AGA Ser GGA Ser 1 CGA Ser 2 TGA Ser 1 + ACA Cys GCA Cys 2 CCA Trp 2 TCA seC + ATA Tyr GTA Tyr 1 CTA Pyl TTA Stop + AAG Leu GAG Leu 3 CAG Leu 1 TAG Leu 2 + AGG Pro GGG Pro 2 CGG Pro 2 TGG Pro 2 + ACG Arg 1 GCG Arg 2 CCG Arg 1 TCG Arg + ATG His GTG His 2 CTG Gln 2 TTG Gln 1 + AAC Val GAC Val 3 CAC Val 2 TAC Val 1 + AGC Ala GGC Ala 2 CGC Ala 3 TGC Ala 1 + ACC Gly GCC Gly 5 CCC Gly 1 TCC Gly 2 + ATC Asp GTC Asp 3 CTC Glu 2 TTC Glu 2 + AAT Ile GAT Ile 3 CAT Met 6 TAT Ile + AGT Thr GGT Thr 2 CGT Thr 1 TGT Thr 2 + ACT Ser GCT Ser 1 CCT Arg 1 TCT Arg 1 + ATT Asn GTT Asn 3 CTT Lys 3 TTT Lys 2 + Ambiguous: 1 + + tRNA Codon Frequency + TTT Phe TTC Phe 1 TTG Leu 1 TTA Leu 1 + TCT Ser TCC Ser 1 TCG Ser 2 TCA Ser 1 + TGT Cys TGC Cys 2 TGG Trp 2 TGA seC + TAT Tyr TAC Tyr 1 TAG Pyl TAA Stop + CTT Leu CTC Leu 3 CTG Leu 1 CTA Leu 2 + CCT Pro CCC Pro 2 CCG Pro 2 CCA Pro 2 + CGT Arg 1 CGC Arg 2 CGG Arg 1 CGA Arg + CAT His CAC His 2 CAG Gln 2 CAA Gln 1 + GTT Val GTC Val 3 GTG Val 2 GTA Val 1 + GCT Ala GCC Ala 2 GCG Ala 3 GCA Ala 1 + GGT Gly GGC Gly 5 GGG Gly 1 GGA Gly 2 + GAT Asp GAC Asp 3 GAG Glu 2 GAA Glu 2 + ATT Ile ATC Ile 3 ATG Met 6 ATA Ile + ACT Thr ACC Thr 2 ACG Thr 1 ACA Thr 2 + AGT Ser AGC Ser 1 AGG Arg 1 AGA Arg 1 + AAT Asn AAC Asn 3 AAG Lys 3 AAA Lys 2 + Ambiguous: 1 + + Number of tRNA genes = 86 + tRNA GC range = 50.0% to 85.1% + Number of tmRNA genes = 1 + +------- + +**References** + +Dean Laslett and Bjorn Canback + +ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences Nucl. Acids Res. 
(2004) 32(1): 11-16 + +doi:10.1093/nar/gkh152 + + + diff -r 000000000000 -r d34f31cbc9dd readme.rst --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/readme.rst Sat Jul 06 10:37:13 2013 -0400 @@ -0,0 +1,93 @@ +Galaxy wrapper for t-RNA prediction tools +========================================= + +This wrapper is copyright 2012-2013 by Björn Grüning. + +This repository contains wrappers for the command line tools of tRNAscan-SE_ and Aragorn_. + +.. _tRNAscan-SE: http://lowelab.ucsc.edu/tRNAscan-SE/ +.. _Aragorn: http://mbio-serv2.mbioekol.lu.se/ARAGORN/ + +Dean Laslett and Bjorn Canback +ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences Nucl. Acids Res. (2004) 32(1): 11-16 +doi:10.1093/nar/gkh152 + +Todd M. Lowe and Sean R. Eddy +tRNAscan-SE: A Program for Improved Detection of Transfer RNA Genes in Genomic Sequence Nucl. Acids Res. (1997) 25(5): 0955-964 +doi:10.1093/nar/25.5.0955 + + +============ +Installation +============ + +The t-RNA prediction wrappers are available through the toolshed_ and can be automatically installed. + +.. _toolshed: http://toolshed.g2.bx.psu.edu/view/bjoern-gruening/trna_prediction + +For manual installation, please download tRNAscan-SE from the following URL and follow the install instructions:: + + http://lowelab.ucsc.edu/software/tRNAscan-SE.tar.gz + +Aragorn can be downloaded from:: + + http://mbio-serv2.mbioekol.lu.se/ARAGORN/aragorn1.2.33.c + +With a recent GNU compiler (gcc) you can compile it with the following command:: + + gcc -O3 -ffast-math -finline-functions -o aragorn aragorn1.2.33.c + +Please include aragorn and tRNAscan-SE in your PATH:: + + export PATH=$PATH:/home/user/bin/aragorn/bin/ + + +To install the wrappers, copy the files aragorn.xml and tRNAscan.xml into the Galaxy tools +folder and modify the tool_conf.xml file to make the tools available to Galaxy. 
+For example add the following lines:: + + + + + +======= +History +======= + +tRNAscan: + + - v0.1: Initial public release + - v0.2: add fasta output + - v0.2.1: added tool-dependency + - v0.2.2: patch from Nicola Soranzo added + - v0.3: add unit tests, documentation improvements, bug fixes + +aragorn: + + - v0.1: Initial public release + - v0.2: added options, upgrade to 1.2.36, tool-dependency + - v0.3: add unit tests, documentation improvements + + + +Wrapper Licence (MIT/BSD style) +=============================== + +Permission to use, copy, modify, and distribute this software and its +documentation with or without modifications and for any purpose and +without fee is hereby granted, provided that any copyright notices +appear in all copies and that both those copyright notices and this +permission notice appear in supporting documentation, and that the +names of the contributors or copyright holders not be used in +advertising or publicity pertaining to distribution of the software +without specific prior permission. + +THE CONTRIBUTORS AND COPYRIGHT HOLDERS OF THIS SOFTWARE DISCLAIM ALL +WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED +WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE +CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT +OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS +OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE +OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE +OR PERFORMANCE OF THIS SOFTWARE. + diff -r 000000000000 -r d34f31cbc9dd tRNAscan.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tRNAscan.py Sat Jul 06 10:37:13 2013 -0400 @@ -0,0 +1,71 @@ +#!/usr/bin/env python + +""" + Converts tRNAScan output back to fasta-sequences. 
+""" +import sys +from Bio import SeqIO +from Bio.SeqRecord import SeqRecord +import subprocess + + +def main(args): + """ + Call from galaxy: + tRNAscan.py $organism $mode $showPrimSecondOpt $disablePseudo $showCodons $tabular_output $inputfile $fasta_output + + tRNAscan-SE $organism $mode $showPrimSecondOpt $disablePseudo $showCodons -d -Q -y -q -b -o $tabular_output $inputfile; + """ + cmd = """tRNAscan-SE -Q -y -q -b %s""" % ' '.join( args[:-1] ) + child = subprocess.Popen(cmd.split(), + stdout=subprocess.PIPE, stderr=subprocess.PIPE) + stdout, stderr = child.communicate() + return_code = child.returncode + if return_code: + sys.stdout.write(stdout) + sys.stderr.write(stderr) + sys.stderr.write("Return error code %i from command:\n" % return_code) + sys.stderr.write("%s\n" % cmd) + else: + sys.stdout.write(stdout) + sys.stdout.write(stderr) + + outfile = args[-1] + sequence_file = args[-2] + tRNAScan_file = args[-3] + + with open( sequence_file ) as sequences: + sequence_recs = SeqIO.to_dict(SeqIO.parse(sequences, "fasta")) + + tRNAs = [] + with open(tRNAScan_file) as tRNA_handle: + for line in tRNA_handle: + line = line.strip() + if not line or line.startswith('#'): + continue + cols = line.split() + iid = cols[0].strip() + start = int(cols[2]) + end = int(cols[3]) + aa = cols[4] + codon = cols[5] + rec = sequence_recs[ iid ] + if start > end: + new_rec = rec[end:start] + new_rec.seq = new_rec.seq.reverse_complement() + new_rec.description = "%s %s %s %s %s" % (rec.description, aa, codon, start, end) + new_rec.id = rec.id + new_rec.name = rec.name + tRNAs.append( new_rec ) + else: + new_rec = rec[start:end] + new_rec.id = rec.id + new_rec.name = rec.name + new_rec.description = "%s %s %s %s %s" % (rec.description, aa, codon, start, end) + tRNAs.append( new_rec ) + + SeqIO.write(tRNAs, open(outfile, 'w+'), "fasta") + + +if __name__ == '__main__': + main(sys.argv[1:]) diff -r 000000000000 -r d34f31cbc9dd tRNAscan.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 
+++ b/tRNAscan.xml Sat Jul 06 10:37:13 2013 -0400 @@ -0,0 +1,234 @@ + + (tRNAscan) + + tRNAscan-SE + biopython + + + tRNAscan.py + $organism + $mode + $showPrimSecondOpt + $disablePseudo + $showCodons + -o + $tabular_output + $inputfile + $fasta_output + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +.. class:: warningmark + +**TIP** This tool requires *fasta* formatted sequences. + +----- + +**What it does** + + tRNAscan-SE_ was designed to make rapid, sensitive searches of genomic + sequence feasible using the selectivity of the Cove analysis package. + We have optimized search sensitivity with eukaryote cytoplasmic and + eubacterial sequences, but it may be applied more broadly with a + slight reduction in sensitivity. + +.. _tRNAscan-SE: http://lowelab.ucsc.edu/tRNAscan-SE/ + +----- + +**Organism** + +- search for eukaryotic cytoplasmic tRNAs: + + This is the default. + +- use general tRNA model: + + This option selects the general tRNA covariance model that was trained + on tRNAs from all three phylogenetic domains (Archaea, Bacteria, and + Eukarya). This mode can be used when analyzing a mixed collection of + sequences from more than one phylogenetic domain, with only slight + loss of sensitivity and selectivity. The original publication + describing this program and tRNAscan-SE version 1.0 used this general + tRNA model exclusively. If you wish to compare scores to those found + in the paper or scans using v1.0, use this option. Use of this option + is compatible with all other search mode options described in this + section. + +- search for bacterial tRNAs + + This option selects the bacterial covariance model for tRNA analysis, + and loosens the search parameters for EufindtRNA to improve detection + of bacterial tRNAs. Use of this mode with bacterial sequences + will also improve bounds prediction of the 3' end (the terminal CCA + triplet). 
+ +- search for archaeal tRNAs + + This option selects an archaeal-specific covariance model for tRNA + analysis, as well as slightly loosening the EufindtRNA search + cutoffs. + +- search for organellar (mitochondrial/chloroplast) tRNAs + + This parameter bypasses the fast first-pass scanners that are poor at + detecting organellar tRNAs and runs Cove analysis only. Since true + organellar tRNAs have been found to have Cove scores between 15 and 20 + bits, the search cutoff is lowered from 20 to 15 bits. Also, + pseudogene checking is disabled since it is only applicable to + eukaryotic cytoplasmic tRNA pseudogenes. Since Cove-only mode is + used, searches will be very slow (see -C option below) relative to the + default mode. + +------ + +**Mode** + +- search using Cove analysis only (max sensitivity, slow) + + Directs tRNAscan-SE to analyze sequences using Cove analysis only. + This option allows a slightly more sensitive search than the default + tRNAscan + EufindtRNA -> Cove mode, but is much slower (by approx. 250 + to 3,000 fold). Output format and other program defaults are + otherwise identical to the normal analysis. + +- search using Eukaryotic tRNA finder (EufindtRNA) only: + + This option runs EufindtRNA alone to search for tRNAs. Since Cove is + not being used as a secondary filter to remove false positives, this + run mode defaults to "Normal" parameters which more closely + approximate the sensitivity and selectivity of the original algorithm + described by Pavesi and colleagues. + +- search using tRNAscan only (defaults to strict search parameters) + + Directs tRNAscan-SE to use only tRNAscan to analyze sequences. This + mode will cause tRNAscan to default to using "strict" parameters + (similar to tRNAscan version 1.3 operation). 
This mode of operation + is faster (about 3-5 times faster than default mode analysis), but + will result in approximately 0.2 to 0.6 false positive tRNAs per Mbp, + decreased sensitivity, and less reliable prediction of anticodons, + tRNA isotype, and introns. + +- search using Infernal cm analysis only (max sensitivity, very slow) + + +- search using Infernal and new cm models instead of Cove + + +----- + +**disable pseudogene checking** + + Manually disable checking tRNAs for poor primary or secondary + structure scores often indicative of eukaryotic pseudogenes. This + will slightly speed the program and may be necessary for non-eukaryotic + sequences that are flagged as possible pseudogenes but are known to be + functional tRNAs. + +----- + +**Show both primary and secondary structure score components to covariance model bit scores** + + This option displays the breakdown of the two components of the + covariance model bit score. Since tRNA pseudogenes often have one + very low component (good secondary structure but poor primary sequence + similarity to the tRNA model, or vice versa), this information may be + useful in deciding whether a low-scoring tRNA is likely to be a + pseudogene. The heuristic pseudogene detection filter uses this + information to flag possible pseudogenes -- use this option to see why + a hit is marked as a possible pseudogene. The user may wish to + examine score breakdowns from known tRNAs in the organism of interest + to get a frame of reference. + +----- + +**Show codons instead of tRNA anticodons** + + This option causes tRNAscan-SE to output a tRNA's corresponding codon + in place of its anticodon. 
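As a rough illustration of what showing codons instead of anticodons means: the codon is the reverse complement of the anticodon when both are written 5' to 3'. The Aragorn frequency tables elsewhere in this changeset show the same pairing (anticodon CAC and codon GTG, both Val). The helper name `anticodon_to_codon` is hypothetical and is not part of tRNAscan-SE or this wrapper. This is only a sketch, assuming the plain DNA alphabet.

```python
# Hypothetical helper for illustration only; not part of tRNAscan-SE
# or of the Galaxy wrapper in this changeset.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def anticodon_to_codon(anticodon):
    """Return the codon matching an anticodon.

    Both triplets are written 5'->3' in the DNA alphabet, so the codon
    is the reverse complement of the anticodon.
    """
    return anticodon.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(anticodon_to_codon("CAC"))  # GTG (Val)
    print(anticodon_to_codon("TGC"))  # GCA (Ala)
```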
+ +----- + +**Example** + +* input: + + >CELF22B7 C.aenorhabditis elegans (Bristol N2) cosmid F22B7 + GATCCTTGTAGATTTTGAATTTGAAGTTTTTTCTCATTCCAAAACTCTGT + GATCTGAAATAAAATGTCTCAAAAAAATAGAAGAAAACATTGCTTTATAT + TTATCAGTTATGGTTTTCAAAATTTTCTGACATACCGTTTTGCTTCTTTT + TTTCTCATCTTCTTCAAATATCAATTGTGATAATCTGACTCCTAACAATC + GAATTTCTTTTCCTTTTTCTTTTTCCAACAACTCCAGTGAGAACTTTTGA + ATATCTTCAAGTGACTTCACCACATCAGAAGGTGTCAACGATCTTGTGAG + AACATCGAATGAAGATAATTTTAATTTTAGAGTTACAGTTTTTCCTCCGA + ..... + + +* output: + + + ======== ====== ===== ====== ==== ========== ====== ====== ========== ========== + tRNA Bounds Intron Bounds + -------- ------ ---------------- ---- ---------- ---------------- ---------- ---------- + Name # tRNA Begin End tRNA Anti Codon Begin End Cove Score Hit Origin + ======== ====== ===== ====== ==== ========== ====== ====== ========== ========== + CELF22B7 1 12619 12738 Leu CAA 12657 12692 55.12 Bo + CELF22B7 2 19480 19561 Ser AGA 0 0 66.90 Bo + CELF22B7 3 26367 26439 Phe GAA 0 0 73.88 Bo + CELF22B7 4 26992 26920 Phe GAA 0 0 73.88 Bo + CELF22B7 5 23765 23694 Pro CGG 0 0 60.58 Bo + ======== ====== ===== ====== ==== ========== ====== ====== ========== ========== + + +------- + +**References** + +Todd M. Lowe and Sean R. Eddy + +tRNAscan-SE: A Program for Improved Detection of Transfer RNA Genes in Genomic Sequence Nucl. Acids Res. 
(1997) 25(5): 0955-964 + +doi:10.1093/nar/25.5.0955 + + + + diff -r 000000000000 -r d34f31cbc9dd test-data/aragorn_tansl-table-1_tmRNA_tRNA.fasta --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/aragorn_tansl-table-1_tmRNA_tRNA.fasta Sat Jul 06 10:37:13 2013 -0400 @@ -0,0 +1,3 @@ +>1-1 tRNA-Ala(tgc) [381,453] +ggggatgtagctcatatggtagagcgctcgctttgcatgcgagaggcaca +gggttcgattccctgcatctcca diff -r 000000000000 -r d34f31cbc9dd test-data/tRNAscan_eukaryotic_infernal.fasta --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/tRNAscan_eukaryotic_infernal.fasta Sat Jul 06 10:37:13 2013 -0400 @@ -0,0 +1,3 @@ +>gi|240255695:23036500-23037000 Arabidopsis thaliana chromosome 3, complete sequence Ala TGC 381 453 +GGGATGTAGCTCATATGGTAGAGCGCTCGCTTTGCATGCGAGAGGCACAGGGTTCGATTC +CCTGCATCTCCA diff -r 000000000000 -r d34f31cbc9dd test-data/tRNAscan_eukaryotic_infernal.tabular --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/tRNAscan_eukaryotic_infernal.tabular Sat Jul 06 10:37:13 2013 -0400 @@ -0,0 +1,1 @@ +gi|240255695:23036500-23037000 1 381 453 Ala TGC 0 0 67.36 diff -r 000000000000 -r d34f31cbc9dd test-data/trna_arabidopsis.fasta --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/trna_arabidopsis.fasta Sat Jul 06 10:37:13 2013 -0400 @@ -0,0 +1,10 @@ +>gi|240255695:23036500-23037000 Arabidopsis thaliana chromosome 3, complete sequence +CGATAGGAGACTCGGTCTTGCAGATACAGCAGCGAGGATCGTCGAGGACGAATCGGAGACGAACGACGCA +TGTGGAGCAAACCTCTCGGTGACCACAAGATCCATAGGCAACCCATTCGAGATTGTCAGCGCAGACGGCA +CAGCTATCATCCATGTCTCTTCCTGAATTTGAGATCGAGATAAGGAAATTGTTTCGAAACGACAACAATA +GAGATTGATGAGCAAGAGAGATCTAGGGTTCTCGAAAGAGGGCTCGCTAAATAAAAGGGCTAGATGAAGA +AGAATATCAAAGGTCTCCTAATCATCAAGGCCAGTCAAACAAATACATAAATAAATTAATCGTTGATACT +ATTTAGTTAAAAAAGTGTTGAGAATCATTCGGGGATGTAGCTCATATGGTAGAGCGCTCGCTTTGCATGC +GAGAGGCACAGGGTTCGATTCCCTGCATCTCCATTTTTATTTTCTTTTTTTTATAACTTTTGGTGAGCTT +AATGGCCCAAT + diff -r 000000000000 -r d34f31cbc9dd tool_dependencies.xml --- /dev/null Thu Jan 01 00:00:00 1970 
+0000 +++ b/tool_dependencies.xml Sat Jul 06 10:37:13 2013 -0400 @@ -0,0 +1,47 @@ + + + + + + + + + http://mbio-serv2.mbioekol.lu.se/ARAGORN/Downloads/aragorn1.2.36.tgz + $INSTALL_DIR/bin/ + gcc -O3 -ffast-math -finline-functions -o aragorn aragorn1.2.36.c + + aragorn + $INSTALL_DIR/bin + + + $INSTALL_DIR/bin + + + + Compiling ARAGORN requires gcc. + + + + + http://lowelab.ucsc.edu/software/tRNAscan-SE.tar.gz + $INSTALL_DIR/bin/ + $INSTALL_DIR/lib/tRNAscan-SE/ + $INSTALL_DIR/man/ + + cd ./tRNAscan-SE-1.3.1 && sed 's%^BINDIR = .*%BINDIR = $INSTALL_DIR/bin/%' Makefile | sed 's%^LIBDIR = .*%LIBDIR = $INSTALL_DIR/lib/tRNAscan-SE/%' | sed 's%^MANDIR = .*%MANDIR = $INSTALL_DIR/man%' > Makefile_new + cd ./tRNAscan-SE-1.3.1 && rm Makefile && mv Makefile_new Makefile + cd ./tRNAscan-SE-1.3.1 && make && make install + + + wget ftp://selab.janelia.org/pub/software/infernal/infernal-1.0.2.tar.gz + tar xfvz infernal-1.0.2.tar.gz + cd infernal-1.0.2 && ./configure --prefix=$INSTALL_DIR && make && make install + + $INSTALL_DIR/bin + $INSTALL_DIR/bin/ + + + + Compiling and running tRNAscan-SE requires gcc and a Perl environment. + +
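A note on the coordinate handling in tRNAscan.py: the test data in this changeset report the same Ala tRNA at [381,453] from both tools, yet the aragorn record is 73 bases long while the tRNAscan FASTA record is 72. That would be consistent with Begin/End being 1-based and inclusive while the script slices with `rec[start:end]`, which may drop the first base. The sketch below illustrates the conversion under that assumption. `extract_trna` and `_COMPLEMENT` are hypothetical names, not part of the wrapper, and the code works on plain strings rather than Biopython records.

```python
# Hypothetical helper for illustration only; not part of this wrapper.
# Assumption: Begin/End are 1-based and inclusive, and Begin > End
# marks a hit on the reverse strand.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_trna(sequence, begin, end):
    """Return the tRNA subsequence for 1-based inclusive coordinates."""
    if begin <= end:
        # Forward strand: shift only the start to get a 0-based slice.
        return sequence[begin - 1:end]
    # Reverse strand: take the span, then reverse-complement it.
    return sequence[end - 1:begin].translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(extract_trna("AACCGGTT", 3, 6))  # CCGG
    print(extract_trna("AACCGTTT", 6, 3))  # ACGG
```

Under this assumption a hit reported at 381-453 yields 73 bases, matching the length of the aragorn record for the same gene.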