<tool id="align_back_trans" name="Thread nucleotides onto a protein alignment (back-translation)" version="0.0.3">
    <description>Gives a codon-aware alignment</description>
    <requirements>
        <requirement type="package" version="1.63">biopython</requirement>
        <requirement type="python-module">Bio</requirement>
    </requirements>
    <version_command interpreter="python">align_back_trans.py --version</version_command>
    <command interpreter="python">
align_back_trans.py $prot_align.ext "$prot_align" "$nuc_file" "$out_nuc_align" "$table"
    </command>
    <stdio>
        <!-- Anything other than zero is an error -->
        <exit_code range="1:" />
        <exit_code range=":-1" />
    </stdio>
    <inputs>
        <param name="prot_align" type="data" format="fasta,muscle,clustal" label="Aligned protein file" help="Multiple sequence alignment file in FASTA, ClustalW or PHYLIP format." />
        <param name="table" type="select" label="Genetic code" help="Tables from the NCBI; these determine the start and stop codons">
            <option value="1">1. Standard</option>
            <option value="2">2. Vertebrate Mitochondrial</option>
            <option value="3">3. Yeast Mitochondrial</option>
            <option value="4">4. Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma</option>
            <option value="5">5. Invertebrate Mitochondrial</option>
            <option value="6">6. Ciliate Macronuclear and Dasycladacean</option>
            <option value="9">9. Echinoderm Mitochondrial</option>
            <option value="10">10. Euplotid Nuclear</option>
            <option value="11">11. Bacterial</option>
            <option value="12">12. Alternative Yeast Nuclear</option>
            <option value="13">13. Ascidian Mitochondrial</option>
            <option value="14">14. Flatworm Mitochondrial</option>
            <option value="15">15. Blepharisma Macronuclear</option>
            <option value="16">16. Chlorophycean Mitochondrial</option>
            <option value="21">21. Trematode Mitochondrial</option>
            <option value="22">22. Scenedesmus obliquus</option>
            <option value="23">23. Thraustochytrium Mitochondrial</option>
            <option value="0">Don't check the translation</option>
        </param>
        <param name="nuc_file" type="data" format="fasta" label="Unaligned nucleotide sequences" help="FASTA format, using same identifiers as your protein alignment" />
    </inputs>
    <outputs>
        <data name="out_nuc_align" format="fasta" label="${prot_align.name} (back-translated)">
            <!-- TODO - Replace this with format="input:prot_align" if/when that works -->
            <change_format>
                <when input_dataset="prot_align" attribute="extension" value="clustal" format="clustal" />
                <when input_dataset="prot_align" attribute="extension" value="phylip" format="phylip" />
            </change_format>
        </data>
    </outputs>
    <tests>
        <test>
            <param name="prot_align" value="demo_prot_align.fasta" />
            <param name="nuc_file" value="demo_nucs.fasta" />
            <param name="table" value="0" />
            <output name="out_nuc_align" file="demo_nuc_align.fasta" />
        </test>
        <test>
            <param name="prot_align" value="demo_prot_align.fasta" />
            <param name="nuc_file" value="demo_nucs_trailing_stop.fasta" />
            <param name="table" value="11" />
            <output name="out_nuc_align" file="demo_nuc_align.fasta" />
        </test>
    </tests>
    <help>
**What it does**

Takes an input file of aligned protein sequences (typically FASTA or Clustal
format), and a matching file of unaligned nucleotide sequences (FASTA format,
using the same identifiers), and threads the nucleotide sequences onto the
protein alignment to produce a codon-aware nucleotide alignment, which can
be viewed as a back-translation.

If you specify one of the standard NCBI genetic codes (recommended), then the
translation is verified. This will allow fuzzy matching if stop codons in the
protein sequence have been represented as X, and will allow for a trailing stop
codon present in the nucleotide sequences but not the protein.

Note - the protein and nucleotide sequences must use the same identifiers.

Note - If no translation table is specified, the provided nucleotide sequences
should be exactly three times the length of the protein sequences (excluding the gaps).

Note - the nucleotide FASTA file may contain extra sequences not in the
protein alignment; they will be ignored. This can be useful if, for example,
you have a nucleotide FASTA file containing all the genes in an organism,
while the protein alignment is for a specific gene family.
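
As a rough illustration of the identifier matching described in the notes
above, the following sketch indexes nucleotide records by identifier and picks
out only those named in the protein alignment. It is plain Python, not the
tool's actual code; the helper name read_fasta and the sample records
(including the extra gene Delta) are made up for this example.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping identifier to sequence.

    Hypothetical helper for illustration only; the identifier is taken
    as the first word after the ">" on each header line.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


# Made-up data: the nucleotide file holds one extra gene ("Delta").
nucleotides = read_fasta(">Alpha\nGATGAGGAACGA\n>Beta\nGATGAGCGU\n>Delta\nATGAAATAG\n")
proteins = read_fasta(">Alpha\nDEER\n>Beta\nDE-R\n")

# Only identifiers present in the protein alignment are looked up,
# so the extra "Delta" record is never used.
selected = {name: nucleotides[name] for name in proteins}
print(sorted(selected))
```

In this sketch, a protein identifier with no matching nucleotide record would
raise a KeyError, which mirrors the requirement that both files use the same
identifiers.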

**Example**

Given this protein alignment in FASTA format::

    >Alpha
    DEER
    >Beta
    DE-R
    >Gamma
    D--R

and this matching unaligned nucleotide FASTA file::

    >Alpha
    GATGAGGAACGA
    >Beta
    GATGAGCGU
    >Gamma
    GATCGG

the tool would return this nucleotide alignment::

    >Alpha
    GATGAGGAACGA
    >Beta
    GATGAG---CGU
    >Gamma
    GAT------CGG

Notice that all the gaps are multiples of three in length.
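
The threading step in this example can be sketched in a few lines of plain
Python. This is a minimal illustration, not the tool's actual implementation:
the function name thread_nucleotides is made up here, and it covers only the
case where no translation table is checked, so the nucleotide sequence must be
exactly three times the ungapped protein length.

```python
def thread_nucleotides(aligned_protein, nucleotides, gap="-"):
    """Copy the protein alignment's gaps into a nucleotide sequence.

    Illustrative sketch only: each amino acid consumes one codon (three
    letters) and each protein gap becomes three gap characters. No
    translation check is done, so the lengths must match exactly.
    """
    ungapped_length = len(aligned_protein.replace(gap, ""))
    if len(nucleotides) != 3 * ungapped_length:
        raise ValueError("nucleotide length is not three times the protein length")
    codons = []
    position = 0
    for letter in aligned_protein:
        if letter == gap:
            # A gap in the protein becomes a whole-codon gap.
            codons.append(gap * 3)
        else:
            # An amino acid takes the next three nucleotides.
            codons.append(nucleotides[position:position + 3])
            position += 3
    return "".join(codons)


print(thread_nucleotides("DEER", "GATGAGGAACGA"))
print(thread_nucleotides("DE-R", "GATGAGCGU"))
print(thread_nucleotides("D--R", "GATCGG"))
```

Run on the three sequences above, this prints the same gapped nucleotide
sequences as the alignment shown in the example.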


**Citation**

This tool uses Biopython, so if you use this Galaxy tool in work leading to a
scientific publication please cite the following paper:

Cock et al. (2009). Biopython: freely available Python tools for computational
molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.

This tool is available to install into other Galaxy instances via the Galaxy
Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/align_back_trans
    </help>
</tool>