annotate tools/align_back_trans/align_back_trans.py @ 1:ec202446408a draft

Uploaded v0.0.4, fixed an error message.
author peterjc
date Wed, 04 Jun 2014 08:42:23 -0400
parents 0c24e4e2177d
children 9fbf29a8c12b
#!/usr/bin/env python
"""Back-translate a protein alignment to nucleotides

This tool is a short Python script (using Biopython library functions) to
load a protein alignment and a matching nucleotide FASTA file of unaligned
sequences, in order to produce a codon-aware nucleotide alignment, which
can be viewed as a back translation.

The development repository for this tool is here:

* https://github.com/peterjc/pico_galaxy/tree/master/tools/align_back_trans

This tool is available with a Galaxy wrapper from the Galaxy Tool Shed at:

* http://toolshed.g2.bx.psu.edu/view/peterjc/align_back_trans

See accompanying text file for licence details (MIT licence).

This is version 0.0.4 of the script.
"""

import sys
from Bio.Seq import Seq
from Bio.Alphabet import generic_dna, generic_protein
from Bio.Align import MultipleSeqAlignment
from Bio import SeqIO
from Bio import AlignIO
from Bio.Data.CodonTable import ambiguous_generic_by_id

if "-v" in sys.argv or "--version" in sys.argv:
    print("v0.0.4")
    sys.exit(0)

def stop_err(msg, error_level=1):
    """Print error message to stderr and quit with given error level."""
    sys.stderr.write("%s\n" % msg)
    sys.exit(error_level)

def check_trans(identifier, nuc, prot, table):
    """Return the nucleotide sequence if it translates to the protein.

    A single trailing stop codon is allowed and removed from the return value.
    """
    if len(nuc) % 3:
        stop_err("Nucleotide sequence for %s is length %i (not a multiple of three)"
                 % (identifier, len(nuc)))

    # Compare in upper case, treating stop symbols (*) as X
    p = str(prot).upper().replace("*", "X")
    t = str(nuc.translate(table)).upper().replace("*", "X")
    if len(t) == len(p) + 1:
        if str(nuc)[-3:].upper() in ambiguous_generic_by_id[table].stop_codons:
            #Allow this...
            t = t[:-1]
            nuc = nuc[:-3] #edit return value
    if len(t) != len(p) and p in t:
        stop_err("%s translation matched but only as subset of nucleotides, "
                 "wrong start codon?" % identifier)
    if len(t) != len(p) and p[1:] in t:
        stop_err("%s translation matched (ignoring first base) but only "
                 "as subset of nucleotides, wrong start codon?" % identifier)
    if len(t) != len(p):
        stop_err("Inconsistent lengths for %s, ungapped protein %i, "
                 "tripled %i vs ungapped nucleotide %i" %
                 (identifier,
                  len(p),
                  len(p) * 3,
                  len(nuc)))


    if t == p:
        return nuc
    elif p.startswith("M") and "M" + t[1:] == p:
        #Close, was there a start codon?
        if str(nuc[0:3]).upper() in ambiguous_generic_by_id[table].start_codons:
            return nuc
        else:
            stop_err("Translation check failed for %s\n"
                     "Would match if %s was a start codon (check correct table used)\n"
                     % (identifier, nuc[0:3].upper()))
    else:
        #Allow * vs X here? e.g. internal stop codons
        # Match line: "." where the residues agree, "!" where they differ
        m = "".join("." if x==y else "!" for (x,y) in zip(p,t))
        if len(prot) < 70:
            sys.stderr.write("Protein:     %s\n" % p)
            sys.stderr.write("             %s\n" % m)
            sys.stderr.write("Translation: %s\n" % t)
        else:
            for offset in range(0, len(p), 60):
                sys.stderr.write("Protein:     %s\n" % p[offset:offset+60])
                sys.stderr.write("             %s\n" % m[offset:offset+60])
                sys.stderr.write("Translation: %s\n\n" % t[offset:offset+60])
        stop_err("Translation check failed for %s\n" % identifier)

def sequence_back_translate(aligned_protein_record, unaligned_nucleotide_record, gap, table=0):
    #TODO - Separate arguments for protein gap and nucleotide gap?
    if not gap or len(gap) != 1:
        raise ValueError("Please supply a single gap character")

    alpha = unaligned_nucleotide_record.seq.alphabet
    if hasattr(alpha, "gap_char"):
        gap_codon = alpha.gap_char * 3
        assert len(gap_codon) == 3
    else:
        from Bio.Alphabet import Gapped
        alpha = Gapped(alpha, gap)
        gap_codon = gap*3

    ungapped_protein = aligned_protein_record.seq.ungap(gap)
    ungapped_nucleotide = unaligned_nucleotide_record.seq
    if table:
        ungapped_nucleotide = check_trans(aligned_protein_record.id, ungapped_nucleotide, ungapped_protein, table)
    elif len(ungapped_protein) * 3 != len(ungapped_nucleotide):
        stop_err("Inconsistent lengths for %s, ungapped protein %i, "
                 "tripled %i vs ungapped nucleotide %i" %
                 (aligned_protein_record.id,
                  len(ungapped_protein),
                  len(ungapped_protein) * 3,
                  len(ungapped_nucleotide)))

    seq = []
    nuc = str(ungapped_nucleotide)
    for amino_acid in aligned_protein_record.seq:
        if amino_acid == gap:
            seq.append(gap_codon)
        else:
            seq.append(nuc[:3])
            nuc = nuc[3:]
    assert not nuc, "Nucleotide sequence for %r longer than protein %r" \
        % (unaligned_nucleotide_record.id, aligned_protein_record.id)

    aligned_nuc = unaligned_nucleotide_record[:] #copy for most annotation
    aligned_nuc.letter_annotation = {} #clear this
    aligned_nuc.seq = Seq("".join(seq), alpha) #replace this
    assert len(aligned_protein_record.seq) * 3 == len(aligned_nuc)
    return aligned_nuc

def alignment_back_translate(protein_alignment, nucleotide_records, key_function=None, gap=None, table=0):
    """Thread nucleotide sequences onto a protein alignment."""
    #TODO - Separate arguments for protein and nucleotide gap characters?
    if key_function is None:
        key_function = lambda x: x
    if gap is None:
        gap = "-"

    aligned = []
    for protein in protein_alignment:
        try:
            nucleotide = nucleotide_records[key_function(protein.id)]
        except KeyError:
            raise ValueError("Could not find nucleotide sequence for protein %r" \
                             % protein.id)
        aligned.append(sequence_back_translate(protein, nucleotide, gap, table))
    return MultipleSeqAlignment(aligned)


if len(sys.argv) == 4:
    align_format, prot_align_file, nuc_fasta_file = sys.argv[1:]
    nuc_align_file = sys.stdout
    table = 0
elif len(sys.argv) == 5:
    align_format, prot_align_file, nuc_fasta_file, nuc_align_file = sys.argv[1:]
    table = 0
elif len(sys.argv) == 6:
    align_format, prot_align_file, nuc_fasta_file, nuc_align_file, table = sys.argv[1:]
else:
    stop_err("""This is a Python script for 'back-translating' a protein alignment.

It requires three, four or five arguments:
- alignment format (e.g. fasta, clustal),
- aligned protein file (in specified format),
- unaligned nucleotide file (in fasta format),
- aligned nucleotide output file (in same format), optional,
- NCBI translation table (0 for none), optional.

The nucleotide alignment is printed to stdout if no output filename is given.

Example usage:

$ python align_back_trans.py fasta demo_prot_align.fasta demo_nucs.fasta demo_nuc_align.fasta

Warning: If the output file already exists, it will be overwritten.

This script is available with sample data and a Galaxy wrapper here:
https://github.com/peterjc/pico_galaxy/tree/master/tools/align_back_trans
http://toolshed.g2.bx.psu.edu/view/peterjc/align_back_trans
""")

try:
    table = int(table)
except ValueError:
    stop_err("Bad table argument %r" % table)

prot_align = AlignIO.read(prot_align_file, align_format, alphabet=generic_protein)
nuc_dict = SeqIO.index(nuc_fasta_file, "fasta")
nuc_align = alignment_back_translate(prot_align, nuc_dict, gap="-", table=table)
AlignIO.write(nuc_align, nuc_align_file, align_format)
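
The codon threading done in sequence_back_translate can be illustrated without Biopython. The following is only a minimal sketch on plain strings, not part of the script: the function name thread_codons and the example sequences are hypothetical, and it skips the translation check and record annotation that the script handles.

```python
def thread_codons(aligned_protein, nucleotides, gap="-"):
    """Thread an ungapped nucleotide string onto a gapped protein string.

    Hypothetical helper for illustration only; it works on plain strings
    rather than Biopython SeqRecord objects.
    """
    ungapped_length = len(aligned_protein) - aligned_protein.count(gap)
    if ungapped_length * 3 != len(nucleotides):
        raise ValueError("Inconsistent lengths, ungapped protein %i, "
                         "tripled %i vs ungapped nucleotide %i"
                         % (ungapped_length, ungapped_length * 3,
                            len(nucleotides)))
    codons = []
    position = 0
    for amino_acid in aligned_protein:
        if amino_acid == gap:
            # Each protein gap becomes a three-letter gap codon
            codons.append(gap * 3)
        else:
            # Each residue consumes the next three nucleotides
            codons.append(nucleotides[position:position + 3])
            position += 3
    return "".join(codons)


print(thread_codons("M-KV", "ATGAAAGTT"))
```

Here the gap in the second protein column becomes a gap codon, so the result is three times the length of the aligned protein.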