annotate corebio/seq_io/fasta_io.py @ 13:cd6c4bd14718

Uploaded
author davidmurphy
date Fri, 24 Feb 2012 09:26:11 -0500
parents c55bdc2fb9fa
children
rev   line source
0
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
#!/usr/bin/env python

# Copyright (c) 2005 Gavin E. Crooks <gec@threeplusone.com>
#
# This software is distributed under the MIT Open Source License.
# <http://www.opensource.org/licenses/mit-license.html>
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
#

"""Read and write sequence information in FASTA format.

This is a very common format for unannotated biological sequence data,
accepted by many multiple sequence alignment programs. Each sequence
consists of a single-line description, followed by lines of sequence data.
The first character of the description line is a greater-than (">") symbol
in the first column. The first word of the description is often the name or
ID of the sequence. Fasta files containing multiple sequences have one
sequence listed right after another.


Example Fasta File ::

    >Lamprey GLOBIN V - SEA LAMPREY
    PIVDTGSVA-P------------------LSAAEKTKIRSAWAPVYSTY---ETSGVDILVKFFTSTPAAQEFFPKFKGL
    TT-----ADQLKKSA---DVRWHA-ERIINAVNDAVASMDDTEKMS--MKL-RDLSGKH----AKSFQV-----DPQYFK
    VLAAVI-AD-TVAAGD--AGFEKLMSM------I---CILLR----S-----A-----Y------------
    >Hagfish GLOBIN III - ATLANTIC HAGFISH
    PITDHGQPP-T------------------LSEGDKKAIRESWPQIYKNF---EQNSLAVLLEFLKKFPKAQDSFPKFSAK
    KS-------HLEQDP---AVKLQA-EVIINAVNHTIGLMDKEAAMK--KYL-KDLSTKH----STEFQV-----NPDMFK
    ELSAVF-VS-TMG-GK--AAYEKLFSI------I---ATLLR----S-----T-----YDA----------
    >Frog HEMOGLOBIN BETA CHAIN - EDIBLE FROG
    ----------GS-----------------------DLVSGFWGKV--DA---HKIGGEALARLLVVYPWTQRYFTTFGNL
    GSADAIC-----HNA---KVLAHG-EKVLAAIGEGLKHPENLKAHY--AKL-SEYHSNK----LHVDPANFRLLGNVFIT
    VLARHF-QH-EFTPELQ-HALEAHFCA------V---GDALA----K-----A-----YH-----------


"""
import re
from corebio.utils import *
from corebio.seq import *
from corebio.seq_io import *


names = ('fasta', 'pearson', 'fa')
extensions = ('fa', 'fasta', 'fast', 'seq', 'fsa', 'fst', 'nt', 'aa',
              'fna', 'mpfa', 'faa', 'fnn', 'mfasta')


example = """
>Lamprey GLOBIN V - SEA LAMPREY
PIVDTGSVA-P------------------LSAAEKTKIRSAWAPVYSTY---ETSGVDILVKFFTSTPAAQEFFPKFKGL
TT-----ADQLKKSA---DVRWHA-ERIINAVNDAVASMDDTEKMS--MKL-RDLSGKH----AKSFQV-----DPQYFK
VLAAVI-AD-TVAAGD--AGFEKLMSM------I---CILLR----S-----A-----Y------------

>Hagfish GLOBIN III - ATLANTIC HAGFISH
PITDHGQPP-T------------------LSEGDKKAIRESWPQIYKNF---EQNSLAVLLEFLKKFPKAQDSFPKFSAK
KS-------HLEQDP---AVKLQA-EVIINAVNHTIGLMDKEAAMK--KYL-KDLSTKH----STEFQV-----NPDMFK
ELSAVF-VS-TMG-GK--AAYEKLFSI------I---ATLLR----S-----T-----YDA----------

>Frog HEMOGLOBIN BETA CHAIN - EDIBLE FROG
----------GS-----------------------DLVSGFWGKV--DA---HKIGGEALARLLVVYPWTQRYFTTFGNL
GSADAIC-----HNA---KVLAHG-EKVLAAIGEGLKHPENLKAHY--AKL-SEYHSNK----LHVDPANFRLLGNVFIT
VLARHF-QH-EFTPELQ-HALEAHFCA------V---GDALA----K-----A-----YH-----------

"""


def read(fin, alphabet=None):
    """Read and parse a fasta file.

    Args:
        fin -- A stream or file to read
        alphabet -- The expected alphabet of the data, if given
    Returns:
        SeqList -- A list of sequences
    Raises:
        ValueError -- If the file is unparsable
    """
    seqs = list(iterseq(fin, alphabet))
    # Name the list after the input file when the stream exposes a name;
    # otherwise fall back to the format name ('fasta').
    name = names[0]
    if hasattr(fin, "name"):
        name = fin.name
    return SeqList(seqs, name=name)


def readseq(fin, alphabet=None):
    """Read one sequence from the file, starting
    from the current file position."""
    # The built-in next() works on Python 2.6+ and Python 3, unlike the
    # Python 2-only generator method .next().
    return next(iterseq(fin, alphabet))


def iterseq(fin, alphabet=None):
    """Parse a fasta file and generate sequences.

    Args:
        fin -- A stream or file to read
        alphabet -- The expected alphabet of the data, if given
    Yields:
        Seq -- One alphabetic sequence at a time.
    Raises:
        ValueError -- If the file is unparsable
    """
    alphabet = Alphabet(alphabet)

    seqs = []
    comments = []  # FIXME: comments before first sequence are lost.
    header = None
    header_lineno = -1

    def build_seq(seqs, alphabet, header, header_lineno, comments):
        try:
            name = header.split(' ', 1)[0]
            if comments:
                header += '\n' + '\n'.join(comments)
            s = Seq("".join(seqs), alphabet, name=name, description=header)
        except ValueError:
            raise ValueError(
                "Parse failed with sequence starting at line %d: "
                "Character not in alphabet: %s" % (header_lineno, alphabet))
        return s

    # Note: lineno is zero-based (the enumerate default), so the line
    # numbers reported in error messages count from 0.
    for lineno, line in enumerate(fin):
        line = line.strip()
        if line == '':
            continue
        if line.startswith('>'):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield build_seq(seqs, alphabet, header, header_lineno, comments)
                header = None
                seqs = []
            header = line[1:]
            header_lineno = lineno
            comments = []
        elif line.startswith(';'):
            # Optional (and unusual) comment line
            comments.append(line[1:])
        else:
            if header is None:
                raise ValueError(
                    "Parse failed on line %d: sequence before header"
                    % (lineno))
            seqs.append(line)

    if not seqs:
        return
    yield build_seq(seqs, alphabet, header, header_lineno, comments)


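The parsing loop in iterseq can be tried out without corebio. What follows is a minimal Python 3 sketch of the same state machine, not part of this module: the function name `iter_fasta_records` is hypothetical, and it yields plain `(name, description, sequence)` tuples instead of `Seq` objects, so no alphabet checking is done.

```python
import io


def iter_fasta_records(lines):
    """Yield (name, description, sequence) tuples from an iterable of lines."""
    header = None   # text after '>' for the record being collected
    chunks = []     # sequence lines of the current record
    comments = []   # ';' comment lines of the current record

    def build():
        # The name is the first word of the header; comment lines are
        # appended to the description, one per line.
        name = header.split(" ", 1)[0]
        description = "\n".join([header] + comments)
        return name, description, "".join(chunks)

    for lineno, line in enumerate(lines):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield build()
            header, chunks, comments = line[1:], [], []
        elif line.startswith(";"):
            comments.append(line[1:])
        else:
            if header is None:
                raise ValueError(
                    "Parse failed on line %d: sequence before header" % lineno)
            chunks.append(line)

    # As in iterseq, a trailing header with no sequence lines is dropped.
    if chunks:
        yield build()


text = ">seq1 first record\nACGT\nAC\n;note\n>seq2\nGG--TT\n"
records = list(iter_fasta_records(io.StringIO(text)))
```

Here `records` holds two tuples, one per `>` header. Any iterable of strings works as input, so a list of lines can stand in for an open file.
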
def write(fout, seqs):
    """Write a fasta file.

    Args:
        fout -- A writable stream.
        seqs -- A list of Seq's with a description attribute (e.g. a
                SeqList). The description, if any, is written first as
                ';' comment lines.
    """
    if seqs.description:
        for line in seqs.description.splitlines():
            # fout.write() replaces the Python 2-only 'print >>fout'.
            fout.write(';' + line + '\n')
    for s in seqs:
        writeseq(fout, s)


def writeseq(afile, seq):
    """Write a single sequence in fasta format.

    Args:
        afile -- A writable stream.
        seq -- A Seq instance
    """

    header = seq.description or seq.name or ''

    # We prepend '>' to the first header line
    # Additional lines start with ';' to indicate comment lines
    # (afile.write() replaces the Python 2-only 'print >>afile'.)
    if header:
        header = header.splitlines()
        afile.write('>' + header[0] + '\n')
        for h in header[1:]:
            afile.write(';' + h + '\n')
    else:
        afile.write('>\n')

    # Wrap the sequence at 80 characters per line. Floor division keeps
    # the chunk count an integer on both Python 2 and Python 3.
    L = len(seq)
    line_length = 80
    for n in range(1 + L // line_length):
        afile.write(str(seq[n * line_length: (n + 1) * line_length]) + '\n')
    # A blank line separates consecutive records.
    afile.write('\n')


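The header and line-wrapping rules in writeseq can be illustrated in isolation. The following Python 3 sketch is not part of this module, and the helper names `wrap_sequence` and `format_fasta_record` are hypothetical. It works on plain strings and steps through the sequence with `range(0, len, width)`, which, unlike the loop in writeseq, emits no empty line when the length is an exact multiple of the width.

```python
def wrap_sequence(sequence, width=80):
    """Split a sequence string into chunks of at most `width` characters."""
    if width <= 0:
        raise ValueError("width must be positive")
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]


def format_fasta_record(header, sequence, width=80):
    """Format one record: '>' header line, ';' comment lines, wrapped data."""
    header_lines = header.splitlines() or [""]
    out = [">" + header_lines[0]]
    # Any further header lines become ';' comment lines.
    out.extend(";" + extra for extra in header_lines[1:])
    out.extend(wrap_sequence(sequence, width))
    return "\n".join(out) + "\n"


record = format_fasta_record("id1 demo\nnote", "ACGTAC", width=4)
```

With a width of 4, the six-character sequence is split into one full line and one two-character remainder.
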
def index(afile, alphabet=None):
    """Return a FileIndex for the fasta file. Sequences can be retrieved
    by item number or name.
    """
    def parser(afile):
        return readseq(afile, alphabet)

    key = re.compile(r"^>\s*(\S*)")

    def linekey(line):
        k = key.search(line)
        if k is None:
            return None
        return k.group(1)

    return FileIndex(afile, linekey, parser)
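The key pattern used by index can be checked on its own. This Python 3 sketch reuses the same regular expression; the names `line_key` and `index_headers` are hypothetical, and the dictionary of line positions is only an illustration. It makes no claim about how corebio's FileIndex stores or looks up entries.

```python
import re

# Same pattern as in index(): '>' in the first column, optional
# whitespace, then the first run of non-whitespace characters.
KEY = re.compile(r"^>\s*(\S*)")


def line_key(line):
    """Return the sequence name on a header line, or None for other lines."""
    match = KEY.search(line)
    return None if match is None else match.group(1)


def index_headers(lines):
    """Map each sequence name to the position of its header line."""
    positions = {}
    for i, line in enumerate(lines):
        k = line_key(line)
        if k is not None:
            # Keep the first occurrence if a name is repeated.
            positions.setdefault(k, i)
    return positions


positions = index_headers([">a first", "ACGT", ">b", "GG--TT"])
```

Sequence and comment lines return None from `line_key`, so only header lines end up in the mapping. A bare `>` line yields an empty-string key.
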