annotate corebio/seq.py @ 12:b819394a2634

Uploaded
author davidmurphy
date Wed, 22 Feb 2012 06:42:17 -0500
parents c55bdc2fb9fa
children
rev 0: c55bdc2fb9fa Uploaded (davidmurphy)
# Copyright (c) 2005 Gavin E. Crooks <gec@compbio.berkeley.edu>
#
# This software is distributed under the MIT Open Source License.
# <http://www.opensource.org/licenses/mit-license.html>
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
#

""" Alphabetic sequences and associated tools and data.

Seq is a subclass of a Python string with additional annotation and an alphabet.
The characters in the string must be contained in the alphabet. Various standard
alphabets are provided.


Classes :
    Alphabet    -- A subset of non-null ASCII characters
    Seq         -- An alphabetic string
    SeqList     -- A collection of Seq's

Alphabets :
    o generic_alphabet -- A generic alphabet. Any printable ASCII character.
    o protein_alphabet -- IUPAC/IUB Amino Acid one letter codes.
    o nucleic_alphabet -- IUPAC/IUB Nucleic Acid codes 'ACGTURYSWKMBDHVN-'
    o dna_alphabet -- Same as nucleic_alphabet, with 'U' (Uracil) an
        alternative for 'T' (Thymidine).
    o rna_alphabet -- Same as nucleic_alphabet, with 'T' (Thymidine) an
        alternative for 'U' (Uracil).
    o reduced_nucleic_alphabet -- All ambiguous codes in 'nucleic_alphabet' are
        alternative to 'N' (aNy)
    o reduced_protein_alphabet -- All ambiguous ('BZJ') and non-canonical amino
        acid codes ( 'U', Selenocysteine and 'O', Pyrrolysine) in
        'protein_alphabet' are alternative to 'X'.
    o unambiguous_dna_alphabet -- 'ACGT'
    o unambiguous_rna_alphabet -- 'ACGU'
    o unambiguous_protein_alphabet -- The twenty canonical amino acid one letter
        codes, in alphabetic order, 'ACDEFGHIKLMNPQRSTVWY'

Amino Acid Codes:
    Code  Alt.  Meaning
    -----------------
    A           Alanine
    B           Aspartic acid or Asparagine
    C           Cysteine
    D           Aspartate
    E           Glutamate
    F           Phenylalanine
    G           Glycine
    H           Histidine
    I           Isoleucine
    J           Leucine or Isoleucine
    K           Lysine
    L           Leucine
    M           Methionine
    N           Asparagine
    O           Pyrrolysine
    P           Proline
    Q           Glutamine
    R           Arginine
    S           Serine
    T           Threonine
    U           Selenocysteine
    V           Valine
    W           Tryptophan
    Y           Tyrosine
    Z           Glutamate or Glutamine
    X     ?     any
    *           translation stop
    -     .~    gap

Nucleotide Codes:
    Code  Alt.  Meaning
    ------------------------------
    A           Adenosine
    C           Cytidine
    G           Guanine
    T           Thymidine
    U           Uracil
    R           G A (puRine)
    Y           T C (pYrimidine)
    K           G T (Ketone)
    M           A C (aMino group)
    S           G C (Strong interaction)
    W           A T (Weak interaction)
    B           G T C (not A) (B comes after A)
    D           G A T (not C) (D comes after C)
    H           A C T (not G) (H comes after G)
    V           G C A (not T, not U) (V comes after U)
    N     X?    A G C T (aNy)
    -     .~    A gap




Refs:
    http://www.chem.qmw.ac.uk/iupac/AminoAcid/A2021.html
    http://www.chem.qmw.ac.uk/iubmb/misc/naseq.html
Status:
    Beta
Authors:
    GEC 2004,2005
"""
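# Illustration (not part of corebio): the "Nucleotide Codes" table in the
# docstring can be read as a mapping from each code to the set of bases it may
# stand for. The sketch below is written for Python 3 and stands alone; the
# names IUPAC_NUCLEOTIDES, expand and matches are hypothetical helpers chosen
# for this example, and gap characters are deliberately left out.

```python
# Nucleotide codes transcribed from the "Nucleotide Codes" table above.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC", "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA", "N": "AGCT",
}


def expand(code):
    """Return the set of bases that a nucleotide code can stand for."""
    return set(IUPAC_NUCLEOTIDES[code.upper()])


def matches(code, base):
    """True if `base` is one of the bases allowed by the (possibly ambiguous) code."""
    return base.upper() in expand(code)


print(sorted(expand("R")))   # the two purines
print(matches("B", "A"))     # B means "not A"
```

# A dict keeps the table readable and easy to check against the docstring;
# a production version would also need a policy for gaps and for 'X' / '?'.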

# TODO: Add this to docstring somewhere.
# To replace all ambiguous nucleic codes by 'N', replace the alphabet and then
# normalize.
#
# >>> Seq( 'ACGT-RYKM', reduced_nucleic_alphabet).normalized()
# 'ACGT-NNNN'
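# Illustration (not part of corebio): the reduction described in the TODO can
# be approximated without Seq or Alphabet by a plain translation table. This
# is a Python 3 sketch under that assumption; AMBIGUOUS, _REDUCE and
# reduce_ambiguous are hypothetical names, and unlike Alphabet.normalize it
# does not reject characters outside the alphabet.

```python
AMBIGUOUS = "RYSWKMBDHV"

# Map every ambiguous code (either case) to 'N' and upper-case the plain bases.
# Characters not listed here, such as the gap '-', pass through unchanged.
_REDUCE = str.maketrans(
    AMBIGUOUS + AMBIGUOUS.lower() + "acgtn",
    "N" * (2 * len(AMBIGUOUS)) + "ACGTN",
)


def reduce_ambiguous(seq):
    """Replace all ambiguous nucleotide codes in `seq` by 'N'."""
    return seq.translate(_REDUCE)


print(reduce_ambiguous("ACGT-RYKM"))  # ACGT-NNNN
```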

from array import array
from string import maketrans
from corebio.moremath import argmax, sqrt

__all__ = [
    'Alphabet',
    'Seq',
    'rna', 'dna', 'protein',
    'SeqList',
    'generic_alphabet',
    'protein_alphabet',
    'nucleic_alphabet',
    'dna_alphabet',
    'rna_alphabet',
    'reduced_nucleic_alphabet',
    'reduced_protein_alphabet',
    'unambiguous_dna_alphabet',
    'unambiguous_rna_alphabet',
    'unambiguous_protein_alphabet',
    ]



class Alphabet(object) :
    """An ordered subset of printable ASCII characters.

    Status:
        Beta
    Authors:
        - GEC 2005
    """
    __slots__ = ['_letters', '_alternatives','_ord_table', '_chr_table']

    # We're immutable, so use __new__ not __init__
    def __new__(cls, letters, alternatives= None) :
        """Create a new, immutable Alphabet.

        arguments:
        - letters -- the letters in the alphabet. The ordering determines
            the ordinal position of each character in this alphabet.
        - alternatives -- A list of (alternative, canonical) letters. The
            alternatives are given the same ordinal position as the canonical
            character.
            e.g. (('?','X'),('x', 'X')) states that '?' and 'x' are synonymous
            with 'X'. Values that are not in 'letters' are ignored. Alternatives
            that are already in 'letters' are also ignored. If the same
            alternative character is used twice then the alternative is assigned
            to the canonical character that occurs first in 'letters'. The
            default is to assume that upper and lower case characters are
            equivalent, unless both cases are included in 'letters'.
        raises:
            ValueError : Repetitive or otherwise illegal set of letters.
        """
        self = object.__new__(cls)

        # Printable ASCII characters
        ascii_letters = "".join([chr(__i) for __i in range(32,128)])

        if letters is None : letters = ascii_letters
        self._letters = letters

        equivalent_by_case = zip( 'abcdefghijklmnopqrstuvwxyz',
                                  'ABCDEFGHIJKLMNOPQRSTUVWXYZ')

        if alternatives is None : alternatives = equivalent_by_case


        # The ord_table maps between the ordinal position of a character in ascii
        # and the ordinal position in this alphabet. Characters not in the
        # alphabet are given a position of 255. The ord_table is stored as a
        # string.
        ord_table = ["\xff",] * 256
        for i,a in enumerate(letters) :
            n = ord(a)
            if n == 0 :
                raise ValueError("Alphabet cannot contain null character \\0")
            if ord_table[ n ] != "\xff":
                raise ValueError("Repetitive alphabet")
            ord_table[ n ] = chr(i)

        # Add alternatives
        _from = []
        _to = []
        for e, c in alternatives :
            if c in letters :
                n = ord(e)
                if ord_table[ n ] == "\xff" : # empty
                    ord_table[ n ] = ord_table[ ord(c)]
                    _from.append(e)
                    _to.append(c)
        self._alternatives = (''.join(_from), ''.join(_to))

        ord_table = "".join(ord_table)
        assert( ord_table[0] == "\xff")
        self._ord_table = ord_table

        # The chr_table maps between ordinal position in the alphabet letters
        # and the ordinal position in ascii. This map is not the inverse of
        # ord_table if there are alternatives.
        chr_table = ["\x00"]*256
        for i,a in enumerate(letters) :
            chr_table[ i ] = a
        chr_table = "".join(chr_table)
        self._chr_table = chr_table

        return self


    def alphabetic(self, string) :
        """True if all characters of the string are in this alphabet."""
        table = self._ord_table
        for s in str(string):
            if table[ord(s)] == "\xff" :
                return False
        return True

    def chr(self, n) :
        """ The n'th character in the alphabet (zero indexed) or \\0 """
        return self._chr_table[n]

    def ord(self, c) :
        """The ordinal position of the character c in this alphabet,
        or 255 if no such character.
        """
        return ord(self._ord_table[ord(c)])

    def chrs(self, sequence_of_ints) :
        """Convert a sequence of ordinals into an alphabetic string."""
        if not isinstance(sequence_of_ints, array) :
            sequence_of_ints = array('B', sequence_of_ints)
        s = sequence_of_ints.tostring().translate(self._chr_table)
        return Seq(s, self)

    def ords(self, string) :
        """Convert an alphabetic string into a byte array of ordinals."""
        string = str(string)
        s = string.translate(self._ord_table)
        a = array('B',s)
        return a


    def normalize(self, string) :
        """Normalize an alphabetic string by converting all alternative symbols
        to the canonical equivalent in 'letters'.
        """
        if not self.alphabetic(string) :
            raise ValueError("Not an alphabetic string.")
        return self.chrs(self.ords(string))

    def letters(self) :
        """ Letters of the alphabet as a string."""
        return str(self)

    def _all_letters(self) :
        """ All allowed letters, including alternatives."""
        # _alternatives is a pair of strings (alternatives, canonicals), so it
        # cannot be unpacked as (key, value) pairs; the alternative characters
        # are its first element.
        return self._letters + self._alternatives[0]

    def __repr__(self) :
        return "Alphabet( '" + self._letters +"', zip"+ repr(self._alternatives)+" )"

    def __str__(self) :
        return str(self._letters)

    def __len__(self) :
        return len(self._letters)

    def __eq__(self, other) :
        if not hasattr(other, "_ord_table") : return False
        return self._ord_table == other._ord_table

    def __ne__(self, other) :
        return not self.__eq__(other)

    def __iter__(self) :
        return iter(self._letters)

    def __getitem__(self, key) :
        return self._letters[key]


# End class Alphabet
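# Illustration (not part of corebio): the class above targets Python 2, where
# str is a byte string. The sketch below restates the two-table idea
# (ord_table: ASCII code -> position in the alphabet, 255 meaning "absent";
# chr_table: position -> canonical letter) with Python 3 bytes. build_tables
# and normalize are hypothetical free functions written for this example, and
# the sketch assumes ASCII input and fewer than 255 letters.

```python
def build_tables(letters, alternatives=()):
    """Return (ord_table, chr_table) as 256-byte translation tables."""
    ord_table = bytearray([0xFF] * 256)      # 255 marks "not in the alphabet"
    for i, a in enumerate(letters):
        n = ord(a)
        if n == 0:
            raise ValueError("Alphabet cannot contain the null character")
        if ord_table[n] != 0xFF:
            raise ValueError("Repetitive alphabet")
        ord_table[n] = i
    # An alternative takes the ordinal of its canonical letter, but only if the
    # canonical letter exists and the alternative's slot is still empty.
    for alt, canonical in alternatives:
        if canonical in letters and ord_table[ord(alt)] == 0xFF:
            ord_table[ord(alt)] = ord_table[ord(canonical)]
    chr_table = bytearray(256)               # unused positions stay \x00
    for i, a in enumerate(letters):
        chr_table[i] = ord(a)
    return bytes(ord_table), bytes(chr_table)


def normalize(text, ord_table, chr_table):
    """Map alternatives to canonical letters via an ordinal round trip."""
    ordinals = text.encode("ascii").translate(ord_table)
    if 0xFF in ordinals:
        raise ValueError("Not an alphabetic string.")
    return ordinals.translate(chr_table).decode("ascii")


ord_table, chr_table = build_tables("ACGT", zip("acgtUu", "ACGTTT"))
print(normalize("acgu", ord_table, chr_table))  # ACGT
```

# Because several input characters can share one ordinal, chr_table is not the
# inverse of ord_table, which is exactly what makes the round trip normalize.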

# ------------------- Standard ALPHABETS -------------------
# Standard alphabets are defined here, after Alphabet class.

generic_alphabet = Alphabet(None, None)


protein_alphabet = Alphabet('ACDEFGHIKLMNOPQRSTUVWYBJZX*-',
    zip('acdefghiklmnopqrstuvwybjzx?.~',
        'ACDEFGHIKLMNOPQRSTUVWYBJZXX--') )


nucleic_alphabet = Alphabet("ACGTURYSWKMBDHVN-",
    zip("acgturyswkmbdhvnXx?.~",
        "ACGTURYSWKMBDHVNNNN--") )

dna_alphabet = Alphabet("ACGTRYSWKMBDHVN-",
    zip('acgtryswkmbdhvnXx?.~Uu',
        'ACGTRYSWKMBDHVNNNN--TT') )

rna_alphabet = Alphabet("ACGURYSWKMBDHVN-",
    zip('acguryswkmbdhvnXx?.~Tt',
        'ACGURYSWKMBDHVNNNN--UU') )

reduced_nucleic_alphabet = Alphabet("ACGTN-",
    zip('acgtryswkmbdhvnXx?.~TtRYSWKMBDHV',
        'ACGTNNNNNNNNNNNNNN--TTNNNNNNNNNN') )

reduced_protein_alphabet = Alphabet('ACDEFGHIKLMNPQRSTVWYX*-',
    zip('acdefghiklmnpqrstvwyx?.~BbZzUu',
        'ACDEFGHIKLMNPQRSTVWYXX--XXXXCC') )

unambiguous_dna_alphabet = Alphabet("ACGT", zip('acgt','ACGT') )

unambiguous_rna_alphabet = Alphabet("ACGU", zip('acgu','ACGU') )

unambiguous_protein_alphabet = Alphabet("ACDEFGHIKLMNPQRSTVWY",
    zip('acdefghiklmnopqrstuvwy',
        'ACDEFGHIKLMNOPQRSTUVWY') )


_complement_table = maketrans("ACGTRYSWKMBDHVN-acgtUuryswkmbdhvnXx?.~",
                              "TGCAYRSWMKVHDBN-tgcaAayrswmkvhdbnXx?.~")
359
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
360
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
361
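`_complement_table` is a character translation table: each nucleotide code, including the IUPAC ambiguity codes, is replaced by its complement in a single pass over the string. A minimal Python 3 sketch of the same technique follows. It uses `str.maketrans` (the Python 3 counterpart of the Python 2 `string.maketrans` call above) with the same two strings; the function names are chosen for this illustration and work on plain `str` values, not on `Seq`.

```python
# Same source and target strings as _complement_table above.
COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVN-acgtUuryswkmbdhvnXx?.~",
    "TGCAYRSWMKVHDBN-tgcaAayrswmkvhdbnXx?.~",
)


def complement(seq):
    """Complement each base; the order of the bases is unchanged."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Reverse the sequence, then complement it (the opposite strand)."""
    return complement(seq[::-1])
```

Note that the table maps `U` and `u` to `A` and `a`, while `A` maps to `T`, so complementing an RNA string twice does not round-trip to the original.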
class Seq(str):
    """ An alphabetic string. A subclass of "str" consisting solely of
    letters from the same alphabet.

    Attributes:
        alphabet    -- A string or Alphabet of allowed characters.
        name        -- A short string used to identify the sequence.
        description -- A string describing the sequence.

    Authors :
        GEC 2005
    """
    # TODO: need a method to return a copy of the string with a new alphabet,
    # preserving the sequence, name and description?

    def __new__(cls, obj,
        alphabet= generic_alphabet,
        name =None, description=None,
        ):
        self = str.__new__(cls, obj)
        if alphabet is None:
            alphabet = generic_alphabet
        if not isinstance(alphabet, Alphabet):
            alphabet = Alphabet(alphabet)
        if not alphabet.alphabetic(self) :
            raise ValueError("Sequence not alphabetic %s, '%s'" %(alphabet, self))

        self._alphabet=alphabet
        self.name = name
        self.description = description

        return self

    # BEGIN PROPERTIES

    # Make alphabet constant
    def _get_alphabet(self):
        return self._alphabet
    alphabet = property(_get_alphabet)

    # END PROPERTIES


    def ords(self) :
        """ Convert sequence to an array of integers
        in the range [0, len(alphabet) )
        """
        return self.alphabet.ords(self)

    def tally(self, alphabet = None):
        """Counts the occurrences of alphabetic characters.

        Arguments:
        - alphabet -- an optional alternative alphabet

        Returns :
            A list of character counts in alphabetic order.
        """
        # Renamed from count() since this conflicts with str.count().
        if not alphabet : alphabet = self.alphabet
        L = len(alphabet)
        counts = [0,] * L

        ords = alphabet.ords(self)

        for n in ords:
            if n<L : counts[n] +=1

        return counts


    def kmers(self, alphabet = None, k=1):
        """Counts the occurrences of overlapping alphabetic subsequences.

        Arguments:
        - alphabet -- an optional alternative alphabet
        - k -- subsequence length. Default: 1 (monomers)

        Returns :
            A list of kmer counts in alphabetic order.
        Status :
            Alpha -- Not sure on interface. Will only work for small k
        """
        # TODO: Refactor?
        # TODO: Rename 'kmers' to 'words' or word_count
        if not alphabet : alphabet = self.alphabet

        L = len(alphabet)
        N = L**k
        counts = [0,]*N

        ords = alphabet.ords(self)


        # Easy case
        if k==1 :
            for n in ords:
                if n<N : counts[n] +=1
            return counts

        # Kmer counting.
        # FIXME: This code assumes that k isn't too large.

        # e.g. L =10, k = 3, multi = [100,10,1]
        multi = [ L**i for i in range(k-1,-1,-1)]

        for i in range(len(ords)-k+1) :
            # Skip kmers that contain a non-alphabetic character. Every
            # position in the window is checked against L: testing only
            # ords[i] against N would let a later non-alphabetic ordinal
            # push n past the end of counts.
            if max([ords[i+j] for j in range(k)]) >= L :
                continue
            #FIXME: this should be a function of alphabet?
            n = sum([multi[j]* ords[i+j] for j in range(k) ])
            counts[n] +=1

        return counts

    def __getslice__(self, i, j):
        cls = self.__class__
        return cls( str.__getslice__(self,i,j), self.alphabet)

    def __getitem__(self, key) :
        cls = self.__class__
        return cls( str.__getitem__(self,key), self.alphabet)

    def __add__(self, other) :
        # called for "self + other"
        cls = self.__class__
        return cls( str.__add__(self, other), self.alphabet)

    def __radd__(self, other) :
        # Called when "other + self" and other is superclass of self.
        # The left operand (other) must come first in the result.
        cls = self.__class__
        return cls( str.__add__(other, self), self.alphabet)

    def join(self, str_list) :
        cls = self.__class__
        return cls( super(Seq, self).join(str_list), self.alphabet)

    def __eq__(self, other) :
        if not hasattr(other, "alphabet") : return False
        if self.alphabet != other.alphabet :
            return False
        return str.__eq__(self, other)

    def __ne__(self, other) :
        return not self.__eq__(other)

    def tostring(self) :
        """ Converts Seq to a raw string.
        """
        # Compatibility with biopython
        return str(self)

    # ---- Transformations of Seq ----
    def reverse(self) :
        """Return the reversed sequence.

        Note that this method returns a new object, in contrast to
        the in-place reverse() method of list objects.
        """
        cls = self.__class__
        return cls( self[::-1], self.alphabet)

    def ungap(self) :
        # FIXME: Gap symbols should be specified by the Alphabet?
        return self.remove( '-.~')

    def remove(self, delchars) :
        """Return a new alphabetic sequence with all characters in 'delchars'
        removed.
        """
        cls = self.__class__
        return cls( str(self).translate(maketrans('',''), delchars), self.alphabet)

    def lower(self) :
        """Return a lower case copy of the sequence. """
        cls = self.__class__
        trans = maketrans('ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')
        return cls(str(self).translate(trans), self.alphabet)

    def upper(self) :
        """Return an upper case copy of the sequence. """
        cls = self.__class__
        trans = maketrans('abcdefghijklmnopqrstuvwxyz','ABCDEFGHIJKLMNOPQRSTUVWXYZ')
        return cls(str(self).translate(trans), self.alphabet)

    def mask(self, letters= 'abcdefghijklmnopqrstuvwxyz', mask='X') :
        """Replace all occurrences of letters with the mask character.
        The default is to replace all lower case letters with 'X'.
        """
        LL = len(letters)
        if len(mask) !=1 :
            raise ValueError("Mask should be single character")
        to = mask * LL
        trans = maketrans( letters, to)
        cls = self.__class__
        return cls(str(self).translate(trans), self.alphabet)

    def translate(self) :
        """Translate a nucleotide sequence to a polypeptide using full
        IUPAC ambiguities in DNA/RNA and amino acid codes, using the
        standard genetic code. See corebio.transform.GeneticCode for
        details and more options.
        """
        # Note: masks str.translate
        from transform import GeneticCode
        return GeneticCode.std().translate(self)

    def back_translate(self) :
        """Translate a protein sequence back into coding DNA, using the
        standard genetic code. See corebio.transform.GeneticCode for
        details and more options.
        """
        from transform import GeneticCode
        return GeneticCode.std().back_translate(self)


    def reverse_complement(self) :
        """Returns reversed complementary nucleic acid sequence (i.e. the other
        strand of a DNA sequence.)
        """
        return self.reverse().complement()

    def complement(self) :
        """Returns complementary nucleic acid sequence."""
        if not nucleic_alphabet.alphabetic(self.alphabet):
            raise ValueError("Incompatible alphabets")
        s = str.translate(self, _complement_table)
        cls = self.__class__
        return cls(s, self.alphabet, self.name, self.description)


# end class Seq
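`Seq.kmers` stores its counts in a flat list of length `L**k`, where each k-mer is addressed by reading its letter ordinals as the digits of a base-`L` number (the `multi` list in the method holds the place values). A minimal, self-contained Python 3 sketch of that indexing scheme follows. `count_kmers` is a hypothetical standalone function written for this illustration; it takes a plain string of letters instead of an `Alphabet`, and it skips any window that contains a character outside those letters.

```python
def count_kmers(seq, letters, k=1):
    """Count overlapping k-mers of seq over the given letters.

    Returns a list of len(letters)**k counts, indexed by the k-mer read
    as a base-len(letters) number (first letter is most significant).
    """
    index = {ch: i for i, ch in enumerate(letters)}
    base = len(letters)
    counts = [0] * (base ** k)
    for start in range(len(seq) - k + 1):
        n = 0
        for ch in seq[start:start + k]:
            digit = index.get(ch)
            if digit is None:
                break              # window holds a non-alphabet character
            n = n * base + digit   # accumulate the positional value
        else:
            counts[n] += 1         # reached only if no break occurred
    return counts
```

For the two-letter alphabet `"AC"` and `k=2`, the slots are ordered `AA, AC, CA, CC`, so `count_kmers("AACA", "AC", 2)` places one count in each of the first three slots. The list grows as `L**k`, which is why this layout only suits small `k`.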
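`Seq` overrides `__add__` and `__radd__` so that concatenation returns a `Seq` with the same alphabet, not a plain `str`. The reflected method runs for `other + self`, so the left operand has to come first in the result. A minimal Python 3 sketch of this pattern on a small `str` subclass follows; the class `Tagged` and its `tag` attribute are hypothetical stand-ins for `Seq` and its alphabet.

```python
class Tagged(str):
    """A str subclass that carries a tag through concatenation."""

    def __new__(cls, value, tag=None):
        self = super().__new__(cls, value)
        self.tag = tag
        return self

    def __add__(self, other):
        # Called for "self + other".
        return type(self)(str(self) + str(other), self.tag)

    def __radd__(self, other):
        # Called for "other + self" when other is a plain str:
        # the left operand (other) comes first.
        return type(self)(str(other) + str(self), self.tag)
```

With this class, `"ab" + Tagged("cd", tag="dna")` yields a `Tagged` instance equal to `"abcd"` that still carries the tag.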
class SeqList(list):
    """ A list of sequences.

    Status:
        Beta
    """
    # TODO: If alphabet given, we should ensure that all sequences conform.
    # TODO: Need an isaligned() method. All seqs same length, same alphabet.
    __slots__ =["alphabet", "name", "description"]

    def __init__(self, alist=[], alphabet=None, name=None, description=None):
        list.__init__(self, alist)
        self.alphabet = alphabet
        self.name = name
        self.description = description

    # TOOWTDI. Replicates seq_io.read()
    #@classmethod
    #def read(cls, afile, alphabet = None):
    #    return corebio.seq_io.read(afile, alphabet)
    #read = classmethod(read)

    def ords(self, alphabet=None) :
        """ Convert sequence list into a 2D array of ordinals.
        """
        if not alphabet : alphabet = self.alphabet
        if not alphabet : raise ValueError("No alphabet")
        k = []
        for s in self:
            k.append( alphabet.ords(s) )
        return k

    def tally(self, alphabet = None):
        """Counts the occurrences of characters in each column."""
        if not alphabet : alphabet = self.alphabet
        if not alphabet : raise ValueError("No alphabet")

        N = len(alphabet)
        ords = self.ords(alphabet)
        L = len(ords[0])
        counts = [ [0,]*N for l in range(0,L)]

        for o in ords :
            for j,n in enumerate(o) :
                if n<N : counts[ j][n] +=1

        return counts
# end class SeqList


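`SeqList.tally` counts characters per column, which presumes the sequences are aligned, that is, all the same length. A minimal, self-contained Python 3 sketch of the same column-wise count follows. `column_tally` is a hypothetical helper that works on plain strings and a string of letters; taking the width from the first sequence mirrors the method above and is only valid under the equal-length assumption.

```python
def column_tally(seqs, letters):
    """Per-column letter counts for a list of equal-length strings.

    Returns one list of len(letters) counts for each column.
    """
    index = {ch: i for i, ch in enumerate(letters)}
    width = len(seqs[0]) if seqs else 0
    counts = [[0] * len(letters) for _ in range(width)]
    for seq in seqs:
        for col, ch in enumerate(seq):
            n = index.get(ch)
            if n is not None:      # ignore characters outside the letters
                counts[col][n] += 1
    return counts
```

Each inner list can then be normalised into per-column letter frequencies.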
def dna(string) :
    """Create an alphabetic sequence representing a stretch of DNA.
    """
    return Seq(string, alphabet = dna_alphabet)

def rna(string) :
    """Create an alphabetic sequence representing a stretch of RNA.
    """
    return Seq(string, alphabet = rna_alphabet)

def protein(string) :
    """Create an alphabetic sequence representing a stretch of polypeptide.
    """
    return Seq(string, alphabet = protein_alphabet)