annotate digest.py @ 0:038ecf54cbec draft default tip

planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
author galaxyp
date Mon, 22 Jan 2018 13:59:27 -0500
# Copyright 2012 Anton Goloborodko, Lev Levitsky
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import itertools as it
import re
from collections import deque


def cleave(sequence, rule, missed_cleavages=0, min_length=None):
    """Cleaves a polypeptide sequence using a given rule.

    Parameters
    ----------
    sequence : str
        The sequence of a polypeptide.

        .. note::
            The sequence is expected to be in one-letter uppercase notation.
            Otherwise, some of the cleavage rules in :py:data:`expasy_rules`
            will not work as expected.

    rule : str or compiled regex
        A regular expression describing the site of cleavage. It is recommended
        to design the regex so that it matches only the residue whose
        C-terminal bond is to be cleaved. All additional requirements should be
        specified using `lookaround assertions
        <http://www.regular-expressions.info/lookaround.html>`_.
        :py:data:`expasy_rules` contains cleavage rules
        for popular cleavage agents.
    missed_cleavages : int, optional
        Maximum number of allowed missed cleavages. Defaults to 0.
    min_length : int or None, optional
        Minimum peptide length. Defaults to :py:const:`None`.

        .. note::
            This checks for string length, which is only correct for one-letter
            notation and not for full *modX*. Use :py:func:`length` manually if
            you know what you are doing and apply :py:func:`cleave` to *modX*
            sequences.

    Returns
    -------
    out : set
        A set of unique (!) peptides.

    Examples
    --------
    >>> cleave('AKAKBK', expasy_rules['trypsin'], 0) == {'AK', 'BK'}
    True
    >>> cleave('GKGKYKCK', expasy_rules['trypsin'], 2) == \
    {'CK', 'GKYK', 'YKCK', 'GKGK', 'GKYKCK', 'GK', 'GKGKYK', 'YK'}
    True

    """
    return set(_cleave(sequence, rule, missed_cleavages, min_length))
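
# Illustration (not part of the module): a minimal sketch of how a regex rule
# with a lookahead assertion determines the cut positions described in the
# docstring of cleave(). SIMPLE_TRYPSIN is an assumed, simplified rule, and
# cut_positions / split_at are hypothetical helpers introduced only for this
# example; the real expasy_rules['trypsin'] entry may be more detailed.

```python
import re

# Assumption: cleave after K or R unless the next residue is P.
SIMPLE_TRYPSIN = r'[KR](?=[^P])'


def cut_positions(sequence, rule):
    """Return the index just past each residue matched by `rule`."""
    return [m.end() for m in re.finditer(rule, sequence)]


def split_at(sequence, positions):
    """Split `sequence` at the given positions, dropping empty pieces."""
    bounds = [0] + list(positions) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if a != b]


sites = cut_positions('AKAKBK', SIMPLE_TRYPSIN)
peptides = split_at('AKAKBK', sites)
print(sites)                  # [2, 4]
print(sorted(set(peptides)))  # ['AK', 'BK']
```

# The terminal K is not matched because the lookahead requires a following
# residue; the last piece is still produced by slicing to the end of the
# sequence. 'AK' occurs twice in the list and once in the set.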


def _cleave(sequence, rule, missed_cleavages=0, min_length=None):
    """Like :py:func:`cleave`, but the result is a list. Refer to
    :py:func:`cleave` for explanation of parameters.
    """
    peptides = []
    # A peptide with n missed cleavages spans n + 2 boundaries at most.
    ml = missed_cleavages + 2
    trange = range(ml)
    # Sliding window over the most recent boundaries; 0 is the sequence start.
    cleavage_sites = deque([0], maxlen=ml)
    cl = 1  # current number of boundaries in the window
    # The trailing None makes the final slice run to the end of the sequence.
    for i in it.chain([x.end() for x in re.finditer(rule, sequence)],
                      [None]):
        cleavage_sites.append(i)
        if cl < ml:
            cl += 1
        # Pair every earlier boundary in the window with the newest one.
        for j in trange[:cl-1]:
            seq = sequence[cleavage_sites[j]:cleavage_sites[-1]]
            if seq:
                if min_length is None or len(seq) >= min_length:
                    peptides.append(seq)
    return peptides
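
# Illustration (not part of the module): a sketch that traces the bounded
# deque used in _cleave(). trace_windows is a hypothetical helper that takes
# the cut positions explicitly instead of a regex, so the window movement can
# be followed step by step. It assumes the positions are sorted and lie
# inside the sequence.

```python
from collections import deque


def trace_windows(sequence, site_ends, missed_cleavages=0):
    """Yield (start, end, peptide) triples using a bounded window of sites."""
    max_sites = missed_cleavages + 2       # boundaries kept in the window
    window = deque([0], maxlen=max_sites)  # 0 is the start of the sequence
    for end in list(site_ends) + [None]:   # None stands for "end of sequence"
        window.append(end)
        # Every earlier boundary in the window pairs with the newest one.
        for start in list(window)[:-1]:
            peptide = sequence[start:end]
            if peptide:
                yield start, end, peptide


for start, end, peptide in trace_windows('GKGKYKCK', [2, 4, 6],
                                         missed_cleavages=1):
    print(start, end, peptide)
```

# With one missed cleavage the trace yields 'GK' twice (from positions 0-2
# and 2-4), which is why cleave() wraps the list returned by _cleave() in a
# set.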


def num_sites(sequence, rule, **kwargs):
    """Count the number of sites where `sequence` can be cleaved using
    the given `rule` (e.g. number of miscleavages for a peptide).

    Parameters
    ----------
    sequence : str
        The sequence of a polypeptide.
    rule : str or compiled regex
        A regular expression describing the site of cleavage. It is recommended
        to design the regex so that it matches only the residue whose
        C-terminal bond is to be cleaved. All additional requirements should be
        specified using `lookaround assertions
        <http://www.regular-expressions.info/lookaround.html>`_.
    **kwargs
        Keyword arguments passed on to :py:func:`_cleave`
        (`missed_cleavages`, `min_length`). Leave them at their defaults
        to get the plain number of cleavage sites.

    Returns
    -------
    out : int
        Number of cleavage sites.
    """
    return len(_cleave(sequence, rule, **kwargs)) - 1
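
# Illustration (not part of the module): a sketch of why "number of pieces
# minus one" gives the number of internal cleavage sites. pieces and
# count_sites are hypothetical helpers written for this example; they assume
# a non-empty sequence and a rule that matches at least one residue per hit.
# The rule r'R' is the same pattern as the 'arg-c' entry in expasy_rules.

```python
import re


def pieces(sequence, rule):
    """Split a sequence after every rule match (no missed cleavages)."""
    ends = [m.end() for m in re.finditer(rule, sequence)]
    bounds = [0] + ends + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if a != b]


def count_sites(sequence, rule):
    """Internal cleavage sites = number of pieces - 1."""
    return len(pieces(sequence, rule)) - 1


print(pieces('ARAAR', r'R'))       # ['AR', 'AAR']
print(count_sites('ARAAR', r'R'))  # 1
print(count_sites('AAAA', r'R'))   # 0
```

# The match on the C-terminal R does not add a site, because the piece after
# it is empty and is dropped.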
038ecf54cbec planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff changeset
114
038ecf54cbec planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff changeset
115
038ecf54cbec planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff changeset
116 expasy_rules = {
038ecf54cbec planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff changeset
    'arg-c': r'R',
    'asp-n': r'\w(?=D)',
    'bnps-skatole': r'W',
    'caspase 1': r'(?<=[FWYL]\w[HAT])D(?=[^PEDQKR])',
    'caspase 2': r'(?<=DVA)D(?=[^PEDQKR])',
    'caspase 3': r'(?<=DMQ)D(?=[^PEDQKR])',
    'caspase 4': r'(?<=LEV)D(?=[^PEDQKR])',
    'caspase 5': r'(?<=[LW]EH)D',
    'caspase 6': r'(?<=VE[HI])D(?=[^PEDQKR])',
    'caspase 7': r'(?<=DEV)D(?=[^PEDQKR])',
    'caspase 8': r'(?<=[IL]ET)D(?=[^PEDQKR])',
    'caspase 9': r'(?<=LEH)D',
    'caspase 10': r'(?<=IEA)D',
    'chymotrypsin high specificity': r'([FY](?=[^P]))|(W(?=[^MP]))',
    'chymotrypsin low specificity':
        r'([FLY](?=[^P]))|(W(?=[^MP]))|(M(?=[^PY]))|(H(?=[^DMPW]))',
    'clostripain': r'R',
    'cnbr': r'M',
    'enterokinase': r'(?<=[DE]{3})K',
    'factor xa': r'(?<=[AFGILTVM][DE]G)R',
    'formic acid': r'D',
    'glutamyl endopeptidase': r'E',
    'granzyme b': r'(?<=IEP)D',
    'hydroxylamine': r'N(?=G)',
    'iodosobenzoic acid': r'W',
    'lysc': r'K',
    'ntcb': r'\w(?=C)',
    'pepsin ph1.3': r'((?<=[^HKR][^P])[^R](?=[FLWY][^P]))|'
                    r'((?<=[^HKR][^P])[FLWY](?=\w[^P]))',
    'pepsin ph2.0': r'((?<=[^HKR][^P])[^R](?=[FL][^P]))|'
                    r'((?<=[^HKR][^P])[FL](?=\w[^P]))',
    'proline endopeptidase': r'(?<=[HKR])P(?=[^P])',
    'proteinase k': r'[AEFILTVWY]',
    'staphylococcal peptidase i': r'(?<=[^E])E',
    'thermolysin': r'[^DE](?=[AFILMV])',
    'thrombin': r'((?<=G)R(?=G))|'
                r'((?<=[AFGILTVM][AFGILTVWA]P)R(?=[^DE][^DE]))',
    'trypsin': r'([KR](?=[^P]))|((?<=W)K(?=P))|((?<=M)R(?=P))'
}
"""
This dict contains regular expressions for cleavage rules of the most
popular proteolytic enzymes. The rules were taken from the
`PeptideCutter tool
<http://ca.expasy.org/tools/peptidecutter/peptidecutter_enzymes.html>`_
at Expasy.
"""
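
# Illustrative sketch only, not part of the original module. The patterns
# above appear to match the residue *after which* the peptide bond is cut
# (e.g. 'asp-n' matches the residue preceding D), so one way to apply a rule
# is to split the sequence at the end offset of every match. The names
# `_example_split_at_sites` and `_example_trypsin` are hypothetical, and the
# trypsin pattern is repeated literally so that the snippet runs on its own.
import re


def _example_split_at_sites(sequence, rule):
    """Split `sequence` after every non-overlapping match of `rule`."""
    # Cut positions are the match end offsets, plus the end of the sequence.
    cuts = [m.end() for m in re.finditer(rule, sequence)] + [len(sequence)]
    peptides, start = [], 0
    for cut in cuts:
        if cut > start:  # skip empty pieces, e.g. a site at the very end
            peptides.append(sequence[start:cut])
        start = cut
    return peptides


_example_trypsin = r'([KR](?=[^P]))|((?<=W)K(?=P))|((?<=M)R(?=P))'
# K followed by P is not cut unless preceded by W; R followed by P is cut
# only when preceded by M:
# _example_split_at_sites('AKRPGKPMRPK', _example_trypsin)
# -> ['AK', 'RPGKPMR', 'PK']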