annotate digest.py @ 0:038ecf54cbec draft default tip
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
author | galaxyp |
---|---|
date | Mon, 22 Jan 2018 13:59:27 -0500 |

# Copyright 2012 Anton Goloborodko, Lev Levitsky
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import itertools as it
import re
from collections import deque


def cleave(sequence, rule, missed_cleavages=0, min_length=None):
    """Cleaves a polypeptide sequence using a given rule.

    Parameters
    ----------
    sequence : str
        The sequence of a polypeptide.

        .. note::
            The sequence is expected to be in one-letter uppercase notation.
            Otherwise, some of the cleavage rules in :py:data:`expasy_rules`
            will not work as expected.

    rule : str or compiled regex
        A regular expression describing the site of cleavage. It is recommended
        to design the regex so that it matches only the residue whose
        C-terminal bond is to be cleaved. All additional requirements should be
        specified using `lookaround assertions
        <http://www.regular-expressions.info/lookaround.html>`_.
        :py:data:`expasy_rules` contains cleavage rules
        for popular cleavage agents.
    missed_cleavages : int, optional
        Maximum number of allowed missed cleavages. Defaults to 0.
    min_length : int or None, optional
        Minimum peptide length. Defaults to :py:const:`None`.

        .. note::
            This checks for string length, which is only correct for one-letter
            notation and not for full *modX*. Use :py:func:`length` manually if
            you know what you are doing and apply :py:func:`cleave` to *modX*
            sequences.

    Returns
    -------
    out : set
        A set of unique (!) peptides.

    Examples
    --------
    >>> cleave('AKAKBK', expasy_rules['trypsin'], 0) == {'AK', 'BK'}
    True
    >>> cleave('GKGKYKCK', expasy_rules['trypsin'], 2) == \
    {'CK', 'GKYK', 'YKCK', 'GKGK', 'GKYKCK', 'GK', 'GKGKYK', 'YK'}
    True

    """
    return set(_cleave(sequence, rule, missed_cleavages, min_length))


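The `rule` description in `cleave` recommends matching only the residue whose C-terminal bond is cut and putting every other condition into lookaround assertions. A minimal sketch of that idea follows. The rule string, the sample sequence, and the helper `split_at` are hypothetical and not part of this module.

```python
import re

# Hypothetical rule (not taken from expasy_rules): match K or R, but only
# when the next residue is not P. The lookahead consumes no characters, so
# each match covers exactly one residue and m.end() is the cut position.
RULE = r'[KR](?!P)'


def split_at(sequence, rule):
    """Split `sequence` after every match of `rule`, dropping empty pieces."""
    cuts = [m.end() for m in re.finditer(rule, sequence)]
    bounds = [0] + cuts + [len(sequence)]
    pieces = [sequence[a:b] for a, b in zip(bounds, bounds[1:])]
    return [p for p in pieces if p]


print(split_at('AKPGRTK', RULE))  # ['AKPGR', 'TK']
```

The K at position 2 is skipped because it is followed by P. If the P were part of the match instead of a lookahead, the cut would land after the P rather than after the K.
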
def _cleave(sequence, rule, missed_cleavages=0, min_length=None):
    """Like :py:func:`cleave`, but the result is a list. Refer to
    :py:func:`cleave` for explanation of parameters.
    """
    peptides = []
    # A peptide with n missed cleavages spans n + 2 consecutive boundaries,
    # so only the last missed_cleavages + 2 boundaries need to be kept.
    ml = missed_cleavages+2
    trange = range(ml)
    cleavage_sites = deque([0], maxlen=ml)
    cl = 1  # number of boundaries currently held in the window
    # Visit the end of every rule match; the final None acts as an open
    # slice bound, i.e. the end of the sequence.
    for i in it.chain([x.end() for x in re.finditer(rule, sequence)],
                      [None]):
        cleavage_sites.append(i)
        if cl < ml:
            cl += 1
        # Pair every earlier boundary in the window with the newest one.
        for j in trange[:cl-1]:
            seq = sequence[cleavage_sites[j]:cleavage_sites[-1]]
            if seq:
                if min_length is None or len(seq) >= min_length:
                    peptides.append(seq)
    return peptides


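`_cleave` keeps only the most recent `missed_cleavages + 2` boundaries in a bounded deque. The same enumeration can be written with explicit indices, which may be easier to follow. This is a sketch for comparison, not the module's implementation. The helper `peptides_with_missed` and the plain rule `r'K'` are hypothetical.

```python
import re


def peptides_with_missed(sequence, rule, missed_cleavages=0):
    """Hypothetical index-based variant of the sliding-window enumeration."""
    # Boundaries: sequence start, the end of every rule match, sequence end.
    bounds = [0] + [m.end() for m in re.finditer(rule, sequence)]
    bounds.append(len(sequence))
    out = []
    for end in range(1, len(bounds)):
        # A peptide ending at boundary `end` may start at most
        # missed_cleavages + 1 boundaries earlier.
        first = max(0, end - missed_cleavages - 1)
        for start in range(first, end):
            piece = sequence[bounds[start]:bounds[end]]
            if piece:  # skip the empty piece after a terminal match
                out.append(piece)
    return out


found = set(peptides_with_missed('GKGKYKCK', r'K', 2))
print(found == {'CK', 'GKYK', 'YKCK', 'GKGK', 'GKYKCK', 'GK', 'GKGKYK', 'YK'})  # True
```

Like `_cleave`, this returns a list that can contain duplicates; wrapping the result in `set` gives unique peptides.
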
def num_sites(sequence, rule, **kwargs):
    """Count the number of sites where `sequence` can be cleaved using
    the given `rule` (e.g. number of miscleavages for a peptide).

    Parameters
    ----------
    sequence : str
        The sequence of a polypeptide.
    rule : str or compiled regex
        A regular expression describing the site of cleavage. It is recommended
        to design the regex so that it matches only the residue whose
        C-terminal bond is to be cleaved. All additional requirements should be
        specified using `lookaround assertions
        <http://www.regular-expressions.info/lookaround.html>`_.
    **kwargs
        Passed on unchanged to :py:func:`_cleave`, which accepts only
        `missed_cleavages` and `min_length`; any other keyword (for example
        `labels`) raises :py:exc:`TypeError`.

    Returns
    -------
    out : int
        Number of cleavage sites.
    """
    return len(_cleave(sequence, rule, **kwargs)) - 1


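`num_sites` derives the count from the number of pieces returned by `_cleave`. Assuming the rule never produces an empty match, the same number can be obtained by counting the matches that end before the end of the sequence. `count_internal_sites` is a hypothetical helper, shown only to illustrate that relationship.

```python
import re


def count_internal_sites(sequence, rule):
    """Count rule matches whose C-terminal bond lies inside the sequence."""
    # A match that ends at the very end of the sequence has no bond to cut,
    # so it is not counted.
    return sum(1 for m in re.finditer(rule, sequence)
               if m.end() < len(sequence))


print(count_internal_sites('AKAKBK', r'K'))  # 2
```

With no missed cleavages, two internal sites split `'AKAKBK'` into three pieces, which is why the module subtracts one from the piece count.
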
expasy_rules = {
    'arg-c': r'R',
    'asp-n': r'\w(?=D)',
    'bnps-skatole': r'W',
    'caspase 1': r'(?<=[FWYL]\w[HAT])D(?=[^PEDQKR])',
    'caspase 2': r'(?<=DVA)D(?=[^PEDQKR])',
    'caspase 3': r'(?<=DMQ)D(?=[^PEDQKR])',
    'caspase 4': r'(?<=LEV)D(?=[^PEDQKR])',
    'caspase 5': r'(?<=[LW]EH)D',
    'caspase 6': r'(?<=VE[HI])D(?=[^PEDQKR])',
    'caspase 7': r'(?<=DEV)D(?=[^PEDQKR])',
    'caspase 8': r'(?<=[IL]ET)D(?=[^PEDQKR])',
    'caspase 9': r'(?<=LEH)D',
    'caspase 10': r'(?<=IEA)D',
    'chymotrypsin high specificity': r'([FY](?=[^P]))|(W(?=[^MP]))',
galaxyp
parents:
diff
changeset
|
131 'chymotrypsin low specificity': |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
132 r'([FLY](?=[^P]))|(W(?=[^MP]))|(M(?=[^PY]))|(H(?=[^DMPW]))', |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
133 'clostripain': r'R', |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
134 'cnbr': r'M', |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
135 'enterokinase': r'(?<=[DE]{3})K', |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
136 'factor xa': r'(?<=[AFGILTVM][DE]G)R', |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
137 'formic acid': r'D', |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
138 'glutamyl endopeptidase': r'E', |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
139 'granzyme b': r'(?<=IEP)D', |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
140 'hydroxylamine': r'N(?=G)', |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
141 'iodosobenzoic acid': r'W', |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
142 'lysc': r'K', |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
143 'ntcb': r'\w(?=C)', |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
144 'pepsin ph1.3': r'((?<=[^HKR][^P])[^R](?=[FLWY][^P]))|' |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
145 r'((?<=[^HKR][^P])[FLWY](?=\w[^P]))', |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
146 'pepsin ph2.0': r'((?<=[^HKR][^P])[^R](?=[FL][^P]))|' |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
147 r'((?<=[^HKR][^P])[FL](?=\w[^P]))', |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
148 'proline endopeptidase': r'(?<=[HKR])P(?=[^P])', |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
149 'proteinase k': r'[AEFILTVWY]', |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
150 'staphylococcal peptidase i': r'(?<=[^E])E', |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
151 'thermolysin': r'[^DE](?=[AFILMV])', |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
152 'thrombin': r'((?<=G)R(?=G))|' |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
153 r'((?<=[AFGILTVM][AFGILTVWA]P)R(?=[^DE][^DE]))', |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
154 'trypsin': r'([KR](?=[^P]))|((?<=W)K(?=P))|((?<=M)R(?=P))' |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
155 } |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
156 """ |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
157 This dict contains regular expressions for cleavage rules of the most |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
158 popular proteolytic enzymes. The rules were taken from the |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
159 `PeptideCutter tool |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
160 <http://ca.expasy.org/tools/peptidecutter/peptidecutter_enzymes.html>`_ |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
161 at Expasy. |
038ecf54cbec
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff
changeset
|
162 """ |
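# A minimal sketch of how one of these rules might be applied, assuming that
# each regular expression matches the residue after which the enzyme cuts,
# so that the end of every match marks a cleavage position. The helper names
# `cleavage_sites` and `split_at_sites` are hypothetical and are not part of
# this module; the trypsin pattern is copied from the dict so that the
# example runs on its own. Missed cleavages are not modelled here.

```python
import re

# Trypsin rule, copied verbatim from the expasy_rules dict.
TRYPSIN = r'([KR](?=[^P]))|((?<=W)K(?=P))|((?<=M)R(?=P))'


def cleavage_sites(sequence, rule):
    """Return the indices at which `sequence` is cut.

    Assumption: the cut falls immediately after each regex match,
    so the site is the end offset of the match.
    """
    return [m.end() for m in re.finditer(rule, sequence)]


def split_at_sites(sequence, rule):
    """Split `sequence` into fully cleaved peptides (no missed cleavages)."""
    bounds = [0] + cleavage_sites(sequence, rule) + [len(sequence)]
    # Pair consecutive boundaries; skip empty slices at the termini.
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


if __name__ == '__main__':
    # K at index 2 is followed by P and not preceded by W, so it is skipped.
    print(cleavage_sites('AAKPGGRCCKDD', TRYPSIN))   # [7, 10]
    print(split_at_sites('AAKPGGRCCKDD', TRYPSIN))   # ['AAKPGGR', 'CCK', 'DD']
    # The (?<=W)K(?=P) exception allows a cut before proline.
    print(split_at_sites('WKPA', TRYPSIN))           # ['WK', 'PA']
```

# The lookahead and lookbehind groups let a rule inspect neighbouring
# residues without consuming them, which is why the match end can serve
# directly as the cut position under the assumption stated above.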