annotate translate_bed.py @ 0:038ecf54cbec draft default tip

planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
author galaxyp
date Mon, 22 Jan 2018 13:59:27 -0500
#!/usr/bin/env python
"""
#
#------------------------------------------------------------------------------
# University of Minnesota
# Copyright 2017, Regents of the University of Minnesota
#------------------------------------------------------------------------------
# Author:
#
# James E Johnson
#
#------------------------------------------------------------------------------
"""

from __future__ import print_function

import argparse
import re
import sys

from Bio.Seq import translate

from bedutil import bed_from_line

import digest

from ensembl_rest import get_cdna

from twobitreader import TwoBitFile


def __main__():
    parser = argparse.ArgumentParser(
        description='Translate from BED')
    parser.add_argument(
        'input_bed', default=None,
        help="BED to translate, '-' for stdin")
    pg_seq = parser.add_argument_group('Genomic sequence source')
    pg_seq.add_argument(
        '-t', '--twobit', default=None,
        help='Genome reference sequence in 2bit format')
    pg_seq.add_argument(
        '-c', '--column', type=int, default=None,
        help='Column offset containing genomic sequence ' +
             'between start and stop (-1) for last column')
    pg_out = parser.add_argument_group('Output options')
    pg_out.add_argument(
        '-f', '--fasta', default=None,
        help='Path to output translations.fasta')
    pg_out.add_argument(
        '-b', '--bed', default=None,
        help='Path to output translations.bed')
    pg_bed = parser.add_argument_group('BED filter options')
    pg_bed.add_argument(
        '-E', '--ensembl', action='store_true', default=False,
        help='Input BED is in 20 column Ensembl format')
    pg_bed.add_argument(
        '-R', '--regions', action='append', default=[],
        help='Filter input by regions e.g.:'
             + ' X,2:20000-25000,3:100-500+')
    pg_bed.add_argument(
        '-B', '--biotypes', action='append', default=[],
        help='For Ensembl BED restrict translations to Ensembl biotypes')
    pg_trans = parser.add_argument_group('Translation filter options')
    pg_trans.add_argument(
        '-m', '--min_length', type=int, default=10,
        help='Minimum length of protein translation to report')
    pg_trans.add_argument(
        '-e', '--enzyme', default=None,
        help='Digest translation with enzyme')
    pg_trans.add_argument(
        '-M', '--start_codon', action='store_true', default=False,
        help='Trim translations to methionine start_codon')
    pg_trans.add_argument(
        '-C', '--cds', action='store_true', default=False,
        help='Only translate CDS')
    pg_trans.add_argument(
        '-A', '--all', action='store_true',
        help='Include CDS protein translations')
    pg_fmt = parser.add_argument_group('ID format options')
    pg_fmt.add_argument(
        '-r', '--reference', default='',
        help='Genome Reference Name')
    pg_fmt.add_argument(
        '-D', '--fa_db', dest='fa_db', default=None,
        help='Prefix DB identifier for fasta ID line, e.g. generic')
    pg_fmt.add_argument(
        '-s', '--fa_sep', dest='fa_sep', default='|',
        help='fasta ID separator defaults to pipe char, ' +
             'e.g. generic|ProtID|description')
    pg_fmt.add_argument(
        '-P', '--id_prefix', default='',
        help='prefix for the sequence ID')
    parser.add_argument('-v', '--verbose', action='store_true', help='Verbose')
    parser.add_argument('-d', '--debug', action='store_true', help='Debug')
    args = parser.parse_args()

    input_rdr = open(args.input_bed, 'r')\
        if args.input_bed != '-' else sys.stdin
    fa_wtr = open(args.fasta, 'w')\
        if args.fasta is not None and args.fasta != '-' else sys.stdout
    bed_wtr = open(args.bed, 'w') if args.bed is not None else None

    enzyme = digest.expasy_rules.get(args.enzyme, None)

    biotypea = [bt.strip() for biotype in args.biotypes
                for bt in biotype.split(',')]

    twobit = TwoBitFile(args.twobit) if args.twobit else None

    selected_regions = dict()  # chrom: [[start, end, strand], ...]
    # Raw string so that \d is not treated as an invalid string escape.
    region_pat = r'^(?:chr)?([^:]+)(?::(\d*)(?:-(\d+)([+-])?)?)?'
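    # Illustration only (not part of the original source): the groups that
    # re.match(region_pat, reg).groups() yields for the example region
    # strings in the --regions help text, plus one 'chr'-prefixed name.
    #   'X'              -> ('X', None, None, None)
    #   '2:20000-25000'  -> ('2', '20000', '25000', None)
    #   '3:100-500+'     -> ('3', '100', '500', '+')
    #   'chr1:100-200'   -> ('1', '100', '200', None)
    # Captured start and end values are strings (or None), not integers.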
    if args.regions:
        for entry in args.regions:
            if not entry:
                continue
            regs = [x.strip() for x in entry.split(',') if x.strip()]
            for reg in regs:
                m = re.match(region_pat, reg)
                if m:
                    (chrom, start, end, strand) = m.groups()
                    if chrom:
                        if chrom not in selected_regions:
                            selected_regions[chrom] = []
                        selected_regions[chrom].append([start, end, strand])
        if args.debug:
            print("selected_regions: %s" % selected_regions, file=sys.stderr)

    def filter_by_regions(bed):
        # No region filter requested: keep every BED entry.
        if not selected_regions:
            return True
        # Region keys are stored without a leading "chr" (any case).
        ref = re.sub('(?i)^chr', '', bed.chrom)
        if ref not in selected_regions:
            return False
        for reg in selected_regions[ref]:
            (_start, _stop, _strand) = reg
            # A missing bound leaves that side of the region open-ended.
            start = int(_start) if _start else 0
            stop = int(_stop) if _stop else sys.maxsize
            if _strand and bed.strand != _strand:
                continue
            if bed.chromEnd >= start and bed.chromStart <= stop:
                return True
        return False

    translations = dict()  # start : end : seq

    def unique_prot(tbed, seq):
        if tbed.chromStart not in translations:
            translations[tbed.chromStart] = dict()
            translations[tbed.chromStart][tbed.chromEnd] = []
            translations[tbed.chromStart][tbed.chromEnd].append(seq)
        elif tbed.chromEnd not in translations[tbed.chromStart]:
            translations[tbed.chromStart][tbed.chromEnd] = []
            translations[tbed.chromStart][tbed.chromEnd].append(seq)
        elif seq not in translations[tbed.chromStart][tbed.chromEnd]:
            translations[tbed.chromStart][tbed.chromEnd].append(seq)
        else:
            return False
        return True

    def get_sequence(chrom, start, end):
        if twobit:
            if chrom in twobit and 0 <= start < end < len(twobit[chrom]):
                return twobit[chrom][start:end]
            contig = chrom[3:] if chrom.startswith('chr') else 'chr%s' % chrom
            if contig in twobit and 0 <= start < end < len(twobit[contig]):
                return twobit[contig][start:end]
        return None

    def write_translation(tbed, accession, peptide):
        if args.id_prefix:
            tbed.name = "%s%s" % (args.id_prefix, tbed.name)
        probed = "%s\t%s\t%s\t%s%s" % (accession, peptide,
                                       'unique', args.reference,
                                       '\t.' * 9)
        if bed_wtr:
            bed_wtr.write("%s\t%s\n" % (str(tbed), probed))
            bed_wtr.flush()
        location = "chromosome:%s:%s:%s:%s:%s"\
            % (args.reference, tbed.chrom,
               tbed.thickStart, tbed.thickEnd, tbed.strand)
        fa_desc = '%s%s' % (args.fa_sep, location)
        fa_db = '%s%s' % (args.fa_db, args.fa_sep) if args.fa_db else ''
        fa_id = ">%s%s%s\n" % (fa_db, tbed.name, fa_desc)
        fa_wtr.write(fa_id)
        fa_wtr.write(peptide)
        fa_wtr.write("\n")
        fa_wtr.flush()

    def translate_bed(bed):
        translate_count = 0
        transcript_id = bed.name
        refprot = None
        # Fetch a sequence if the BED entry does not already carry one.
        if not bed.seq:
            if twobit:
                bed.seq = get_sequence(bed.chrom, bed.chromStart, bed.chromEnd)
            else:
                bed.cdna = get_cdna(transcript_id)
        cdna = bed.get_cdna()
        if cdna is not None:
            cdna_len = len(cdna)
            if args.cds or args.all:
                try:
                    cds = bed.get_cds()
                    if cds:
                        if args.debug:
                            print("cdna:%s" % str(cdna), file=sys.stderr)
                            print("cds: %s" % str(cds), file=sys.stderr)
                        # Drop trailing bases that do not fill a codon.
                        if len(cds) % 3 != 0:
                            cds = cds[:-(len(cds) % 3)]
                        refprot = translate(cds) if cds else None
                except Exception:
                    refprot = None
                if args.cds:
                    if refprot:
                        tbed = bed.get_cds_bed()
                        if args.start_codon:
                            # Trim to the first methionine, if there is one.
                            m = refprot.find('M')
                            if m < 0:
                                return 0
                            elif m > 0:
                                bed.trim_cds(m*3)
                                refprot = refprot[m:]
                        # Truncate at the first stop codon.
                        stop = refprot.find('*')
                        if stop >= 0:
                            bed.trim_cds((stop - len(refprot)) * 3)
                            refprot = refprot[:stop]
                        if len(refprot) >= args.min_length:
                            write_translation(tbed, bed.name, refprot)
                            return 1
                    return 0
038ecf54cbec planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff changeset
232 if args.debug:
038ecf54cbec planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff changeset
233 print("%s\n" % (str(bed)), file=sys.stderr)
038ecf54cbec planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff changeset
234 print("CDS: %s %d %d" %
038ecf54cbec planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff changeset
235 (bed.strand, bed.cdna_offset_of_pos(bed.thickStart),
038ecf54cbec planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff changeset
236 bed.cdna_offset_of_pos(bed.thickEnd)),
038ecf54cbec planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff changeset
237 file=sys.stderr)
038ecf54cbec planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff changeset
238 print("refprot: %s" % str(refprot), file=sys.stderr)
038ecf54cbec planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff changeset
239 for offset in range(3):
038ecf54cbec planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff changeset
240 seqend = cdna_len - (cdna_len - offset) % 3
038ecf54cbec planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff changeset
241 aaseq = translate(cdna[offset:seqend])
038ecf54cbec planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff changeset
242 aa_start = 0
038ecf54cbec planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff changeset
243 while aa_start < len(aaseq):
038ecf54cbec planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff changeset
244 aa_end = aaseq.find('*', aa_start)
038ecf54cbec planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff changeset
245 if aa_end < 0:
038ecf54cbec planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff changeset
246 aa_end = len(aaseq)
038ecf54cbec planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/translate_bed commit 383bb485120a193bcc14f88364e51356d6ede219
galaxyp
parents:
diff changeset
247 prot = aaseq[aa_start:aa_end]
                    if args.start_codon:
                        m = prot.find('M')
                        # No start codon: skip to the stop (empty ORF).
                        aa_start = aa_start + m if m >= 0 else aa_end
                        prot = aaseq[aa_start:aa_end]
                    if enzyme and refprot:
                        frags = digest._cleave(prot, enzyme)
                        for frag in reversed(frags):
                            if frag in refprot:
                                prot = prot[:prot.rfind(frag)]
                            else:
                                break
                    is_cds = refprot and prot in refprot
                    if args.debug:
                        print("is_cds: %s %s" % (str(is_cds), str(prot)),
                              file=sys.stderr)
                    if len(prot) < args.min_length:
                        pass
                    elif not args.all and is_cds:
                        pass
                    else:
                        tstart = aa_start*3+offset
                        tend = aa_end*3+offset
                        prot_acc = "%s_%d_%d" % (transcript_id, tstart, tend)
                        tbed = bed.trim(tstart, tend)
                        if args.all or unique_prot(tbed, prot):
                            translate_count += 1
                            tbed.name = prot_acc
                            write_translation(tbed, bed.name, prot)
                    aa_start = aa_end + 1
        return translate_count

    if input_rdr:
        translation_count = 0
        transcript_count = 0
        for i, bedline in enumerate(input_rdr):
            try:
                bed = bed_from_line(bedline, ensembl=args.ensembl,
                                    seq_column=args.column)
                if bed is None:
                    continue
                transcript_count += 1
                if bed.biotype and biotypea and bed.biotype not in biotypea:
                    continue
                if filter_by_regions(bed):
                    translation_count += translate_bed(bed)
            except Exception as e:
                print("BED format Error: line %d: %s\n%s"
                      % (i, bedline, e), file=sys.stderr)
                break
        if args.debug or args.verbose:
            print("transcripts: %d\ttranslations: %d"
                  % (transcript_count, translation_count), file=sys.stderr)


if __name__ == "__main__":
    __main__()