annotate retrieve_ensembl_bed.py @ 1:9c4a48f5d4e7 draft default tip
"planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/retrieve_ensembl_bed commit 6babd357845126292cb202aaea0f70ff68819525"
author | galaxyp |
---|---|
date | Mon, 07 Oct 2019 16:14:39 -0400 |
parents | da1b538b87e5 |
children |
rev | line source |
---|---|
0 da1b538b87e5 galaxyp: planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/retrieve_ensembl_bed commit 88cf1e923a8c9e5bc6953ad412d15a7c70f054d1
1 #!/usr/bin/env python |
2 """ |
3 # |
4 #------------------------------------------------------------------------------ |
5 # University of Minnesota |
6 # Copyright 2017, Regents of the University of Minnesota |
7 #------------------------------------------------------------------------------ |
8 # Author: |
9 # |
10 # James E Johnson |
11 # |
12 #------------------------------------------------------------------------------ |
13 """ |
14 |
15 from __future__ import print_function |
16 |
17 import argparse |
18 import re |
19 import sys |
20 |
21 from bedutil import bed_from_line |
22 |
23 from ensembl_rest import get_toplevel, get_transcripts_bed, max_region |
24 |
25 |
26 def __main__(): |
27 parser = argparse.ArgumentParser( |
28 description='Retrieve Ensembl cDNAs in BED format') |
29 parser.add_argument( |
30 '-s', '--species', default='human', |
31 help='Ensembl Species to retrieve') |
32 parser.add_argument( |
33 '-R', '--regions', action='append', default=[], |
34 help='Restrict Ensembl retrieval to regions e.g.:' |
35 + ' X,2:20000-25000,3:100-500+') |
36 parser.add_argument( |
1
37 '-i', '--interval_file', default=None, |
38 help='Regions from a bed, gff, or interval file') |
39 parser.add_argument( |
40 '-f', '--interval_format', choices=['bed', 'gff', 'interval'], default='interval', |
41 help='Interval format has TAB-separated columns: Seq, Start, End, Strand') |
42 parser.add_argument( |
0
43 '-B', '--biotypes', action='append', default=[], |
44 help='Restrict Ensembl biotypes to retrieve') |
45 parser.add_argument( |
46 '-X', '--extended_bed', action='store_true', default=False, |
47 help='Include the extended columns returned from Ensembl') |
48 parser.add_argument( |
49 '-U', '--ucsc_chrom_names', action='store_true', default=False, |
50 help='Use the UCSC names for Chromosomes') |
51 parser.add_argument( |
52 '-t', '--toplevel', action='store_true', |
53 help='Print Ensembl toplevel for species') |
54 parser.add_argument( |
55 'output', |
56 help='Output BED filepath, or for stdout: "-"') |
57 parser.add_argument('-v', '--verbose', action='store_true', help='Verbose') |
58 parser.add_argument('-d', '--debug', action='store_true', help='Debug') |
59 args = parser.parse_args() |
60 species = args.species |
61 out_wtr = open(args.output, 'w') if args.output != '-' else sys.stdout |
62 biotypes = ';'.join(['biotype=%s' % bt.strip() |
63 for biotype in args.biotypes |
64 for bt in biotype.split(',') if bt.strip()]) |
65 |
66 selected_regions = dict() # chrom:(start, end) |
67 region_pat = r'^([^:]+)(?::(\d*)(?:-(\d+)([+-])?)?)?' |
68 if args.regions: |
69 for entry in args.regions: |
70 if not entry: |
71 continue |
72 regs = [x.strip() for x in entry.split(',') if x.strip()] |
73 for reg in regs: |
74 m = re.match(region_pat, reg) |
75 if m: |
76 (chrom, start, end, strand) = m.groups() |
77 if chrom: |
78 if chrom not in selected_regions: |
79 selected_regions[chrom] = [] |
80 selected_regions[chrom].append([start, end, strand]) |
81 if args.debug: |
82 print("selected_regions: %s" % selected_regions, file=sys.stderr) |
83 |
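The `--regions` handling above can be exercised on its own. What follows is a minimal sketch rather than the tool's own code: `REGION_PAT` and `parse_regions` are hypothetical names introduced here, the pattern is the listing's `region_pat` written as a raw string, and the sample argument is the one given in the `--regions` help text.

```python
import re

# The listing's region_pat, written as a raw string (name is hypothetical).
REGION_PAT = r'^([^:]+)(?::(\d*)(?:-(\d+)([+-])?)?)?'


def parse_regions(entries):
    """Group [start, end, strand] triples by chromosome name.

    Hypothetical helper that mirrors the loop over args.regions.
    """
    selected = {}
    for entry in entries:
        # Each -R value may hold several comma-separated regions.
        for reg in (x.strip() for x in entry.split(',')):
            if not reg:
                continue
            m = re.match(REGION_PAT, reg)
            if m and m.group(1):
                chrom, start, end, strand = m.groups()
                selected.setdefault(chrom, []).append([start, end, strand])
    return selected


regions = parse_regions(['X,2:20000-25000,3:100-500+'])
print(regions)
```

Under these assumptions a bare chromosome name such as `X` yields `[None, None, None]`, and coordinates are kept as strings at this stage, so any integer conversion has to happen later.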
1
84 if args.interval_file: |
85 pat = r'^(?:chr)?([^\t]+)(?:\t(\d+)(?:\t(\d+)(?:\t([+-])?)?)?)?.*' |
86 if args.interval_format == 'bed': |
87 pat = r'^(?:chr)?([^\t]+)\t(\d+)\t(\d+)(?:(?:\t[^\t]+\t[^\t]+\t)([+-]))?.*' |
88 elif args.interval_format == 'gff': |
89 pat = r'^(?:chr)?([^\t]+)\t[^\t]*\t[^\t]*\t(\d+)\t(\d+)(?:\t[^\t]*\t([+-]))?.*' |
90 with open(args.interval_file, 'r') as fh: |
91 for i, line in enumerate(fh): |
92 if line.startswith('#'): |
93 continue |
94 m = re.match(pat, line.rstrip()) |
95 if m: |
96 (chrom, start, end, strand) = m.groups() |
97 if chrom: |
98 if chrom not in selected_regions: |
99 selected_regions[chrom] = [] |
100 selected_regions[chrom].append([start, end, strand]) |
101 if args.debug: |
102 print("selected_regions: %s" % selected_regions, file=sys.stderr) |
103 |
104 |
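To see what the interval-file patterns capture, the default `interval` pattern and the `bed` pattern from the listing can be run against sample lines. This is an illustrative sketch only: the constant names are hypothetical, and the sample lines (including the transcript name `ENST0001`) are made-up data, not output from Ensembl.

```python
import re

# Patterns copied from the listing (constant names are hypothetical).
INTERVAL_PAT = r'^(?:chr)?([^\t]+)(?:\t(\d+)(?:\t(\d+)(?:\t([+-])?)?)?)?.*'
BED_PAT = r'^(?:chr)?([^\t]+)\t(\d+)\t(\d+)(?:(?:\t[^\t]+\t[^\t]+\t)([+-]))?.*'

# Made-up sample lines: Seq, Start, End, Strand for the interval format;
# chrom, start, end, name, score, strand, ... for BED.
interval_line = 'chr2\t20000\t25000\t-'
bed_line = 'chrX\t100\t500\tENST0001\t0\t+\t100\t500'
bed3_line = 'chr1\t10\t20'

interval_fields = re.match(INTERVAL_PAT, interval_line).groups()
bed_fields = re.match(BED_PAT, bed_line).groups()
bed3_fields = re.match(BED_PAT, bed3_line).groups()

print(interval_fields)
print(bed_fields)
print(bed3_fields)
```

Both patterns drop a leading `chr`, so UCSC-style names in the input map onto Ensembl-style keys in `selected_regions`. The strand group is `None` whenever the corresponding column is absent.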
0
105 def retrieve_region(species, ref, start, stop, strand): |
106 transcript_count = 0 |
107 regions = list(range(start, stop, max_region)) |
108 if not regions or regions[-1] < stop: |
109 regions.append(stop) |
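The breakpoint list built in the three lines above can be illustrated separately. The sketch below is an assumption-laden illustration, not the tool's code: `breakpoints` and `windows` are hypothetical helpers, the step of 4 is a tiny stand-in for the real `max_region` imported from `ensembl_rest`, and pairing consecutive breakpoints into windows is only one plausible way the following loop might consume them.

```python
def breakpoints(start, stop, max_region):
    """Return the split points the listing builds before its request loop."""
    points = list(range(start, stop, max_region))
    # range() excludes stop, so add it when it is not already the last point.
    if not points or points[-1] < stop:
        points.append(stop)
    return points


def windows(start, stop, max_region):
    """Pair consecutive breakpoints (an assumption about how they are used)."""
    points = breakpoints(start, stop, max_region)
    return list(zip(points[:-1], points[1:]))


print(breakpoints(0, 10, 4))
print(windows(0, 10, 4))
print(breakpoints(5, 5, 4))
```

Splitting this way keeps each request no longer than `max_region` bases, with a shorter final window when the span is not an exact multiple of the step.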
110 for end in regions[1:]: |
111 bedlines = get_transcripts_bed(species, ref, start, end, |
112 strand=strand, params=biotypes) |
113 if args.debug: |
114 print("%s\t%s\tstart: %d\tend: %d\tcDNA transcripts:%d" % |
115 (species, ref, start, end, len(bedlines)), |
116 file=sys.stderr) |
117 # start, end, seq |
118 for i, bedline in enumerate(bedlines): |
da1b538b87e5
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/retrieve_ensembl_bed commit 88cf1e923a8c9e5bc6953ad412d15a7c70f054d1
galaxyp
parents:
diff
changeset
|
119 if args.debug: |
da1b538b87e5
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/retrieve_ensembl_bed commit 88cf1e923a8c9e5bc6953ad412d15a7c70f054d1
galaxyp
parents:
diff
changeset
|
120 print("%s\n" % (bedline), file=sys.stderr) |
da1b538b87e5
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/retrieve_ensembl_bed commit 88cf1e923a8c9e5bc6953ad412d15a7c70f054d1
galaxyp
parents:
diff
changeset
|
121 if not args.ucsc_chrom_names: |
da1b538b87e5
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/retrieve_ensembl_bed commit 88cf1e923a8c9e5bc6953ad412d15a7c70f054d1
galaxyp
parents:
diff
changeset
|
122 bedline = re.sub('^[^\t]+', ref, bedline) |
da1b538b87e5
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/retrieve_ensembl_bed commit 88cf1e923a8c9e5bc6953ad412d15a7c70f054d1
galaxyp
parents:
diff
changeset
|
123 try: |
da1b538b87e5
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/retrieve_ensembl_bed commit 88cf1e923a8c9e5bc6953ad412d15a7c70f054d1
galaxyp
parents:
diff
changeset
|
124 if out_wtr: |
da1b538b87e5
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/retrieve_ensembl_bed commit 88cf1e923a8c9e5bc6953ad412d15a7c70f054d1
galaxyp
parents:
diff
changeset
|
125 out_wtr.write(bedline.replace(',\t', '\t') |
da1b538b87e5
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/retrieve_ensembl_bed commit 88cf1e923a8c9e5bc6953ad412d15a7c70f054d1
galaxyp
parents:
diff
changeset
|
126 if args.extended_bed |
da1b538b87e5
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/retrieve_ensembl_bed commit 88cf1e923a8c9e5bc6953ad412d15a7c70f054d1
galaxyp
parents:
diff
changeset
|
127 else str(bed_from_line(bedline))) |
da1b538b87e5
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/retrieve_ensembl_bed commit 88cf1e923a8c9e5bc6953ad412d15a7c70f054d1
galaxyp
parents:
diff
changeset
|
128 out_wtr.write("\n") |
da1b538b87e5
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/retrieve_ensembl_bed commit 88cf1e923a8c9e5bc6953ad412d15a7c70f054d1
galaxyp
parents:
diff
changeset
|
129 out_wtr.flush() |
da1b538b87e5
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/retrieve_ensembl_bed commit 88cf1e923a8c9e5bc6953ad412d15a7c70f054d1
galaxyp
parents:
diff
changeset
|
130 except Exception as e: |
da1b538b87e5
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/retrieve_ensembl_bed commit 88cf1e923a8c9e5bc6953ad412d15a7c70f054d1
galaxyp
parents:
diff
changeset
|
131 print("BED error (%s) : %s\n" % (e, bedline), |
da1b538b87e5
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/retrieve_ensembl_bed commit 88cf1e923a8c9e5bc6953ad412d15a7c70f054d1
galaxyp
parents:
diff
changeset
|
132 file=sys.stderr) |
da1b538b87e5
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/retrieve_ensembl_bed commit 88cf1e923a8c9e5bc6953ad412d15a7c70f054d1
galaxyp
parents:
diff
changeset
|
133 start = end + 1 |
da1b538b87e5
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/retrieve_ensembl_bed commit 88cf1e923a8c9e5bc6953ad412d15a7c70f054d1
galaxyp
parents:
diff
changeset
|
134 return transcript_count |
da1b538b87e5
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/retrieve_ensembl_bed commit 88cf1e923a8c9e5bc6953ad412d15a7c70f054d1
galaxyp
parents:
diff
changeset
|
135 |
da1b538b87e5
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/retrieve_ensembl_bed commit 88cf1e923a8c9e5bc6953ad412d15a7c70f054d1
galaxyp
parents:
diff
changeset
|
136 coord_systems = get_toplevel(species) |
da1b538b87e5
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteogenomics/retrieve_ensembl_bed commit 88cf1e923a8c9e5bc6953ad412d15a7c70f054d1
galaxyp
parents:
diff
changeset
|
    if 'chromosome' in coord_systems:
        ref_lengths = dict()
        for ref in sorted(coord_systems['chromosome'].keys()):
            length = coord_systems['chromosome'][ref]
            ref_lengths[ref] = length
            if args.toplevel:
                print("%s\t%s\tlength: %d" % (species, ref, length),
                      file=sys.stderr)
        if selected_regions:
            transcript_count = 0
            for ref in sorted(selected_regions.keys()):
                if ref in ref_lengths:
                    for reg in selected_regions[ref]:
                        (_start, _stop, _strand) = reg
                        # Missing bounds default to the whole reference.
                        start = int(_start) if _start else 0
                        stop = int(_stop) if _stop else ref_lengths[ref]
                        # '+' maps to ':1', any other non-empty strand to
                        # ':-1'; an empty strand adds no suffix.
                        strand = '' if not _strand else ':1'\
                            if _strand == '+' else ':-1'
                        transcript_count += retrieve_region(species, ref,
                                                            start, stop,
                                                            strand)
                        if args.debug or args.verbose:
                            length = stop - start
                            print("%s\t%s:%d-%d%s\tlength: %d"
                                  "\ttranscripts:%d" %
                                  (species, ref, start, stop, strand,
                                   length, transcript_count),
                                  file=sys.stderr)
        else:
            strand = ''
            start = 0
            for ref in sorted(ref_lengths.keys()):
                length = ref_lengths[ref]
                transcript_count = 0
                if args.debug:
                    print("Retrieving transcripts: %s\t%s\tlength: %d" %
                          (species, ref, length), file=sys.stderr)
                transcript_count += retrieve_region(species, ref, start,
                                                    length, strand)
                if args.debug or args.verbose:
                    print("%s\t%s\tlength: %d\ttranscripts:%d" %
                          (species, ref, length, transcript_count),
                          file=sys.stderr)


if __name__ == "__main__":
    __main__()