annotate fromgtfTobed12.py @ 0:418e4d0fe0bd draft

planemo upload for repository https://github.com/lldelisle/tools-lldelisle/tree/master/tools/fromgtfTobed12 commit 1aaffda5b95e0389e315179345642c0d005867c1
author lldelisle
date Fri, 04 Nov 2022 15:37:12 +0000
parents
children 6fd4b3b90220
import argparse
import sys
import warnings

import gffutils

warnings.filterwarnings("ignore", message="It appears you have a gene feature"
                        " in your GTF file. You may want to use the "
                        "`disable_infer_genes` option to speed up database "
                        "creation")
warnings.filterwarnings("ignore", message="It appears you have a transcript "
                        "feature in your GTF file. You may want to use the "
                        "`disable_infer_transcripts` option to speed up "
                        "database creation")
# In gffutils v0.10 the warning message changed:
warnings.filterwarnings("ignore", message="It appears you have a gene feature"
                        " in your GTF file. You may want to use the "
                        "`disable_infer_genes=True` option to speed up "
                        "database creation")
warnings.filterwarnings("ignore", message="It appears you have a transcript "
                        "feature in your GTF file. You may want to use the "
                        "`disable_infer_transcripts=True` option to speed up "
                        "database creation")


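# Illustrative aside, not part of the original script: a minimal sketch of
# how the `message` argument of warnings.filterwarnings behaves. The pattern
# is a regular expression matched against the start of the warning text,
# which is presumably why the old and new gffutils wordings each need their
# own filter. `count_unfiltered` is a hypothetical helper name.

```python
import warnings


def count_unfiltered(messages, ignored_prefix):
    """Return how many of `messages` survive an "ignore" filter."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, then put the "ignore" filter in front.
        warnings.simplefilter("always")
        # `message` is a regex anchored at the start of the warning text.
        warnings.filterwarnings("ignore", message=ignored_prefix)
        for text in messages:
            warnings.warn(text)
        return len(caught)


if __name__ == "__main__":
    texts = ["It appears you have a gene feature in your GTF file.",
             "Some unrelated warning"]
    # Only the unrelated warning is recorded.
    print(count_unfiltered(texts, "It appears you have a gene feature"))
```

# A pattern that only occurs in the middle of the text (for example
# "gene feature") does not silence the warning, because the match is
# anchored at the start of the message.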
def convert_gtf_to_bed(fn, fo, useGene, mergeTranscripts,
                       mergeTranscriptsAndOverlappingExons, ucsc):
    db = gffutils.create_db(fn, ':memory:')
    # For each transcript:
    prefered_name = "transcript_name"
    if useGene or mergeTranscripts or mergeTranscriptsAndOverlappingExons:
        prefered_name = "gene_name"
    if mergeTranscripts or mergeTranscriptsAndOverlappingExons:
        all_items = db.features_of_type("gene", order_by='start')
    else:
        all_items = db.features_of_type("transcript", order_by='start')
    for tr in all_items:
        # The name is that of the transcript/gene if it exists
        try:
            # First try to get it directly from the feature
            trName = tr.attributes[prefered_name][0]
        except KeyError:
            # Else try to guess the name of the transcript/gene from exons:
            try:
                trName = set([e.attributes[prefered_name][0]
                              for e in
                              db.children(tr,
                                          featuretype='exon',
                                          order_by='start')]).pop()
            except KeyError:
                # Else take the feature id (transcript or gene)
                trName = tr.id
        # If the CDS is defined in the GTF,
        # use it to define the thick start and end.
        # GTF intervals are 1-based and closed, while
        # BED intervals are 0-based and half-open, so
        # 1 is subtracted from each start and ends are kept as is.
        try:
            # In case of multiple CDS (when there is one entry per gene)
            # the first one gives the start
            # and the last one gives the end (order_by='-start')
            cds_start = next(db.children(tr,
                                         featuretype='CDS',
                                         order_by='start')).start - 1
            cds_end = next(db.children(tr,
                                       featuretype='CDS',
                                       order_by='-start')).end
        except StopIteration:
            # If the CDS is not defined, thick start and thick end are
            # both set to the start, as proposed here:
            # https://genome.ucsc.edu/FAQ/FAQformat.html#format1
            cds_start = tr.start - 1
            cds_end = tr.start - 1
        # Get all exons starts and lengths
        if mergeTranscriptsAndOverlappingExons:
            # We merge overlapping exons:
            exons_starts = []
            exons_length = []
            current_start = -1
            current_end = None
            for e in db.children(tr, featuretype='exon', order_by='start'):
                if current_start == -1:
                    current_start = e.start - 1
                    current_end = e.end
                else:
                    if e.start > current_end:
                        # This is a non-overlapping exon
                        # We store the previous exon:
                        exons_starts.append(current_start)
                        exons_length.append(current_end - current_start)
                        # We set the current:
                        current_start = e.start - 1
                        current_end = e.end
                    else:
                        # This is an overlapping exon
                        # We update current_end if necessary
                        current_end = max(current_end, e.end)
            if current_start != -1:
                # There is a last exon to store:
                exons_starts.append(current_start)
                exons_length.append(current_end - current_start)
        else:
            exons_starts = [e.start - 1
                            for e in
                            db.children(tr, featuretype='exon',
                                        order_by='start')]
            exons_length = [len(e)
                            for e in
                            db.children(tr, featuretype='exon',
                                        order_by='start')]
        # Rewrite the chromosome name if needed:
        chrom = tr.chrom
        if ucsc and chrom[0:3] != 'chr':
            chrom = 'chr' + chrom
        fo.write("%s\t%d\t%d\t%s\t%d\t%s\t%d\t%d\t%s\t%d\t%s\t%s\n" %
                 (chrom, tr.start - 1, tr.end, trName, 0, tr.strand,
                  cds_start, cds_end, "0", len(exons_starts),
                  ",".join([str(ex_l) for ex_l in exons_length]),
                  ",".join([str(s - (tr.start - 1)) for s in exons_starts])))


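# Illustrative aside, not part of the original script: a standalone sketch
# of the interval arithmetic used in convert_gtf_to_bed, without gffutils.
# It converts 1-based closed GTF intervals to 0-based half-open BED ones,
# merges overlapping exons, and derives the BED12 block fields relative to
# the feature start. The function names below are hypothetical, and exons
# are assumed to be plain (start, end) tuples.

```python
def gtf_to_bed_interval(start, end):
    """Convert a 1-based closed interval to a 0-based half-open one."""
    return start - 1, end


def merge_overlapping(intervals):
    """Merge overlapping 0-based half-open intervals.

    Intervals that merely touch (end == next start) are kept separate,
    which mirrors the `e.start > current_end` test in the script.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            # Overlaps the previous block: extend its end if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(block) for block in merged]


def bed12_blocks(chrom_start, blocks):
    """Return blockCount, blockSizes and blockStarts for a BED12 line."""
    sizes = ",".join(str(end - start) for start, end in blocks)
    # Block starts are offsets from the feature start, not absolute.
    starts = ",".join(str(start - chrom_start) for start, _ in blocks)
    return len(blocks), sizes, starts


if __name__ == "__main__":
    gtf_exons = [(101, 200), (151, 250), (401, 500)]
    bed_exons = [gtf_to_bed_interval(s, e) for s, e in gtf_exons]
    blocks = merge_overlapping(bed_exons)
    print(bed12_blocks(blocks[0][0], blocks))
```

# With the three example exons, the first two overlap and collapse into one
# block of 150 bases, followed by a 100-base block at offset 300.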
418e4d0fe0bd planemo upload for repository https://github.com/lldelisle/tools-lldelisle/tree/master/tools/fromgtfTobed12 commit 1aaffda5b95e0389e315179345642c0d005867c1
lldelisle
parents:
diff changeset
122 argp = argparse.ArgumentParser(
418e4d0fe0bd planemo upload for repository https://github.com/lldelisle/tools-lldelisle/tree/master/tools/fromgtfTobed12 commit 1aaffda5b95e0389e315179345642c0d005867c1
lldelisle
parents:
diff changeset
123 description=("Convert a gtf to a bed12 with one entry"
418e4d0fe0bd planemo upload for repository https://github.com/lldelisle/tools-lldelisle/tree/master/tools/fromgtfTobed12 commit 1aaffda5b95e0389e315179345642c0d005867c1
lldelisle
parents:
diff changeset
124 " per transcript/gene"))
418e4d0fe0bd planemo upload for repository https://github.com/lldelisle/tools-lldelisle/tree/master/tools/fromgtfTobed12 commit 1aaffda5b95e0389e315179345642c0d005867c1
lldelisle
parents:
diff changeset
125 argp.add_argument('input', default=None,
418e4d0fe0bd planemo upload for repository https://github.com/lldelisle/tools-lldelisle/tree/master/tools/fromgtfTobed12 commit 1aaffda5b95e0389e315179345642c0d005867c1
lldelisle
parents:
diff changeset
126 help="Input gtf file (can be gzip).")
418e4d0fe0bd planemo upload for repository https://github.com/lldelisle/tools-lldelisle/tree/master/tools/fromgtfTobed12 commit 1aaffda5b95e0389e315179345642c0d005867c1
lldelisle
parents:
diff changeset
127 argp.add_argument('--output', default=sys.stdout,
418e4d0fe0bd planemo upload for repository https://github.com/lldelisle/tools-lldelisle/tree/master/tools/fromgtfTobed12 commit 1aaffda5b95e0389e315179345642c0d005867c1
lldelisle
parents:
diff changeset
128 type=argparse.FileType('w'),
418e4d0fe0bd planemo upload for repository https://github.com/lldelisle/tools-lldelisle/tree/master/tools/fromgtfTobed12 commit 1aaffda5b95e0389e315179345642c0d005867c1
lldelisle
parents:
diff changeset
129 help="Output bed12 file.")
418e4d0fe0bd planemo upload for repository https://github.com/lldelisle/tools-lldelisle/tree/master/tools/fromgtfTobed12 commit 1aaffda5b95e0389e315179345642c0d005867c1
lldelisle
parents:
diff changeset
130 argp.add_argument('--useGene', action="store_true",
418e4d0fe0bd planemo upload for repository https://github.com/lldelisle/tools-lldelisle/tree/master/tools/fromgtfTobed12 commit 1aaffda5b95e0389e315179345642c0d005867c1
lldelisle
parents:
diff changeset
131 help="Use the gene name instead of the "
418e4d0fe0bd planemo upload for repository https://github.com/lldelisle/tools-lldelisle/tree/master/tools/fromgtfTobed12 commit 1aaffda5b95e0389e315179345642c0d005867c1
lldelisle
parents:
diff changeset
132 "transcript name.")
argp.add_argument('--ucscformat', action="store_true",
                  help="Make all chromosome names "
                  "begin with 'chr'.")
group = argp.add_mutually_exclusive_group()
group.add_argument('--mergeTranscripts', action="store_true",
                   help="Merge all transcripts into a single "
                   "entry to have one line per gene.")
group.add_argument('--mergeTranscriptsAndOverlappingExons',
                   action="store_true",
                   help="Merge all transcripts into a single "
                   "entry to have one line per gene and merge"
                   " overlapping exons.")

args = argp.parse_args()
convert_gtf_to_bed(args.input, args.output, args.useGene,
                   args.mergeTranscripts,
                   args.mergeTranscriptsAndOverlappingExons,
                   args.ucscformat)
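
The two merge options are registered in a mutually exclusive group, so argparse should reject a command line that passes both. The following is a minimal, self-contained sketch of that behaviour, not part of the tool itself: `build_parser` and `try_parse` are hypothetical helpers, the file name `genes.gtf.gz` is a placeholder, and `--output` is left out so that no file is opened.

```python
import argparse


def build_parser():
    # Hypothetical helper: mirrors the flags defined above, minus --output.
    argp = argparse.ArgumentParser(prog="fromgtfTobed12.py")
    argp.add_argument('input', help="Input gtf file (can be gzip).")
    argp.add_argument('--useGene', action="store_true")
    argp.add_argument('--ucscformat', action="store_true")
    group = argp.add_mutually_exclusive_group()
    group.add_argument('--mergeTranscripts', action="store_true")
    group.add_argument('--mergeTranscriptsAndOverlappingExons',
                       action="store_true")
    return argp


def try_parse(argv):
    # Return the parsed namespace, or None when argparse rejects argv
    # (argparse reports the error on stderr and raises SystemExit).
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        return None


# One merge flag is accepted.
ok = try_parse(["genes.gtf.gz", "--useGene", "--mergeTranscripts"])
# Both merge flags together are rejected by the mutually exclusive group.
bad = try_parse(["genes.gtf.gz", "--mergeTranscripts",
                 "--mergeTranscriptsAndOverlappingExons"])

print(ok.mergeTranscripts, ok.mergeTranscriptsAndOverlappingExons)
print(bad is None)
```

Because every flag uses `action="store_true"`, any option that is not given on the command line reaches `convert_gtf_to_bed` as `False`.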