Mercurial > repos > vipints > fml_gff3togtf
annotate gbk_to_gff.py @ 11:5c6f33e20fcc default tip
requirement tag added
author | vipints <vipin@cbio.mskcc.org> |
---|---|
date | Fri, 24 Apr 2015 18:04:27 -0400 |
parents | c42c69aa81f8 |
children |
rev | line source |
---|---|
10
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
#!/usr/bin/env python
"""
Convert data from GenBank format to GFF.

Usage:
python gbk_to_gff.py in.gbk > out.gff

Requirements:
    BioPython:- http://biopython.org/
    helper.py:- https://github.com/vipints/GFFtools-GX/blob/master/helper.py

Copyright (C)
    2009-2012 Friedrich Miescher Laboratory of the Max Planck Society, Tubingen, Germany.
    2012-2015 Memorial Sloan Kettering Cancer Center New York City, USA.
"""

import os
import re
import sys
import helper
import collections
from Bio import SeqIO

def feature_table(chr_id, source, orient, genes, transcripts, cds, exons, unk):
    """
    Write the gene, transcript, exon and CDS feature lines to stdout.

    Returns the updated counter used to name unknown genes.
    """
    for gname, ginfo in genes.items():
        ## ginfo holds (start, end, strand, feature type)
        line = [str(chr_id),
                'gbk2gff',
                ginfo[3],
                str(ginfo[0]),
                str(ginfo[1]),
                '.',
                ginfo[2],
                '.',
                'ID=%s;Name=%s' % (str(gname), str(gname))]
        sys.stdout.write('\t'.join(line)+"\n")
        ## transcript line template: coordinates default to the gene span,
        ## the last column holds the attributes
        t_line = [str(chr_id), 'gbk2gff', source, str(ginfo[0]), str(ginfo[1]), '.', ginfo[2], '.', '']

        if not transcripts: ## transcript is not defined in the original file
            t_line[8] = 'ID=Transcript:%s;Parent=%s' % (str(gname), str(gname))

            if exons: ## take the transcript region from the first defined feature
                t_line[3] = str(exons[gname][0][0])
                t_line[4] = str(exons[gname][0][-1])
            elif cds:
                t_line[3] = str(cds[gname][0][0])
                t_line[4] = str(cds[gname][0][-1])

            if not cds:
                t_line[2] = 'transcript'
            else:
                t_line[2] = 'mRNA'
            sys.stdout.write('\t'.join(t_line)+"\n")

            if exons:
                exon_line_print(t_line, exons[gname], 'Transcript:'+str(gname), 'exon')

            if cds:
                exon_line_print(t_line, cds[gname], 'Transcript:'+str(gname), 'CDS')
                if not exons: ## no exon features: report each CDS as an exon too
                    exon_line_print(t_line, cds[gname], 'Transcript:'+str(gname), 'exon')

        else: ## transcript is defined
            for idx in transcripts[gname]:
                ## idx holds (start, end, transcript id, feature type)
                t_line[2] = idx[3]
                t_line[3] = str(idx[0])
                t_line[4] = str(idx[1])
                ## assign rather than append, so every line keeps nine columns
                t_line[8] = 'ID='+str(idx[2])+';Parent='+str(gname)
                sys.stdout.write('\t'.join(t_line)+"\n")

                ## feature line print call
                if exons:
                    exon_line_print(t_line, exons[gname], str(idx[2]), 'exon')
                if cds:
                    exon_line_print(t_line, cds[gname], str(idx[2]), 'CDS')
                    if not exons:
                        exon_line_print(t_line, cds[gname], str(idx[2]), 'exon')

    if len(genes) == 0: ## feature entry with fragment information

        line = [str(chr_id), 'gbk2gff', source, 0, 1, '.', orient, '.']
        fStart = fStop = None

        for eid, ex in cds.items():
            fStart = ex[0][0]
            fStop = ex[0][-1]

        for eid, ex in exons.items():
            fStart = ex[0][0]
            fStop = ex[0][-1]

        if fStart or fStop:

            line[2] = 'gene'
            line[3] = str(fStart)
            line[4] = str(fStop)
            line.append('ID=Unknown_Gene_' + str(unk) + ';Name=Unknown_Gene_' + str(unk))
            sys.stdout.write('\t'.join(line)+"\n")

            if not cds:
                line[2] = 'transcript'
            else:
                line[2] = 'mRNA'

            line[8] = 'ID=Unknown_Transcript_' + str(unk) + ';Parent=Unknown_Gene_' + str(unk)
            sys.stdout.write('\t'.join(line)+"\n")

            ## fragments without a gene qualifier are stored under the key None
            if exons:
                exon_line_print(line, exons[None], 'Unknown_Transcript_' + str(unk), 'exon')

            if cds:
                exon_line_print(line, cds[None], 'Unknown_Transcript_' + str(unk), 'CDS')
                if not exons:
                    exon_line_print(line, cds[None], 'Unknown_Transcript_' + str(unk), 'exon')

            unk += 1

    return unk


def exon_line_print(temp_line, trx_exons, parent, ftype):
    """
    Print one feature line of type ftype for each (start, end) in trx_exons.

    temp_line is modified in place and must already hold nine columns.
    """
    for ex in trx_exons:
        temp_line[2] = ftype
        temp_line[3] = str(ex[0])
        temp_line[4] = str(ex[1])
        temp_line[8] = 'Parent=%s' % parent
        sys.stdout.write('\t'.join(temp_line)+"\n")


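As an illustration of what exon_line_print produces, here is a minimal sketch of the same idea written as a side-effect-free function. The name child_feature_lines and the sample values are hypothetical and not part of gbk_to_gff.py. It assumes each interval is a (start, end) pair, as the indexing above suggests, and it copies the template instead of changing the caller's list.

```python
def child_feature_lines(template, intervals, parent, ftype):
    """Return one tab-delimited GFF line per (start, end) interval."""
    lines = []
    for start, end in intervals:
        fields = list(template[:8])       # copy the first eight columns
        fields[2] = ftype                 # feature type, e.g. 'exon' or 'CDS'
        fields[3] = str(start)
        fields[4] = str(end)
        fields.append('Parent=%s' % parent)
        lines.append('\t'.join(fields))
    return lines


# Hypothetical transcript line used as the template.
template = ['chr1', 'gbk2gff', 'mRNA', '100', '900', '.', '+', '.',
            'ID=Transcript:geneA;Parent=geneA']
exon_lines = child_feature_lines(template, [(100, 300), (500, 900)],
                                 'Transcript:geneA', 'exon')
for text in exon_lines:
    print(text)
```

Because the template is copied, the transcript line can be reused for the CDS pass without resetting its type or attribute column.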
136 def gbk_parse(fname): |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
137 """ |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
138 Extract genome annotation recods from genbank format |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
139 |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
140 @args fname: gbk file name |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
141 @type fname: str |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
142 """ |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
143 fhand = helper.open_file(gbkfname) |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
144 unk = 1 |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
145 |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
146 for record in SeqIO.parse(fhand, "genbank"): |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
147 gene_tags = dict() |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
148 tx_tags = collections.defaultdict(list) |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
149 exon = collections.defaultdict(list) |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
150 cds = collections.defaultdict(list) |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
151 mol_type, chr_id = None, None |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
152 |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
153 for rec in record.features: |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
154 |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
155 if rec.type == 'source': |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
156 try: |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
157 mol_type = rec.qualifiers['mol_type'][0] |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
158 except: |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
159 mol_type = '.' |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
160 pass |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
161 try: |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
162 chr_id = rec.qualifiers['chromosome'][0] |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
163 except: |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
164 chr_id = record.name |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
165 continue |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
166 |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
167 strand='-' |
c42c69aa81f8
fixed manually the upload of version 2.1.0 - deleted accidentally added files to the repo
vipints <vipin@cbio.mskcc.org>
parents:
diff
changeset
|
168 strand='+' if rec.strand>0 else strand |

            fid = None
            try:
                fid = rec.qualifiers['gene'][0]
            except (KeyError, IndexError):
                pass

            transcript_id = None
            try:
                transcript_id = rec.qualifiers['transcript_id'][0]
            except (KeyError, IndexError):
                pass

            # Biopython locations are 0-based and half-open; GFF is 1-based and inclusive
            feat_start = int(rec.location.start) + 1
            feat_end = int(rec.location.end)

            if re.search(r'gene', rec.type):
                gene_tags[fid] = (feat_start, feat_end, strand, rec.type)
            elif rec.type == 'exon':
                exon[fid].append((feat_start, feat_end))
            elif rec.type == 'CDS':
                cds[fid].append((feat_start, feat_end))
            else:
                # get all transcripts
                if transcript_id:
                    tx_tags[fid].append((feat_start, feat_end, transcript_id, rec.type))
        # record extracted, generate feature table
        unk = feature_table(chr_id, mol_type, strand, gene_tags, tx_tags, cds, exon, unk)

    fhand.close()


if __name__ == '__main__':

    try:
        gbkfname = sys.argv[1]
    except IndexError:
        print(__doc__)
        sys.exit(-1)

    # extract gbk records
    gbk_parse(gbkfname)
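
The listing above relies on two small idioms: reading the first value of a qualifier with a fallback when it is missing, and shifting a 0-based, half-open start to the 1-based, inclusive coordinates that GFF expects. The sketch below isolates both using plain dictionaries and integers so it runs without BioPython. The helper names `first_qualifier` and `to_gff_interval` are hypothetical and are not part of the script; this is an illustration under those assumptions, not the author's implementation.

```python
def first_qualifier(qualifiers, key, default=None):
    """Return the first value stored under `key`, or `default` if the key
    is missing or its value list is empty.

    `qualifiers` is assumed to be a dict mapping names to lists of strings,
    which mirrors how the script indexes rec.qualifiers[...][0].
    """
    values = qualifiers.get(key) or []
    return values[0] if values else default


def to_gff_interval(start, end):
    """Convert a 0-based, half-open interval [start, end) into the
    1-based, inclusive (start, end) pair used in GFF columns 4 and 5.

    Only the start shifts by one; the half-open end already equals the
    inclusive 1-based end.
    """
    return start + 1, end


# Hypothetical stand-in for the qualifiers of a 'source' feature.
source_qualifiers = {'mol_type': ['genomic DNA'], 'note': []}

mol_type = first_qualifier(source_qualifiers, 'mol_type', '.')
chr_id = first_qualifier(source_qualifiers, 'chromosome', 'record_name')
gff_start, gff_end = to_gff_interval(0, 10)
```

Here `mol_type` is taken from the dictionary, `chr_id` falls back to the supplied default because no `chromosome` entry exists, and the interval covering the first ten bases becomes `(1, 10)`.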