annotate GFFtools-GX/bed_to_gff.py @ 3:ff2c2e6f4ab3

Uploaded version 2.0.0 of gfftools ready to import to local instance
author vipints
date Wed, 11 Jun 2014 16:29:25 -0400
parents
children
#!/usr/bin/env python
"""
Convert genome annotation data in a 12 column BED format to GFF3.

Usage: python bed_to_gff.py in.bed > out.gff

Requirement:
    helper.py : https://github.com/vipints/GFFtools-GX/blob/master/helper.py

Copyright (C)
    2009-2012 Friedrich Miescher Laboratory of the Max Planck Society, Tubingen, Germany.
    2012-2014 Memorial Sloan Kettering Cancer Center New York City, USA.
"""

import sys
import helper

def __main__():
    """
    main function
    """

    try:
        bed_fname = sys.argv[1]
    except IndexError:
        # no input file given: show the usage text and stop
        print(__doc__)
        sys.exit(-1)

    bed_fh = helper.open_file(bed_fname)

    for line in bed_fh:
        line = line.strip('\n\r')

        # skip blank lines and comment lines
        if not line or line[0] in ['#']:
            continue

        parts = line.split('\t')
        assert len(parts) >= 12, line

        # last column: block starts relative to chromStart (may end with a comma)
        rstarts = parts[-1].split(',')
        if rstarts[-1] == '':
            rstarts.pop()

        # second to last column: block sizes (may end with a comma)
        exon_lens = parts[-2].split(',')
        if exon_lens[-1] == '':
            exon_lens.pop()

        if len(rstarts) != len(exon_lens):
            continue # checking the consistency col 11 and col 12

        if len(rstarts) != int(parts[-3]):
            continue # checking the number of exons and block count are same

        if parts[5] not in ['+', '-']:
            parts[5] = '.' # replace the unknown strand with '.'

        # bed2gff result line
        # BED starts are 0-based, GFF3 starts are 1-based, hence the +1
        print('%s\tbed2gff\tgene\t%d\t%s\t%s\t%s\t.\tID=Gene:%s;Name=Gene:%s' % (parts[0], int(parts[1])+1, parts[2], parts[4], parts[5], parts[3], parts[3]))
        print('%s\tbed2gff\ttranscript\t%d\t%s\t%s\t%s\t.\tID=%s;Name=%s;Parent=Gene:%s' % (parts[0], int(parts[1])+1, parts[2], parts[4], parts[5], parts[3], parts[3], parts[3]))

        st = int(parts[1])
        for ex_cnt in range(int(parts[-3])):
            # each block becomes one exon with 1-based inclusive coordinates
            start = st + int(rstarts[ex_cnt]) + 1
            stop = start + int(exon_lens[ex_cnt]) - 1
            print('%s\tbed2gff\texon\t%d\t%d\t%s\t%s\t.\tParent=%s' % (parts[0], start, stop, parts[4], parts[5], parts[3]))

    bed_fh.close()


if __name__ == "__main__":
    __main__()
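
The coordinate arithmetic in the exon loop is easy to get wrong by one, so a small worked example may help. The sketch below is not part of bed_to_gff.py: the function name `bed12_exons` and the sample record are hypothetical, and it assumes the block count, block sizes and block starts sit in columns 10 to 12 (the script itself indexes them from the end of the line). It raises on inconsistent records instead of skipping them, which is a deliberate simplification.

```python
def bed12_exons(bed_line):
    """Return 1-based inclusive (start, end) exon pairs for one BED12 line.

    Hypothetical helper for illustration only; it mirrors the arithmetic
    of the exon loop but is not taken from the script.
    """
    parts = bed_line.rstrip('\n\r').split('\t')
    if len(parts) < 12:
        raise ValueError('expected at least 12 columns, got %d' % len(parts))

    chrom_start = int(parts[1])  # 0-based start of the whole feature
    # BED12 lists may carry a trailing comma; drop the empty last item
    block_sizes = [int(x) for x in parts[10].split(',') if x]
    block_starts = [int(x) for x in parts[11].split(',') if x]

    if len(block_sizes) != len(block_starts) or len(block_sizes) != int(parts[9]):
        raise ValueError('block count, sizes and starts disagree')

    exons = []
    for size, rel_start in zip(block_sizes, block_starts):
        start = chrom_start + rel_start + 1  # 0-based BED to 1-based GFF3
        end = start + size - 1               # GFF3 end is inclusive
        exons.append((start, end))
    return exons


# Made-up record: a two-block feature on chr1 spanning 1000..5000 in BED terms
example = '\t'.join(['chr1', '1000', '5000', 'tx1', '0', '+',
                     '1000', '5000', '0', '2', '500,1000', '0,3000'])
exons = bed12_exons(example)
print(exons)  # [(1001, 1500), (4001, 5000)]
```

The last exon ends at 5000, the same value as the BED end column, which is why the gene and transcript lines can print that column unchanged while adding 1 only to the start.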