annotate bamparse.py @ 0:2a1a2bc6ae8b draft

planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
author artbio
date Fri, 13 Oct 2017 02:59:36 -0400
parents
children ae9ea0488850
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
1 #!/usr/bin/env python
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
2 import argparse
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
3 from collections import defaultdict
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
4
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
5 import pysam
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
6
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
7
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
8 def Parser():
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
9 the_parser = argparse.ArgumentParser()
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
10 the_parser.add_argument('--output', nargs='+', action='store', type=str,
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
11 help='Count tables')
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
12 the_parser.add_argument('--polarity',
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
13 choices=["sense", "antisense", "both"],
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
14 help="forward, reverse or both forward an\
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
15 reverse reads are counted")
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
16 the_parser.add_argument('--alignments', nargs='+',
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
17 help="bam alignments files")
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
18 the_parser.add_argument('--labels', nargs='+', help="Alignments labels")
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
19 the_parser.add_argument('--number',
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
20 choices=["unique", "multiple"],
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
21 help="output is a single table or multiple tables")
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
22 args = the_parser.parse_args()
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
23 return args
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
24
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
25
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
26 def get_counts(bamfile, polarity="both"):
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
27 """
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
28 Takes an AlignmentFile object and returns a dictionary of counts for sense,
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
29 antisense, or both sense and antisense reads aligning to the bam references
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
30 """
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
31 def filter_sense_read(read):
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
32 if read.is_reverse:
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
33 return 0
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
34 else:
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
35 return 1
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
36
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
37 def filter_antisense_read(read):
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
38 if read.is_reverse:
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
39 return 1
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
40 else:
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
41 return 0
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
42
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
43 counts = defaultdict(int)
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
44 for ref_name in bamfile.references:
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
45 counts[ref_name] = 0
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
46 if polarity == "both":
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
47 for ref_name in bamfile.references:
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
48 counts[ref_name] = bamfile.count(reference=ref_name)
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
49 if polarity == "sense":
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
50 for ref_name in bamfile.references:
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
51 for read in bamfile.fetch(ref_name):
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
52 counts[ref_name] += filter_sense_read(read)
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
53 if polarity == "antisense":
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
54 for ref_name in bamfile.references:
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
55 for read in bamfile.fetch(ref_name):
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
56 counts[ref_name] += filter_antisense_read(read)
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
57 return counts
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
58
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
59
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
60 def writetable(diclist, labels, output, number):
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
61 ''' diclist is a list of count dictionnaries '''
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
62 countlists = []
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
63 for dic in diclist:
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
64 counts = sorted(dic.items())
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
65 counts = [j for (i,j) in counts]
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
66 countlists.append(counts)
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
67 if number == "unique":
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
68 out = open("outputdir/table.tabular", "w")
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
69 out.write("gene\t%s\n" % "\t".join(labels))
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
70 for countline in zip(sorted(diclist[0]), *countlists):
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
71 line = [ str(i) for i in countline]
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
72 out.write("%s\n" % "\t".join(line))
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
73 out.close()
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
74 else:
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
75 for i, (dic, label) in enumerate(zip(diclist, labels)):
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
76 out = open("outputdir/table" + str(i) + ".tabular", "w")
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
77 out.write("gene\t%s\n" % label)
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
78 for gene in sorted(dic):
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
79 out.write("%s\t%s\n" % (gene, dic[gene]))
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
80 out.close()
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
81
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
82
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
83
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
84 def main(alignments, labels, polarity, output, number):
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
85 diclist = []
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
86 for file in alignments:
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
87 bam_object = pysam.AlignmentFile(file, 'rb')
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
88 diclist.append(get_counts(bam_object, polarity=polarity))
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
89 writetable(diclist, labels, output, number)
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
90
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
91
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
92 if __name__ == "__main__":
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
93 args = Parser()
2a1a2bc6ae8b planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/bamparse commit 4e42cba873625fad03423e65dfffbf4afa91598c
artbio
parents:
diff changeset
94 main(args.alignments, args.labels, args.polarity, args.output, args.number)