annotate tools/coverage_stats/coverage_stats.py @ 2:7254ece0c0ff draft

v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
author peterjc
date Thu, 11 May 2017 12:16:10 -0400
parents ca8f63f2f7d4
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
1 #!/usr/bin/env python
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
2 """BAM coverage statistics using samtools idxstats and depth.
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
3
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
4 This script takes exactly three command line arguments:
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
5 * Input BAM filename
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
6 * Input BAI filename (via Galaxy metadata)
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
7 * Output tabular filename
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
8
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
9 Optional fourth argument:
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
10 * Max coverage depth (integer)
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
11
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
12 This messes about with the filenames to make samtools happy, then
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
13 runs "samtools idxstats" and "samtools depth", captures and combines
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
14 the output to the desired output tabular file.
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
15
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
16 Because "samtools depth" treats the max depth a little fuzzily, this
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
17 tool tries to account for this and applies a clear max-depth cut off.
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
18 """
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
19
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
20 import os
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
21 import subprocess
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
22 import sys
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
23 import tempfile
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
24
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
25 if "-v" in sys.argv or "--version" in sys.argv:
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
26 # Galaxy seems to invert the order of the two lines
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
27 print("BAM coverage statistics v0.0.5")
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
28 cmd = "samtools 2>&1 | grep -i ^Version"
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
29 sys.exit(os.system(cmd))
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
30
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
31 # TODO - Proper command line API
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
32 if len(sys.argv) == 4:
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
33 bam_filename, bai_filename, tabular_filename = sys.argv[1:]
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
34 max_depth = "8000"
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
35 elif len(sys.argv) == 5:
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
36 bam_filename, bai_filename, tabular_filename, max_depth = sys.argv[1:]
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
37 else:
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
38 sys.exit("Require 3 or 4 arguments: BAM, BAI, tabular filename, [max depth]")
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
39
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
40 if not os.path.isfile(bam_filename):
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
41 sys.exit("Input BAM file not found: %s" % bam_filename)
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
42 if bai_filename == "-":
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
43 # Make this optional for ease of use at the command line by hand:
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
44 if os.path.isfile(bam_filename + ".bai"):
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
45 bai_filename = bam_filename + ".bai"
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
46 if not os.path.isfile(bai_filename):
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
47 if bai_filename == "None":
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
48 sys.exit("Error: Galaxy did not index your BAM file")
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
49 sys.exit("Input BAI file not found: %s" % bai_filename)
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
50
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
51 try:
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
52 max_depth = int(max_depth)
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
53 except ValueError:
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
54 sys.exit("Bad argument for max depth: %r" % max_depth)
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
55 if max_depth < 0:
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
56 sys.exit("Bad argument for max depth: %r" % max_depth)
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
57
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
58 # fuzz factor to ensure can reach max_depth, e.g. region with
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
59 # many reads having a deletion present could underestimate the
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
60 # coverage by capping the number of reads considered
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
61 depth_margin = 100
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
62
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
63 # Assign sensible names with real extensions, and setup symlinks:
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
64 tmp_dir = tempfile.mkdtemp()
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
65 bam_file = os.path.join(tmp_dir, "temp.bam")
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
66 bai_file = os.path.join(tmp_dir, "temp.bam.bai")
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
67 idxstats_filename = os.path.join(tmp_dir, "idxstats.tsv")
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
68 depth_filename = os.path.join(tmp_dir, "depth.tsv")
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
69 os.symlink(os.path.abspath(bam_filename), bam_file)
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
70 os.symlink(os.path.abspath(bai_filename), bai_file)
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
71 assert os.path.isfile(bam_file), bam_file
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
72 assert os.path.isfile(bai_file), bai_file
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
73 assert os.path.isfile(bam_file + ".bai"), bam_file
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
74
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
75
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
76 def clean_up():
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
77 """Remove our temporary files and directory."""
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
78 os.remove(idxstats_filename)
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
79 os.remove(depth_filename)
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
80 os.remove(bam_file)
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
81 os.remove(bai_file)
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
82 os.rmdir(tmp_dir)
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
83
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
84
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
85 def samtools_depth_opt_available():
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
86 """Determine if samtools depth supports maximum coverage argument."""
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
87 child = subprocess.Popen(["samtools", "depth"],
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
88 stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
89 # Combined stdout/stderr in case samtools is ever inconsistent
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
90 output, tmp = child.communicate()
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
91 assert tmp is None
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
92 # Expect to find this line in the help text, exact wording could change:
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
93 # -d/-m <int> maximum coverage depth [8000]
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
94 return " -d/-m " in output
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
95
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
96
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
97 depth_hack = False
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
98 if not samtools_depth_opt_available():
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
99 if max_depth + depth_margin <= 8000:
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
100 sys.stderr.write("WARNING: The version of samtools depth installed does not "
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
101 "support the -d option, however, the requested max-depth "
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
102 "is safely under the default of 8000.\n")
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
103 depth_hack = True
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
104 else:
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
105 sys.exit("The version of samtools depth installed does not support the -d option.")
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
106
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
107 # Run samtools idxstats:
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
108 cmd = 'samtools idxstats "%s" > "%s"' % (bam_file, idxstats_filename)
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
109 return_code = os.system(cmd)
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
110 if return_code:
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
111 clean_up()
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
112 sys.exit("Return code %i from command:\n%s" % (return_code, cmd))
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
113
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
114 # Run samtools depth:
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
115 # TODO - Parse stdout instead?
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
116 if depth_hack:
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
117 # Using an old samtools without the -d option, but hard coded default
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
118 # of 8000 should be fine even allowing a margin for fuzzy output
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
119 cmd = 'samtools depth "%s" > "%s"' % (bam_file, depth_filename)
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
120 else:
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
121 cmd = 'samtools depth -d %i "%s" > "%s"' % (max_depth + depth_margin, bam_file, depth_filename)
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
122 return_code = os.system(cmd)
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
123 if return_code:
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
124 clean_up()
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
125 sys.exit("Return code %i from command:\n%s" % (return_code, cmd))
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
126
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
127
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
128 def load_total_coverage(depth_handle, identifier, length):
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
129 """Parse some of the 'samtools depth' output for coverages.
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
130
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
131 Returns min_cov (int), max_cov (int) and mean cov (float).
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
132
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
133 Uses global variables to cache the first line of output from the
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
134 next reference sequence.
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
135 """
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
136 global depth_ref, depth_pos, depth_reads
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
137
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
138 # print("====")
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
139 # print("%s coverage calculation, length %i, ..." % (identifier, length))
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
140
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
141 if depth_ref is None:
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
142 # Right at start of file / new contig
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
143 line = depth_handle.readline()
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
144 # Are we at the end of the file?
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
145 if not line:
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
146 # Must be at the end of the file.
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
147 # This can happen if the file contig(s) had no reads mapped
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
148 return 0, 0, 0.0, 0
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
149 depth_ref, depth_pos, depth_reads = line.rstrip("\n").split()
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
150 depth_pos = int(depth_pos)
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
151 depth_reads = min(max_depth, int(depth_reads))
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
152 # Can now treat as later references where first line cached
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
153 elif identifier != depth_ref:
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
154 # Infer that identifier had coverage zero,
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
155 # and so was not in the 'samtools depth'
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
156 # output.
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
157 # print("%s appears to have no coverage at all" % identifier)
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
158 return 0, 0, 0.0, 0
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
159
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
160 # Good, at start of expected reference
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
161 bases = depth_reads
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
162 if depth_pos == 1:
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
163 min_cov = depth_reads
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
164 else:
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
165 # print("%s has no coverage at start" % identifier)
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
166 min_cov = 0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
167 max_cov = depth_reads
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
168
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
169 last_pos = depth_pos
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
170 depth_ref = None
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
171 depth_pos = 0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
172 depth_reads = 0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
173 for line in depth_handle:
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
174 ref, pos, depth = line.rstrip("\n").split()
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
175 pos = int(pos)
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
176 depth = min(max_depth, int(depth))
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
177 if ref != identifier:
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
178 # Reached the end of this identifier's coverage
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
179 # so cache this ready for next identifier
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
180 depth_ref, depth_pos, depth_reads = ref, pos, depth
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
181 break
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
182 bases += depth
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
183 if last_pos + 1 < pos:
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
184 # print("%s has no coverage between %i and %i" % (identifier, last_pos, pos))
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
185 min_cov = 0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
186 else:
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
187 min_cov = min(min_cov, depth)
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
188 max_cov = max(max_cov, depth)
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
189 last_pos = pos
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
190
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
191 # Reach the end of this identifier's coverage or end of file
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
192 if last_pos < length:
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
193 # print("%s has no coverage at end" % identifier)
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
194 min_cov = 0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
195 mean_cov = bases / float(length)
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
196 return min_cov, max_cov, mean_cov, bases
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
197
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
198
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
199 # Parse and combine the output
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
200 out_handle = open(tabular_filename, "w")
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
201 out_handle.write("#identifer\tlength\tmapped\tplaced\tmin_cov\tmax_cov\tmean_cov\n")
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
202
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
203 idxstats_handle = open(idxstats_filename)
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
204 depth_handle = open(depth_filename)
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
205
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
206 depth_ref = None
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
207 depth_pos = 0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
208 depth_reads = 0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
209
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
210 global_bases = 0
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
211 global_length = 0
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
212 for line in idxstats_handle:
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
213 identifier, length, mapped, placed = line.rstrip("\n").split()
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
214 length = int(length)
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
215 mapped = int(mapped)
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
216 placed = int(placed)
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
217 if identifier == "*":
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
218 min_cov = 0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
219 max_cov = 0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
220 mean_cov = 0.0
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
221 bases = 0
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
222 else:
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
223 min_cov, max_cov, mean_cov, bases = load_total_coverage(depth_handle, identifier, length)
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
224 if max_cov > max_depth:
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
225 sys.exit("Using max depth %i yet saw max coverage %i for %s"
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
226 % (max_depth, max_cov, identifier))
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
227 out_handle.write("%s\t%i\t%i\t%i\t%i\t%i\t%0.2f\n"
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
228 % (identifier, length, mapped, placed,
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
229 min_cov, max_cov, mean_cov))
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
230 if not (min_cov <= mean_cov <= max_cov):
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
231 out_handle.write("ERROR, expect min_cov <= mean_cov <= max_cov\n")
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
232 idxstats_handle.close()
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
233 depth_handle.close()
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
234 out_handle.close()
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
235 clean_up()
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
236 sys.exit("Problem with coverage for %s, expect min_cov <= mean_cov <= max_cov"
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
237 % identifier)
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
238 global_length += length
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
239 global_bases += bases
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
240
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
241 idxstats_handle.close()
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
242 depth_handle.close()
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
243 out_handle.close()
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
244
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
245 print("Total reference length %i with overall mean coverage %0.2f" % (global_length, float(global_bases) / global_length))
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
246
0
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
247 # Remove the temp symlinks and files:
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
248 clean_up()
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
249
ca8f63f2f7d4 Uploaded v0.0.1
peterjc
parents:
diff changeset
250 if depth_ref is not None:
2
7254ece0c0ff v0.0.5 - Supports max coverage depth in recent samtools. Expects samtools 1.4.1 via Conda, not via Tool Shed.
peterjc
parents: 0
diff changeset
251 sys.exit("Left over output from 'samtools depth'? %r" % depth_ref)