annotate tools/seq_filter_by_mapping/seq_filter_by_mapping.py @ 0:1d773da0ccf0 draft

Uploaded v0.0.2, fixed some error messages
author peterjc
date Tue, 27 Jan 2015 08:31:13 -0500
parents
children 8ff0ac66f1a3
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
1 #!/usr/bin/env python
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
2 """Filter a FASTA, FASTQ or SSF file with SAM/BAM mapping information.
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
3
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
4 When filtering an SFF file, any Roche XML manifest in the input file is
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
5 preserved in both output files.
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
6
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
7 This tool is a short Python script which requires Biopython 1.54 or later
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
8 for SFF file support. If you use this tool in scientific work leading to a
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
9 publication, please cite the Biopython application note:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
10
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
11 Cock et al 2009. Biopython: freely available Python tools for computational
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
12 molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
13 http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
14
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
15 This script is copyright 2014 by Peter Cock, The James Hutton Institute
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
16 (formerly the Scottish Crop Research Institute, SCRI), UK. All rights reserved.
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
17 See accompanying text file for licence details (MIT license).
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
18
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
19 Use -v or --version to get the version, -h or --help for help.
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
20 """
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
21 import os
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
22 import sys
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
23 import re
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
24 import subprocess
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
25 from optparse import OptionParser
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
26
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
27 def sys_exit(msg, err=1):
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
28 sys.stderr.write(msg.rstrip() + "\n")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
29 sys.exit(err)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
30
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
31 #Parse Command Line
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
32 usage = """Use as follows:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
33
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
34 $ python seq_filter_by_mapping.py [options] mapping.sam/bam [more mappings]
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
35
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
36 e.g. Positive matches using column one from a single BAM file:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
37
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
38 $ seq_filter_by_mapping.py -i my_seqs.fastq -f fastq -p matches.fastq mapping.bam
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
39
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
40 Multiple SAM/BAM mapping files may be given.
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
41 """
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
42 parser = OptionParser(usage=usage)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
43 parser.add_option('-i', '--input', dest='input',
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
44 default=None, help='Input sequences filename',
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
45 metavar="FILE")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
46 parser.add_option('-f', '--format', dest='format',
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
47 default=None,
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
48 help='Input sequence format (e.g. fasta, fastq, sff)')
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
49 parser.add_option('-p', '--positive', dest='output_positive',
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
50 default=None,
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
51 help='Output filename for mapping reads',
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
52 metavar="FILE")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
53 parser.add_option('-n', '--negative', dest='output_negative',
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
54 default=None,
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
55 help='Output filename for non-mapping reads',
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
56 metavar="FILE")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
57 parser.add_option("-m", "--pair-mode", dest="pair_mode",
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
58 default="lax",
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
59 help="How to treat paired reads (lax or strict, default lax)")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
60 parser.add_option("-v", "--version", dest="version",
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
61 default=False, action="store_true",
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
62 help="Show version and quit")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
63
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
64 options, args = parser.parse_args()
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
65
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
66 if options.version:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
67 print "v0.0.2"
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
68 sys.exit(0)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
69
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
70 in_file = options.input
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
71 seq_format = options.format
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
72 out_positive_file = options.output_positive
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
73 out_negative_file = options.output_negative
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
74 pair_mode = options.pair_mode
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
75
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
76 if in_file is None or not os.path.isfile(in_file):
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
77 sys_exit("Missing input file: %r" % in_file)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
78 if out_positive_file is None and out_negative_file is None:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
79 sys_exit("Neither output file requested")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
80 if seq_format is None:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
81 sys_exit("Missing sequence format")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
82 if pair_mode not in ["lax", "strict"]:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
83 sys_exit("Pair mode argument should be 'lax' or 'strict', not %r" % pair_mode)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
84 for mapping in args:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
85 if not os.path.isfile(mapping):
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
86 sys_exit("Mapping file %r not found" % mapping)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
87 if not args:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
88 sys_exit("At least one SAM/BAM mapping file is required")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
89
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
90
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
91 #Cope with three widely used suffix naming convensions,
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
92 #Illumina: /1 or /2
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
93 #Forward/revered: .f or .r
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
94 #Sanger, e.g. .p1k and .q1k
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
95 #See http://staden.sourceforge.net/manual/pregap4_unix_50.html
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
96 #re_f = re.compile(r"(/1|\.f|\.[sfp]\d\w*)$")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
97 #re_r = re.compile(r"(/2|\.r|\.[rq]\d\w*)$")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
98 re_suffix = re.compile(r"(/1|\.f|\.[sfp]\d\w*|/2|\.r|\.[rq]\d\w*)$")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
99 assert re_suffix.search("demo.f")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
100 assert re_suffix.search("demo.s1")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
101 assert re_suffix.search("demo.f1k")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
102 assert re_suffix.search("demo.p1")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
103 assert re_suffix.search("demo.p1k")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
104 assert re_suffix.search("demo.p1lk")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
105 assert re_suffix.search("demo/2")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
106 assert re_suffix.search("demo.r")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
107 assert re_suffix.search("demo.q1")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
108 assert re_suffix.search("demo.q1lk")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
109
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
110 def clean_name(name):
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
111 """Remove suffix."""
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
112 match = re_suffix.search(name)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
113 if match:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
114 # Use the fact this is a suffix, and regular expression will be
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
115 # anchored to the end of the name:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
116 return name[:match.start()]
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
117 else:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
118 # Nothing to do
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
119 return name
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
120 assert clean_name("foo/1") == "foo"
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
121 assert clean_name("foo/2") == "foo"
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
122 assert clean_name("bar.f") == "bar"
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
123 assert clean_name("bar.r") == "bar"
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
124 assert clean_name("baz.p1") == "baz"
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
125 assert clean_name("baz.q2") == "baz"
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
126
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
127 mapped_chars = { '>' :'__gt__',
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
128 '<' :'__lt__',
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
129 "'" :'__sq__',
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
130 '"' :'__dq__',
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
131 '[' :'__ob__',
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
132 ']' :'__cb__',
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
133 '{' :'__oc__',
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
134 '}' :'__cc__',
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
135 '@' : '__at__',
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
136 '\n' : '__cn__',
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
137 '\r' : '__cr__',
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
138 '\t' : '__tc__',
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
139 '#' : '__pd__'
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
140 }
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
141
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
142 def load_mapping_ids(filename, pair_mode, ids):
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
143 """Parse SAM/BAM file, updating given set of ids.
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
144
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
145 Parses BAM files via call out to samtools view command.
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
146 """
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
147 handle = open(filename, "rb")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
148 magic = handle.read(4)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
149 if magic == b"\x1f\x8b\x08\x04":
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
150 # Presumably a BAM file then...
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
151 handle.close()
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
152 # Call samtools view, don't need header so no -h added:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
153 child = subprocess.Popen(["samtools", "view", filename],
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
154 stdin=None,
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
155 stdout=subprocess.PIPE,
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
156 stderr=subprocess.PIPE)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
157 handle = child.stdout
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
158 else:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
159 # Presumably a SAM file...
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
160 child = None
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
161 handle.seek(0)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
162 # Handle should now contain SAM records
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
163 for line in handle:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
164 # Ignore header lines
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
165 if line[0] != "@":
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
166 qname, flag, rest = line.split("\t", 2)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
167 flag = int(flag)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
168 if pair_mode == "lax":
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
169 # If either read or its partner is mapped, take it!
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
170 # Being lazy, since we will look at both reads
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
171 # can just check if (either) has 0x4 clear.
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
172 if not (flag & 0x4):
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
173 ids.add(qname)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
174 elif pair_mode == "strict":
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
175 # For paired reads, require BOTH be mapped.
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
176 if (flag & 0x4):
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
177 # This is unmapped, ignore it
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
178 pass
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
179 elif not (flag & 0x1):
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
180 # Unpaired (& mapped) - take it
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
181 ids.add(qname)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
182 elif not (flag & 0x8):
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
183 # Paired and partner also mapped, good
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
184 ids.add(qname)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
185 if child:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
186 # Check terminated normally.
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
187 stdout, stderr = child.communicate()
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
188 assert child.returncode is not None
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
189 if child.returncode:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
190 msg = "Error %i from 'samtools view %s'\n%s" % (child.returncode,
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
191 filename, stderr)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
192 sys_exit(msg.strip(), child.returncode)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
193 else:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
194 handle.close()
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
195
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
196
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
197 # Read mapping file(s) and record all mapped identifiers
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
198 ids = set()
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
199 for filename in args:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
200 load_mapping_ids(filename, pair_mode, ids)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
201 # TODO - If want to support naive paired mode, have to record
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
202 # more than just qname (need /1 or /2 indicator)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
203 print("Loaded %i mapped IDs" % (len(ids)))
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
204 if len(ids) < 10:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
205 print("Looking for %s" % ", ".join(sorted(ids)))
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
206
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
207 def crude_fasta_iterator(handle):
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
208 """Yields tuples, record ID and the full record as a string."""
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
209 while True:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
210 line = handle.readline()
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
211 if line == "":
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
212 return # Premature end of file, or just empty?
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
213 if line[0] == ">":
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
214 break
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
215
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
216 no_id_warned = False
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
217 while True:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
218 if line[0] != ">":
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
219 raise ValueError(
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
220 "Records in Fasta files should start with '>' character")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
221 try:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
222 id = line[1:].split(None, 1)[0]
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
223 except IndexError:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
224 if not no_id_warned:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
225 sys.stderr.write("WARNING - Malformed FASTA entry with no identifier\n")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
226 no_id_warned = True
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
227 id = None
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
228 lines = [line]
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
229 line = handle.readline()
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
230 while True:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
231 if not line:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
232 break
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
233 if line[0] == ">":
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
234 break
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
235 lines.append(line)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
236 line = handle.readline()
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
237 yield id, "".join(lines)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
238 if not line:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
239 return # StopIteration
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
240
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
241
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
242 def fasta_filter(in_file, pos_file, neg_file, wanted):
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
243 """FASTA filter producing 60 character line wrapped outout."""
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
244 pos_count = neg_count = 0
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
245 #Galaxy now requires Python 2.5+ so can use with statements,
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
246 with open(in_file) as in_handle:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
247 #Doing the if statement outside the loop for speed
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
248 #(with the downside of three very similar loops).
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
249 if pos_file is not None and neg_file is not None:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
250 print "Generating two FASTA files"
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
251 with open(pos_file, "w") as pos_handle:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
252 with open(neg_file, "w") as neg_handle:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
253 for identifier, record in crude_fasta_iterator(in_handle):
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
254 if clean_name(identifier) in wanted:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
255 pos_handle.write(record)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
256 pos_count += 1
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
257 else:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
258 neg_handle.write(record)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
259 neg_count += 1
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
260 elif pos_file is not None:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
261 print "Generating matching FASTA file"
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
262 with open(pos_file, "w") as pos_handle:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
263 for identifier, record in crude_fasta_iterator(in_handle):
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
264 if clean_name(identifier) in wanted:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
265 pos_handle.write(record)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
266 pos_count += 1
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
267 else:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
268 neg_count += 1
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
269 else:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
270 print "Generating non-matching FASTA file"
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
271 assert neg_file is not None
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
272 with open(neg_file, "w") as neg_handle:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
273 for identifier, record in crude_fasta_iterator(in_handle):
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
274 if clean_name(identifier) in wanted:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
275 pos_count += 1
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
276 else:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
277 neg_handle.write(record)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
278 neg_count += 1
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
279 return pos_count, neg_count
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
280
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
281
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
282 def fastq_filter(in_file, pos_file, neg_file, wanted):
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
283 """FASTQ filter."""
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
284 from Bio.SeqIO.QualityIO import FastqGeneralIterator
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
285 handle = open(in_file, "r")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
286 if out_positive_file is not None and out_negative_file is not None:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
287 print "Generating two FASTQ files"
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
288 positive_handle = open(out_positive_file, "w")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
289 negative_handle = open(out_negative_file, "w")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
290 print in_file
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
291 for title, seq, qual in FastqGeneralIterator(handle):
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
292 # print("%s --> %s" % (title, clean_name(title.split(None, 1)[0])))
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
293 if clean_name(title.split(None, 1)[0]) in ids:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
294 positive_handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
295 else:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
296 negative_handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
297 positive_handle.close()
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
298 negative_handle.close()
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
299 elif out_positive_file is not None:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
300 print "Generating matching FASTQ file"
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
301 positive_handle = open(out_positive_file, "w")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
302 for title, seq, qual in FastqGeneralIterator(handle):
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
303 if clean_name(title.split(None, 1)[0]) in ids:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
304 positive_handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
305 positive_handle.close()
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
306 elif out_negative_file is not None:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
307 print "Generating non-matching FASTQ file"
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
308 negative_handle = open(out_negative_file, "w")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
309 for title, seq, qual in FastqGeneralIterator(handle):
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
310 if clean_name(title.split(None, 1)[0]) not in ids:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
311 negative_handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
312 negative_handle.close()
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
313 handle.close()
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
314 # This does not currently bother to record record counts (faster)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
315
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
316 def sff_filter(in_file, pos_file, neg_file, wanted):
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
317 """SFF filter."""
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
318 try:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
319 from Bio.SeqIO.SffIO import SffIterator, SffWriter
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
320 except ImportError:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
321 sys_exit("SFF filtering requires Biopython 1.54 or later")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
322
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
323 try:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
324 from Bio.SeqIO.SffIO import ReadRocheXmlManifest
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
325 except ImportError:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
326 #Prior to Biopython 1.56 this was a private function
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
327 from Bio.SeqIO.SffIO import _sff_read_roche_index_xml as ReadRocheXmlManifest
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
328
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
329 in_handle = open(in_file, "rb") #must be binary mode!
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
330 try:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
331 manifest = ReadRocheXmlManifest(in_handle)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
332 except ValueError:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
333 manifest = None
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
334
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
335 #This makes two passes though the SFF file with isn't so efficient,
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
336 #but this makes the code simple.
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
337 pos_count = neg_count = 0
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
338 if out_positive_file is not None:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
339 out_handle = open(out_positive_file, "wb")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
340 writer = SffWriter(out_handle, xml=manifest)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
341 in_handle.seek(0) #start again after getting manifest
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
342 pos_count = writer.write_file(rec for rec in SffIterator(in_handle) if clean_name(rec.id) in ids)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
343 out_handle.close()
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
344 if out_negative_file is not None:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
345 out_handle = open(out_negative_file, "wb")
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
346 writer = SffWriter(out_handle, xml=manifest)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
347 in_handle.seek(0) #start again
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
348 neg_count = writer.write_file(rec for rec in SffIterator(in_handle) if clean_name(rec.id) not in ids)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
349 out_handle.close()
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
350 #And we're done
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
351 in_handle.close()
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
352 return pos_count, neg_count
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
353
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
354
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
355 if seq_format.lower()=="sff":
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
356 # Now write filtered SFF file based on IDs wanted
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
357 pos_count, neg_count = sff_filter(in_file, out_positive_file, out_negative_file, ids)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
358 # At the time of writing, Galaxy doesn't show SFF file read counts,
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
359 # so it is useful to put them in stdout and thus shown in job info.
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
360 print "%i with and %i without specified IDs" % (pos_count, neg_count)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
361 elif seq_format.lower()=="fasta":
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
362 # Write filtered FASTA file based on IDs from tabular file
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
363 pos_count, neg_count = fasta_filter(in_file, out_positive_file, out_negative_file, ids)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
364 print "%i with and %i without specified IDs" % (pos_count, neg_count)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
365 elif seq_format.lower().startswith("fastq"):
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
366 #Write filtered FASTQ file based on IDs from mapping file
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
367 fastq_filter(in_file, out_positive_file, out_negative_file, ids)
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
368 # This does not currently track the counts
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
369 else:
1d773da0ccf0 Uploaded v0.0.2, fixed some error messages
peterjc
parents:
diff changeset
370 sys_exit("Unsupported file type %r" % seq_format)