annotate filter-below-abund.py @ 2:930d8e6708a5 draft

planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit bfa8bda732de882f6fa5f5375f8468ad229cceea
author iuc
date Wed, 09 Nov 2016 05:58:36 -0500
parents d5a18dd63529
children 18dc7b2d49d9
rev   line source
0
d5a18dd63529 planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit be9a20423d1a6ec33d59341e0e61b535127bbce2
iuc
#! /usr/bin/env python
# This file is part of khmer, https://github.com/dib-lab/khmer/, and is
# Copyright (C) 2011-2015, Michigan State University.
# Copyright (C) 2015, The Regents of the University of California.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
#     * Redistributions of source code must retain the above copyright
#       notice, this list of conditions and the following disclaimer.
#
#     * Redistributions in binary form must reproduce the above
#       copyright notice, this list of conditions and the following
#       disclaimer in the documentation and/or other materials provided
#       with the distribution.
#
#     * Neither the name of the Michigan State University nor the names
#       of its contributors may be used to endorse or promote products
#       derived from this software without specific prior written
#       permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# Contact: khmer-project@idyll.org
from __future__ import print_function
import sys
import os
import khmer
from khmer.thread_utils import ThreadedSequenceProcessor, verbose_fasta_iter

# Threading parameters passed to ThreadedSequenceProcessor.
WORKER_THREADS = 8
GROUPSIZE = 100

# Abundance threshold passed to trim_below_abundance().
CUTOFF = 50

###


def main():
    # Usage: filter-below-abund.py <countgraph file> <input file> [...]
    counting_ht = sys.argv[1]
    infiles = sys.argv[2:]

    print('file with ht: %s' % counting_ht)
    print('-- settings:')
    print('N THREADS', WORKER_THREADS)
    print('--')

    print('making hashtable')
    ht = khmer.load_countgraph(counting_ht)
    K = ht.ksize()

    for infile in infiles:
        print('filtering', infile)
        # Output is written to the current directory as <input name>.below
        outfile = os.path.basename(infile) + '.below'

        def process_fn(record, ht=ht):
            # Return (name, sequence) to keep a read, (None, None) to drop it.
            name = record['name']
            seq = record['sequence']
            if 'N' in seq:
                return None, None

            trim_seq, trim_at = ht.trim_below_abundance(seq, CUTOFF)

            # Keep the read only if at least one full k-mer survives trimming.
            if trim_at >= K:
                return name, trim_seq

            return None, None

        # The context manager closes the output file after each input file.
        with open(outfile, 'w') as outfp:
            tsp = ThreadedSequenceProcessor(process_fn, WORKER_THREADS,
                                            GROUPSIZE)

            tsp.start(verbose_fasta_iter(infile), outfp)

if __name__ == '__main__':
    main()
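
The filtering rule in process_fn can be tried without khmer installed. The block below is a minimal pure-Python sketch, not khmer's implementation: it assumes that trim_below_abundance cuts a read just before the last base of the first k-mer whose count exceeds the cutoff, and it uses a plain collections.Counter in place of the countgraph. The helper names count_kmers, trim_below_abundance_sketch and keep_read are hypothetical and exist only for this illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in the given reads (stand-in for a countgraph)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def trim_below_abundance_sketch(seq, counts, k, cutoff):
    """Hypothetical helper: cut seq at the first k-mer seen more than cutoff times.

    Returns (trimmed_seq, trim_at), the same tuple shape the script unpacks.
    This is an assumed behaviour, not a copy of khmer's code.
    """
    for i in range(len(seq) - k + 1):
        if counts[seq[i:i + k]] > cutoff:
            # Drop the last base of the offending k-mer and everything after it.
            trim_at = i + k - 1
            return seq[:trim_at], trim_at
    # No k-mer exceeded the cutoff: the read is kept whole.
    return seq, len(seq)


def keep_read(seq, counts, k, cutoff):
    """Apply the acceptance rule from process_fn: no 'N', at least k bases left."""
    if 'N' in seq:
        return None
    trimmed, trim_at = trim_below_abundance_sketch(seq, counts, k, cutoff)
    return trimmed if trim_at >= k else None


if __name__ == '__main__':
    # Hand-written counts so the example does not depend on any input file.
    counts = Counter({'ACG': 1, 'CGT': 1, 'GTT': 5})
    print(keep_read('ACGTT', counts, k=3, cutoff=2))
    print(keep_read('GTTA', counts, k=3, cutoff=2))
```

With these hand-written counts, the read ACGTT is cut where the abundant k-mer GTT begins to contribute its last base, and GTTA is dropped because fewer than k bases remain after trimming.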