annotate cutadapt_galaxy_wrapper.py @ 0:8b064ea16722
Initial version with multiple adapter support
field | value |
---|---|
author | Lance Parsons <lparsons@princeton.edu> |
date | Fri, 13 May 2011 15:54:01 -0400 |
parents | |
children | 7ed26fc9fa8a |
#!/usr/bin/env python
"""
SYNOPSIS

    cutadapt_galaxy_wrapper.py
        -i input_file
        -o output_file
        [-f format (fastq/fasta/etc.)]
        [-a 3' adapter sequence]
        [-b 3' or 5' anywhere adapter sequence]
        [-e error_rate]
        [-n count]
        [-O overlap_length]
        [--discard discard trimmed reads]
        [-m minimum read length]
        [-M maximum read length]
        [-h,--help] [--version]

DESCRIPTION

    Wrapper for running cutadapt as a Galaxy tool

AUTHOR

    Lance Parsons <lparsons@princeton.edu>

LICENSE

    This script is in the public domain, free from copyrights or restrictions.

VERSION

    $Id$
"""

import sys, os, traceback, optparse, shutil, subprocess, tempfile
import re
#from pexpect import run, spawn

def stop_err( msg ):
    # Report the problem on stderr and exit with a non-zero status
    sys.stderr.write( '%s\n' % msg )
    sys.exit(1)

def main():

    global options, args
    # Setup Parameters: translate the wrapper options into cutadapt flags
    params = []
    if options.adapters is not None:
        params.append("-a %s" % " -a ".join(options.adapters))
    if options.anywhere_adapters is not None:
        params.append("-b %s" % " -b ".join(options.anywhere_adapters))
    if options.output_file is not None:
        params.append("-o %s" % options.output_file)
    if options.error_rate is not None:
        params.append("-e %s" % options.error_rate)
    if options.count is not None:
        params.append("-n %s" % options.count)
    if options.overlap_length is not None:
        params.append("-O %s" % options.overlap_length)
    if options.discard_trimmed:
        params.append("--discard")
    if options.minimum_length is not None:
        params.append("-m %s" % options.minimum_length)
    if options.maximum_length is not None:
        params.append("-M %s" % options.maximum_length)

    # cutadapt relies on the extension to determine file format: .fasta or .fastq
    input_name = '.'.join((options.input, options.format))
    # make temp directory
    tmp_dir = tempfile.mkdtemp()

    try:
        # make a link to the input file in the tmp_dir; the target is made
        # absolute so the link does not dangle when --input is a relative path
        input_file = os.path.join(tmp_dir, os.path.basename(input_name))
        os.symlink(os.path.abspath(options.input), input_file)

        # generate commandline
        cmd = 'cutadapt %s %s' % (' '.join(params), input_file)
        # universal_newlines=True returns the captured output as text
        proc = subprocess.Popen( args=cmd, shell=True, cwd=tmp_dir,
                                 stdout=subprocess.PIPE,
                                 stderr=subprocess.PIPE,
                                 universal_newlines=True )
        (stdoutdata, stderrdata) = proc.communicate()
        returncode = proc.returncode
        if returncode != 0:
            raise Exception('Execution of cutadapt failed.\n%s' % stderrdata)
        print(stderrdata)

    finally:
        # clean up temp dir; this also removes the symlink created above.
        # input_name itself is never created by this script, so it is not removed.
        if os.path.exists( tmp_dir ):
            shutil.rmtree( tmp_dir )

if __name__ == '__main__':
    try:
        parser = optparse.OptionParser(formatter=optparse.TitledHelpFormatter(), usage=globals()['__doc__'], version='$Id$')
        parser.add_option('-i', '--input', dest='input', help='The sequence input file')
        parser.add_option('-f', '--format', dest='format', default='fastq',
                          help='The sequence input file format (default: fastq)')
        parser.add_option('-a', '--adapter', action='append', dest='adapters', help='3\' adapter sequence(s)')
        parser.add_option('-b', '--anywhere', action='append', dest='anywhere_adapters', help='5\' or 3\' "anywhere" adapter sequence(s)')
        parser.add_option('-e', '--error-rate', dest='error_rate', help='Maximum allowed error rate')
        parser.add_option('-n', '--times', dest='count', help='Try to remove adapters COUNT times')
        parser.add_option('-O', '--overlap', dest='overlap_length', help='Minimum overlap length')
        parser.add_option('--discard', '--discard-trimmed', dest='discard_trimmed', action='store_true', default=False, help='Discard reads that contain the adapter')
        parser.add_option('-m', '--minimum-length', dest='minimum_length', help='Discard reads that are shorter than LENGTH')
        parser.add_option('-M', '--maximum-length', dest='maximum_length', help='Discard reads that are longer than LENGTH')
        parser.add_option('-o', '--output', dest='output_file', help='The modified sequences are written to the file')
        (options, args) = parser.parse_args()
        if options.input is None:
            stop_err("Missing option --input")
        if options.output_file is None:
            stop_err("Missing option --output")
        if not os.path.exists(options.input):
            stop_err("Unable to read input file: %s" % options.input)
        #if len(args) < 1:
        #    parser.error ('missing argument')
        main()
        sys.exit(0)
    except KeyboardInterrupt: # Ctrl-C
        raise
    except SystemExit: # sys.exit()
        raise
    except Exception as e:
        print('ERROR, UNEXPECTED EXCEPTION')
        print(str(e))
        traceback.print_exc()
        os._exit(1)

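The wrapper above joins every option into one string and runs it with `shell=True`, so an adapter or output path containing spaces or shell metacharacters would be split or interpreted by the shell. One possible alternative, sketched here and not part of this changeset, is to build the command as an argument list and run it without a shell. The names `Options`, `build_cutadapt_args`, and `run_cutadapt` are hypothetical and introduced only for illustration; the flags are the ones the wrapper already passes to cutadapt.

```python
import subprocess
from collections import namedtuple

# Hypothetical stand-in for the optparse result; the field names mirror the
# option destinations used by the wrapper.
Options = namedtuple("Options", [
    "adapters", "anywhere_adapters", "output_file", "error_rate", "count",
    "overlap_length", "discard_trimmed", "minimum_length", "maximum_length",
])


def add_value(args, flag, value):
    """Append 'flag value' to args when the option was given."""
    if value is not None:
        args += [flag, str(value)]


def build_cutadapt_args(options, input_file):
    """Return the cutadapt command as a list, one element per argument."""
    args = ["cutadapt"]
    # Repeatable options: one flag per adapter sequence.
    for adapter in options.adapters or []:
        args += ["-a", adapter]
    for adapter in options.anywhere_adapters or []:
        args += ["-b", adapter]
    add_value(args, "-o", options.output_file)
    add_value(args, "-e", options.error_rate)
    add_value(args, "-n", options.count)
    add_value(args, "-O", options.overlap_length)
    if options.discard_trimmed:
        args.append("--discard")
    add_value(args, "-m", options.minimum_length)
    add_value(args, "-M", options.maximum_length)
    args.append(input_file)
    return args


def run_cutadapt(options, input_file, cwd=None):
    """Run cutadapt without a shell; returns (returncode, stdout, stderr)."""
    proc = subprocess.Popen(build_cutadapt_args(options, input_file), cwd=cwd,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    stdoutdata, stderrdata = proc.communicate()
    return proc.returncode, stdoutdata, stderrdata


# Example values are made up; note the space in the output path.
example = Options(adapters=["ACGT", "TTAGGC"], anywhere_adapters=None,
                  output_file="/tmp/out dir/trimmed.fastq", error_rate="0.1",
                  count=None, overlap_length=None, discard_trimmed=True,
                  minimum_length="20", maximum_length=None)
print(build_cutadapt_args(example, "reads.fastq"))
```

Because each argument is a separate list element, the output path with a space reaches cutadapt as a single argument and no quoting is needed. Only `build_cutadapt_args` is exercised in the example; `run_cutadapt` assumes a `cutadapt` executable is available on `PATH`.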