Mercurial > repos > crs4 > edena
changeset 1:cd6cc6d76708 draft
Simplify passing repeated params to Python script.
Add more info to help sections.
author | crs4 |
---|---|
date | Fri, 18 Oct 2013 14:09:11 -0400 |
parents | 60609a9cef3b |
children | b8c6a38530eb |
files | edena_ass_wrapper.py edena_ass_wrapper.xml edena_ovl_wrapper.py edena_ovl_wrapper.xml |
diffstat | 4 files changed, 55 insertions(+), 113 deletions(-) [+] |
line wrap: on
line diff
--- a/edena_ass_wrapper.py Mon Sep 09 05:44:31 2013 -0400 +++ b/edena_ass_wrapper.py Fri Oct 18 14:09:11 2013 -0400 @@ -32,37 +32,16 @@ (options, args) = parser.parse_args() if len(args) > 0: parser.error('Wrong number of arguments') - + # build Edena (assembling) command to be executed ovl_input = '-e %s' % (options.ovl_input) - if options.overlapCutoff is not None: - overlapCutoff = '-m %d' % (options.overlapCutoff) - else: - overlapCutoff = '' - if options.cc: - cc = '-cc yes' - else: - cc = '-cc no' - if options.discardNonUsable: - discardNonUsable = '-discardNonUsable yes' - else: - discardNonUsable = '-discardNonUsable no' - if options.minContigSize is not None: - minContigSize = '-c %d' % (options.minContigSize) - else: - minContigSize = '' - if options.minCoverage is not None: - minCoverage = '-minCoverage %s' % (options.minCoverage) - else: - minCoverage = '' - if options.trim is not None: - trim = '-trim %d' % (options.trim) - else: - trim = '' - if options.peHorizon is not None: - peHorizon = '-peHorizon %d' % (options.peHorizon) - else: - peHorizon = '' + overlapCutoff = '-m %d' % (options.overlapCutoff) if options.overlapCutoff is not None else '' + cc = '-cc yes' if options.cc else '-cc no' + discardNonUsable = '-discardNonUsable yes' if options.discardNonUsable else '-discardNonUsable no' + minContigSize = '-c %d' % (options.minContigSize) if options.minContigSize is not None else '' + minCoverage = '-minCoverage %s' % (options.minCoverage) if options.minCoverage is not None else '' + trim = '-trim %d' % (options.trim) if options.trim is not None else '' + peHorizon = '-peHorizon %d' % (options.peHorizon) if options.peHorizon is not None else '' covStats = options.covStats out_contigs_cov = options.out_contigs_cov out_contigs_fasta = options.out_contigs_fasta @@ -71,20 +50,16 @@ out_nodesInfo = options.out_nodesInfo out_nodesPosition = options.out_nodesPosition logfile = options.logfile - + # Build Edena (assembling) command - cmd1 = '%s %s %s %s %s %s %s %s' % (ovl_input, overlapCutoff, cc, discardNonUsable, minContigSize, minCoverage, trim, peHorizon) - cmd2 = 'edena %s' % ( cmd1 ) - print '\nEdena (assembling) command to be executed: \n %s' % ( cmd2 ) - + cmd = 'edena %s %s %s %s %s %s %s %s' % (ovl_input, overlapCutoff, cc, discardNonUsable, minContigSize, minCoverage, trim, peHorizon) + print '\nEdena (assembling) command to be executed:\n %s' % (cmd) + # Execution of Edena print 'Executing Edena (assembling)...' - if logfile: - log = open(logfile, 'w') - else: - log = sys.stdout + log = open(logfile, 'w') if logfile else sys.stdout try: - subprocess.check_call(cmd2, stdout=log, stderr=subprocess.STDOUT, shell=True) # need to redirect stderr because edena writes some logging info there (e.g. "Condensing overlaps graph...") + subprocess.check_call(cmd, stdout=log, stderr=subprocess.STDOUT, shell=True) # need to redirect stderr because edena writes some logging info there (e.g. "Condensing overlaps graph...") finally: if log != sys.stdout: log.close()
--- a/edena_ass_wrapper.xml Mon Sep 09 05:44:31 2013 -0400 +++ b/edena_ass_wrapper.xml Fri Oct 18 14:09:11 2013 -0400 @@ -66,7 +66,11 @@ <help> **What it does** -The key parameter for this mode is the overlaps size cutoff (option –m). By default it is set to half of the reads length, which is quite conservative. If your sequencing project is well covered (>50-100x) you may try increasing a bit this value. The minCoverage is an important parameter which is automatically determined. You may check this value in the program output and possibly override it. +Edena is an overlaps graph based short reads assembler and is suited to Illumina GA reads. An assembly with Edena is a two step process: overlapping and assembling. + +In the assembling step, the overlapping file (produced in the previous step) is provided to the program, as well as some assembly parameters. A set of contigs in FASTA format is outputted. The purpose of having a two step process is that the overlapping file is computed only once and can then be used to produce assemblies with different parameters. + +The key parameter for this step is the overlaps size cutoff (option –m). By default it is set to half of the reads length, which is quite conservative. If your sequencing project is well covered (>50-100x) you may try increasing a bit this value. The minCoverage is an important parameter which is automatically determined. You may check this value in the program output and possibly override it. **License and citation**
--- a/edena_ovl_wrapper.py Mon Sep 09 05:44:31 2013 -0400 +++ b/edena_ovl_wrapper.py Fri Oct 18 14:09:11 2013 -0400 @@ -13,11 +13,11 @@ # load arguments print 'Parsing Edena (overlapping) input options...' parser = optparse.OptionParser() - parser.add_option('--unpaired_input', dest='unpaired_input', help='') - parser.add_option('--dr_pair_1', dest='dr_pair_1', help='') - parser.add_option('--dr_pair_2', dest='dr_pair_2', help='') - parser.add_option('--rd_pair_1', dest='rd_pair_1', help='') - parser.add_option('--rd_pair_2', dest='rd_pair_2', help='') + parser.add_option('--unpaired_input', action='append', dest='unpaired_input', help='') + parser.add_option('--dr_pair_1', action='append', dest='dr_pair_1', help='') + parser.add_option('--dr_pair_2', action='append', dest='dr_pair_2', help='') + parser.add_option('--rd_pair_1', action='append', dest='rd_pair_1', help='') + parser.add_option('--rd_pair_2', action='append', dest='rd_pair_2', help='') parser.add_option('--nThreads', dest='nThreads', type='int', help='') parser.add_option('--minOlap', dest='minOlap', type='int', help='') parser.add_option('--readsTruncation', dest='readsTruncation', type='int', help='') @@ -26,71 +26,54 @@ (options, args) = parser.parse_args() if len(args) > 0: parser.error('Wrong number of arguments') - + # build Edena (overlapping) command to be executed # unpaired input(s) if options.unpaired_input: - unpaired_inputs = options.unpaired_input.split('+')[0:-1] unpaired_input = '-r' - for item in unpaired_inputs: + for item in options.unpaired_input: unpaired_input += ' %s' % (item) else: unpaired_input = '' # direct-reverse paired-end files if options.dr_pair_1 and options.dr_pair_2: - dr_pairs_1 = options.dr_pair_1.split('+')[0:-1] - dr_pairs_2 = options.dr_pair_2.split('+')[0:-1] dr_pairs = '-DRpairs' - for i in xrange(len(dr_pairs_1)): - dr_pairs += ' %s %s' % (dr_pairs_1[i], dr_pairs_2[i]) + for i in range(len(options.dr_pair_1)): + dr_pairs += ' %s %s' % (options.dr_pair_1[i], options.dr_pair_2[i]) else: dr_pairs = '' # reverse-direct paired-end files if options.rd_pair_1 and options.rd_pair_2: - rd_pairs_1 = options.rd_pair_1.split('+')[0:-1] - rd_pairs_2 = options.rd_pair_2.split('+')[0:-1] rd_pairs = '-RDpairs' - for i in xrange(len(rd_pairs_1)): - rd_pairs += ' %s %s' % (rd_pairs_1[i], rd_pairs_2[i]) + for i in range(len(options.rd_pair_1)): + rd_pairs += ' %s %s' % (options.rd_pair_1[i], options.rd_pair_2[i]) else: rd_pairs = '' # nThreads - if options.nThreads is not None: - nThreads = '-nThreads %d' % (options.nThreads) - else: - nThreads = '' + nThreads = '-nThreads %d' % (options.nThreads) if options.nThreads is not None else '' # minimum overlap - if options.minOlap is not None: - minOlap = '-M %d' % (options.minOlap) - else: - minOlap = '' + minOlap = '-M %d' % (options.minOlap) if options.minOlap is not None else '' # 3' end reads truncation - if options.readsTruncation is not None: - readsTruncation = '-t %d' % (options.readsTruncation) - else: - readsTruncation = '' + readsTruncation = '-t %d' % (options.readsTruncation) if options.readsTruncation is not None else '' # output file(s) output = options.output logfile = options.logfile - + # Build Edena (overlapping) command cmd = 'edena %s %s %s %s %s %s -p galaxy_output' % (unpaired_input, dr_pairs, rd_pairs, nThreads, minOlap, readsTruncation) - print '\nEdena (overlapping) command to be executed: \n %s' % ( cmd ) - + print '\nEdena (overlapping) command to be executed:\n %s' % (cmd) + # Execution of Edena print 'Executing Edena (overlapping)...' - if logfile: - log = open(logfile, 'w') - else: - log = sys.stdout + log = open(logfile, 'w') if logfile else sys.stdout try: subprocess.check_call(cmd, stdout=log, stderr=subprocess.STDOUT, shell=True) # need to redirect stderr because edena writes some logging info there (e.g. "Computing overlaps >=30...") finally: if log != sys.stdout: log.close() print 'Edena (overlapping) executed!' - - shutil.move( "galaxy_output.ovl", output) + + shutil.move('galaxy_output.ovl', output) if __name__ == "__main__":
--- a/edena_ovl_wrapper.xml Mon Sep 09 05:44:31 2013 -0400 +++ b/edena_ovl_wrapper.xml Fri Oct 18 14:09:11 2013 -0400 @@ -8,44 +8,18 @@ edena_ovl_wrapper.py \${EDENA_SITE_OPTIONS:---nThreads 2} #if $input_selection.input == "unpaired_file" - #for $i, $unpaired_file in enumerate( $input_selection.unpaired_input ): - #if $i == 0 - #echo "--unpaired_input=" - #end if - #echo $unpaired_file.unpaired_file - #echo '+' + #for $ui in $input_selection.unpaired_input + --unpaired_input=${ui.unpaired_file} #end for #elif $input_selection.input == "dr_pairs" - #for $i, $dr_pair_1 in enumerate( $input_selection.dr_pairs_input ): - #if $i == 0 - #echo "--dr_pair_1=" - #end if - #echo $dr_pair_1.dr_pair_1 - #echo '+' - #end for - #echo ' ' - #for $i, $dr_pair_2 in enumerate( $input_selection.dr_pairs_input ): - #if $i == 0 - #echo "--dr_pair_2=" - #end if - #echo $dr_pair_2.dr_pair_2 - #echo '+' + #for $dpi in $input_selection.dr_pairs_input + --dr_pair_1=${dpi.dr_pair_1} + --dr_pair_2=${dpi.dr_pair_2} #end for #elif $input_selection.input == "rd_pairs" - #for $i, $rd_pair_1 in enumerate( $input_selection.rd_pairs_input ): - #if $i == 0 - #echo "--rd_pair_1=" - #end if - #echo $rd_pair_1.rd_pair_1 - #echo '+' - #end for - #echo ' ' - #for $i, $rd_pair_2 in enumerate( $input_selection.rd_pairs_input ): - #if $i == 0 - #echo "--rd_pair_2=" - #end if - #echo $rd_pair_2.rd_pair_2 - #echo '+' + #for $rpi in $input_selection.rd_pairs_input + --rd_pair_1=${rpi.rd_pair_1} + --rd_pair_2=${rpi.rd_pair_2} #end for #end if #if str($minOlap) @@ -61,7 +35,7 @@ <inputs> <conditional name="input_selection"> <param name="input" type="select" label="Select input type"> - <option value="unpaired_file" selected="True">Unpaired files</option> + <option value="unpaired_file">Unpaired files</option> <option value="dr_pairs">Direct-reverse paired-end files</option> <option value="rd_pairs">Reverse-direct paired-end files</option> </param> @@ -104,7 +78,13 @@ <help> **What it does** -Edena can accept both unpaired and paired files, FASTQ and FASTA format. Note that for technical reasons, all reads are required to be of the same length. You can however provide the program with different files containing different reads length. In such case, Edena will trim the 3’ ends of the longer reads so that they fit the shorter length. It is however required that reads within each individual file are of the same length (as Illumina GA reads are). By default all overlaps with a minimum size corresponding to half of the reads length are computed. This is quite conservative. Provided enough coverage, this value can be increased (option -M) to reduce the memory requirements. For reads longer than 100bp, you may consider the reads truncation option, which could help in discarding 3’ base calling errors. +Edena is an overlaps graph based short reads assembler and is suited to Illumina GA reads. An assembly with Edena is a two step process: overlapping and assembling. + +In the overlapping step, the reads files are provided to the program which computes the transitively reduced overlaps graph. This structure is then stored together with the sequence reads in the overlapping file. + +Edena can accept both unpaired and paired files, FASTQ and FASTA format. Note that for technical reasons, all reads are required to be of the same length. You can however provide the program with different files containing different reads length. In such case, Edena will trim the 3’ ends of the longer reads so that they fit the shorter length. It is however required that reads within each individual file are of the same length (as Illumina GA reads are). By default all overlaps with a minimum size corresponding to half of the reads length are computed. This is quite conservative. Provided enough coverage, this value can be increased (option -M) to reduce the memory requirements. + +For reads longer than 100bp, you may consider the reads truncation option, which could help in discarding 3’ base calling errors. **License and citation**