annotate edena_ass_wrapper.xml @ 2:b8c6a38530eb draft default tip

Support Edena v. 3.131028 (new <version_command>, official overlapping log file, covStats output file removed, -lph and -sph options instead of -peHorizon). Use $GALAXY_SLOTS instead of $EDENA_SITE_OPTIONS. Directly call edena, remove edena_ovl_wrapper.py and edena_ass_wrapper.py . Discard stderr instead of redirecting to stdout. Do not redirect stdout to logfile. Add readme.rst .
author crs4
date Fri, 31 Jan 2014 12:08:21 -0500
parents cd6cc6d76708
children
rev   line source
2
b8c6a38530eb Support Edena v. 3.131028 (new <version_command>, official overlapping log file, covStats output file removed, -lph and -sph options instead of -peHorizon). Use $GALAXY_SLOTS instead of $EDENA
crs4
parents: 1
diff changeset
1 <tool id="edena_ass_wrapper" name="Edena (assembling)" version="0.3">
0
60609a9cef3b Uploaded
crs4
parents:
diff changeset
2 <description></description>
3 <requirements>
2
4 <requirement type="package" version="3.131028">edena</requirement>
0
5 </requirements>
2
6 <version_command>edena | head -n 1</version_command>
7 <command>
8 edena -e $ovl_input
0
9 #if str($overlapCutoff)
2
10 -m $overlapCutoff
0
11 #end if
12 #if $cc
2
13 -cc yes
14 #else
15 -cc no
0
16 #end if
17 #if $discardNonUsable
2
18 -discardNonUsable yes
19 #else
20 -discardNonUsable no
0
21 #end if
22 #if str($minContigSize)
2
23 -c $minContigSize
0
24 #end if
25 #if str($minCoverage)
2
26 -minCoverage $minCoverage
0
27 #end if
28 #if str($trim)
2
29 -trim $trim
30 #end if
31 #if str($sph)
32 -sph $sph
0
33 #end if
2
34 #if str($lph)
35 -lph $lph
0
36 #end if
2
37 2&gt;/dev/null ## need to discard stderr because edena writes some progress info there (e.g. "Condensing overlaps graph...")
0
38 </command>
39
40 <inputs>
2
41 <param name="ovl_input" type="data" format="ovl" label="Edena overlap (.ovl) file (-e)" help="Specify here the Edena “.ovl” file obtained from the overlapping step" />
0
42
43 <param name="overlapCutoff" type="integer" value="" optional="true" label="Overlap cutoff (-m)" help="By default, the overlap cutoff is set to half of the read length L (see the log output by the overlapping step to identify it). It is still worth trying to increase this setting, since doing so can greatly simplify highly connected overlaps graphs and thus speed up the assembly. If a step of the assembly hangs, increasing the overlap cutoff is the first thing to try." />
44
2
45 <param name="cc" type="boolean" checked="true" label="Contextual cleaning of spurious edges (-cc)" help="Contextual cleaning is a procedure that efficiently identifies and removes false positive edges, thus improving the assembly. This procedure can be seen as a dynamic overlap cutoff on the overlaps graph. This step can, however, be slow on sequencing data with ultra-high coverage. In such cases, try increasing the overlap cutoff value or simply disable this option." />
0
46
2
47 <param name="discardNonUsable" type="boolean" checked="true" label="Discard non usable nodes (-discardNonUsable)" help="This procedure discards orphan nodes smaller than 1.5*readLength." />
0
48
49 <param name="minContigSize" type="integer" value="" optional="true" label="Minimum size of the contigs to output (-c)" help="If not specified, this value is set to 1.5*readLength." />
50
51 <param name="minCoverage" type="float" value="" optional="true" label="Minimum required coverage for the contigs (-minCoverage)" help="If not specified, this value is automatically determined from the node coverage distribution. This estimate, however, assumes uniform coverage. It can be worth overriding this parameter in some cases, e.g. with transcriptome data or a mix of PCR product assemblies." />
52
2
53 <param name="trim" type="integer" value="4" optional="true" label="Coverage cutoff for contig ends (-trim)" help="Contig interruptions are caused either by an unresolved ambiguity or by a lack of overlapping reads. In the latter case, the contig end may be inaccurate. This option trims a few bases from these ends until a minimum coverage is reached. By default, this value is set to 4. To disable contig end trimming, set this value to 1." />
54 <param name="sph" type="integer" value="1000" optional="true" label="Maximum search distance for paired-end (forward-reverse) sampling (-sph)" help="Edena samples the overlaps graph to accurately determine the paired distance distribution. This parameter specifies the maximum distance that is searched during this sampling. This value has to be set to at least 2X the expected size of the longest paired-end library." />
55 <param name="lph" type="integer" value="15000" optional="true" label="Maximum search distance for mate-pair (reverse-forward) sampling (-lph)" help="Edena samples the overlaps graph to accurately determine the paired distance distribution. This parameter specifies the maximum distance that is searched during this sampling. This value has to be set to at least 2X the expected size of the longest mate-pair library." />
0
56 </inputs>
57
58 <outputs>
2
59 <data name="out_contigs_cov" format="txt" label="${tool.name} on ${on_string}: ContigsCov" from_work_dir="out_contigs.cov" />
60 <data name="out_contigs_fasta" format="fasta" label="${tool.name} on ${on_string}: ContigsFasta" from_work_dir="out_contigs.fasta" />
61 <data name="out_contigs_lay" format="txt" label="${tool.name} on ${on_string}: ContigsLay" from_work_dir="out_contigs.lay" />
62 <data name="out_log_txt" format="txt" label="${tool.name} on ${on_string}: log" from_work_dir="out_assembling.log" />
63 <data name="out_nodesInfo" format="txt" label="${tool.name} on ${on_string}: nodes info" from_work_dir="out_nodesInfo" />
64 <data name="out_nodesPosition" format="txt" label="${tool.name} on ${on_string}: nodes position" from_work_dir="out_nodesPosition" />
0
65 </outputs>
66
67 <tests>
68
69 </tests>
70 <help>
71 **What it does**
72
1
cd6cc6d76708 Simplify passing repeated params to Python script.
crs4
parents: 0
diff changeset
73 Edena is an overlap-graph-based short-read assembler suited to Illumina GA reads. An assembly with Edena is a two-step process: overlapping and assembling.
74
75 In the assembling step, the overlap (.ovl) file produced in the previous step is provided to the program along with the assembly parameters, and a set of contigs is output in FASTA format. The two-step design means that the overlap file is computed only once and can then be reused to produce assemblies with different parameters.
76
77 The key parameter for this step is the overlap size cutoff (option -m). By default it is set to half of the read length, which is quite conservative. If your sequencing project is well covered (>50-100x), you may try increasing this value a little. The minimum coverage (option -minCoverage) is another important parameter; it is determined automatically, so check the value reported in the program output and override it if needed.
0
78
79 **License and citation**
80
81 This Galaxy tool is Copyright © 2013 `CRS4 Srl.`_ and is released under the `MIT license`_.
82
83 .. _CRS4 Srl.: http://www.crs4.it/
84 .. _MIT license: http://opensource.org/licenses/MIT
85
86 If you use this tool in Galaxy, please cite |Cuccuru2013|_.
87
88 .. |Cuccuru2013| replace:: Cuccuru, G., Orsini, M., Pinna, A., Sbardellati, A., Soranzo, N., Travaglione, A., Uva, P., Zanetti, G., Fotia, G. (2013) Orione, a web-based framework for NGS analysis in microbiology. *Submitted*
89 .. _Cuccuru2013: http://orione.crs4.it/
90
91 This tool uses `Edena`_, which is licensed separately. Please cite |Hernandez2008|_.
92
93 .. _Edena: http://www.genomic.ch/edena.php
94 .. |Hernandez2008| replace:: Hernandez, D., *et al.* (2008) De novo bacterial genome sequencing: Millions of very short reads assembled on a desktop computer. *Genome Res.* 18(5), 802-809
95 .. _Hernandez2008: http://genome.cshlp.org/content/18/5/802
96 </help>
97 </tool>
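
Note on the <command> template: the sketch below mirrors, in Python, how the Cheetah template in the wrapper appears to assemble the edena command line. Optional integer/float parameters are emitted only when set, the two boolean parameters always emit an explicit yes/no, and stderr is discarded. The function names build_edena_assembly_command and as_shell, the Python parameter names, and the file name reads.ovl are hypothetical and exist only for illustration; the flags and defaults (trim=4, sph=1000, lph=15000) are the ones declared in the XML above. Galaxy itself renders the template with Cheetah, not with this code.

```python
import shlex


def build_edena_assembly_command(ovl_input, overlap_cutoff=None, cc=True,
                                 discard_non_usable=True, min_contig_size=None,
                                 min_coverage=None, trim=4, sph=1000, lph=15000):
    """Return the argument list the wrapper's <command> template would produce.

    Hypothetical helper: a value of None stands for an empty optional field
    in the Galaxy form, which the template skips.
    """
    args = ["edena", "-e", str(ovl_input)]
    if overlap_cutoff is not None:
        args += ["-m", str(overlap_cutoff)]
    # Boolean parameters are always passed explicitly as yes/no.
    args += ["-cc", "yes" if cc else "no"]
    args += ["-discardNonUsable", "yes" if discard_non_usable else "no"]
    # Remaining optional parameters, in the same order as the template.
    optional = (
        ("-c", min_contig_size),
        ("-minCoverage", min_coverage),
        ("-trim", trim),
        ("-sph", sph),
        ("-lph", lph),
    )
    for flag, value in optional:
        if value is not None:
            args += [flag, str(value)]
    return args


def as_shell(args):
    """Quote the arguments and discard stderr, as the template does."""
    return " ".join(shlex.quote(a) for a in args) + " 2>/dev/null"


if __name__ == "__main__":
    print(as_shell(build_edena_assembly_command("reads.ovl", overlap_cutoff=30)))
```

With only the overlap cutoff overridden, the defaults from the form still appear on the command line, while -c and -minCoverage are left for edena to determine.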