view edena_ass_wrapper.xml @ 1:cd6cc6d76708 draft

Simplify passing repeated params to Python script. Add more info to help sections.
author crs4
date Fri, 18 Oct 2013 14:09:11 -0400
parents 60609a9cef3b
children b8c6a38530eb

<tool id="edena_ass_wrapper" name="Edena (assembling)" version="0.2.1">
  <description></description>
  <requirements>
    <requirement type="package" version="3.130110">edena</requirement>
  </requirements>
  <version_command>edena -v</version_command>
  <command interpreter="python">
    edena_ass_wrapper.py --ovl_input=$ovl_input
    #if str($overlapCutoff)
      --overlapCutoff=$overlapCutoff
    #end if
    #if $cc
      --cc
    #end if
    #if $discardNonUsable
      --discardNonUsable
    #end if
    #if str($minContigSize)
      --minContigSize=$minContigSize
    #end if
    #if str($minCoverage)
      --minCoverage=$minCoverage
    #end if
    #if str($trim)
      --trim=$trim
    #end if
    #if str($peHorizon)
      --peHorizon=$peHorizon
    #end if
    --covStats=$covStats
    --out_contigs_cov=$out_contigs_cov
    --out_contigs_fasta=$out_contigs_fasta
    --out_contigs_lay=$out_contigs_lay
    --out_log_txt=$out_log_txt
    --out_nodesInfo=$out_nodesInfo
    --out_nodesPosition=$out_nodesPosition
    --logfile=$logfile
  </command>

  <inputs>
    <param name="ovl_input" type="data" format="ovl" label="Edena .ovl file (-e)" help="The Edena “.ovl” file produced by the overlapping step." />

    <param name="overlapCutoff" type="integer" value="" optional="true" label="Overlap cutoff (-m)" help="By default, the overlap cutoff is set to half of the read length L (see the log output of the overlapping step to identify it). It is still worth trying a higher value, since this can greatly simplify a highly connected overlaps graph and thus speed up the assembly. If a step of the assembly hangs, increasing the overlap cutoff is the first thing to try." />

    <param name="cc" type="boolean" checked="true" label="Contextual cleaning (-cc)" help="Enabled by default. Contextual cleaning efficiently identifies and removes false-positive edges, thus improving the assembly. The procedure can be seen as a dynamic overlap cutoff on the overlaps graph. This step can, however, be slow on sequencing data with ultra-high coverage. In such cases, try increasing the overlap cutoff value or simply disable this option." />

    <param name="discardNonUsable" type="boolean" checked="true" label="Discard non usable nodes (-discardNonUsable)" help="Enabled by default. This procedure discards nodes that are smaller than 1.5*readLength and not connected to any other node." />

    <param name="minContigSize" type="integer" value="" optional="true" label="Minimum size of the contigs to output (-c)" help="If not specified, this value is set to 1.5*readLength." />

    <param name="minCoverage" type="float" value="" optional="true" label="Minimum required coverage for the contigs (-minCoverage)" help="If not specified, this value is determined automatically from the node coverage distribution. This estimation, however, assumes uniform coverage. It can be worth overriding this parameter in some cases, e.g. with transcriptome data or assemblies of mixed PCR products." />

    <param name="trim" type="integer" value="4" optional="true" label="Coverage cutoff for contig ends (-trim)" help="Contigs are interrupted either because of an unresolved ambiguity or because of a lack of overlapping reads. In the latter case, the contig end may be inaccurate. This option trims such ends until a minimum coverage is reached. By default, this value is set to 4. To disable contig end trimming, set this value to 1." />

    <param name="peHorizon" type="integer" value="" optional="true" label="Maximum search distance for paired-end reads connection (-peHorizon)" help="Edena samples the overlaps graph to accurately determine the paired distance distribution. This parameter specifies the maximum distance searched during this sampling. By default, this value is set to 1000 if only direct-reverse mates are used and 10000 if reverse-direct mates are also used. It must be set to at least twice the expected size of the longest mate library." />
  </inputs>

  <outputs>
    <data name="covStats" format="tabular" label="${tool.name} on ${on_string}: CovStats" />
    <data name="out_contigs_cov" format="txt" label="${tool.name} on ${on_string}: ContigsCov" />
    <data name="out_contigs_fasta" format="fasta" label="${tool.name} on ${on_string}: ContigsFasta" />
    <data name="out_contigs_lay" format="txt" label="${tool.name} on ${on_string}: ContigsLay" />
    <data name="out_log_txt" format="txt" label="${tool.name} on ${on_string}: log" />
    <data name="out_nodesInfo" format="txt" label="${tool.name} on ${on_string}: nodes info" />
    <data name="out_nodesPosition" format="txt" label="${tool.name} on ${on_string}: nodes position" />
    <data name="logfile" format="txt" label="${tool.name} on ${on_string}: log (terminal)" />
  </outputs>

  <tests>

  </tests>
  <help>
**What it does**

Edena is a short-read assembler based on an overlaps graph and is suited to Illumina GA reads. An assembly with Edena is a two-step process: overlapping and assembling.

In the assembling step, the overlapping file (produced in the previous step) is provided to the program, together with some assembly parameters. The output is a set of contigs in FASTA format. The purpose of the two-step process is that the overlapping file is computed only once and can then be reused to produce assemblies with different parameters.

The key parameter for this step is the overlap size cutoff (option -m). By default it is set to half of the read length, which is quite conservative. If your sequencing project is well covered (>50-100x), you may try increasing this value a bit. minCoverage is another important parameter, which is determined automatically. You may check its value in the program output and override it if needed.
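
For illustration only, a direct command-line run of the assembling step might look like the sketch below. The option names are the ones shown in this tool's parameter labels. The file name ``reads.ovl`` and the numeric values are placeholders chosen for the example, not recommended settings::

    edena -e reads.ovl -m 40
    edena -e reads.ovl -m 40 -minCoverage 10

The second line reuses the same overlapping file with a manually set minimum coverage, which is the kind of parameter exploration the two-step process is meant to allow.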

**License and citation**

This Galaxy tool is Copyright © 2013 `CRS4 Srl.`_ and is released under the `MIT license`_.

.. _CRS4 Srl.: http://www.crs4.it/
.. _MIT license: http://opensource.org/licenses/MIT

If you use this tool in Galaxy, please cite |Cuccuru2013|_.

.. |Cuccuru2013| replace:: Cuccuru, G., Orsini, M., Pinna, A., Sbardellati, A., Soranzo, N., Travaglione, A., Uva, P., Zanetti, G., Fotia, G. (2013) Orione, a web-based framework for NGS analysis in microbiology. *Submitted*
.. _Cuccuru2013: http://orione.crs4.it/

This tool uses `Edena`_, which is licensed separately. Please cite |Hernandez2008|_.

.. _Edena: http://www.genomic.ch/edena.php
.. |Hernandez2008| replace:: Hernandez, D., *et al.* (2008) De novo bacterial genome sequencing: Millions of very short reads assembled on a desktop computer. *Genome Res.* 18(5), 802-809
.. _Hernandez2008: http://genome.cshlp.org/content/18/5/802
  </help>
</tool>