Mercurial > repos > nick > dunovo

<?xml version="1.0"?>
<tool id="align_families" name="Du Novo: Align families" version="2.0.8">
  <description>of duplex sequencing reads</description>
  <requirements>
    <requirement type="package" version="7.221">mafft</requirement>
    <requirement type="package" version="2.0.8">dunovo</requirement>
    <!-- TODO: require Python 2.7 -->
  </requirements>
  <version_command>align_families.py --version</version_command>
  <command detect_errors="exit_code">align_families.py --aligner $aligner --galaxy $phone --processes \${GALAXY_SLOTS:-1} '$input' &gt; '$output'
  </command>
  <inputs>
    <param name="input" type="data" format="tabular" label="Input reads" help="with barcodes, grouped by family"/>
    <param name="aligner" type="select" value="mafft" label="Multiple sequence aligner" help="MAFFT is the original aligner Du Novo was published with in 2016. Kalign is much faster and has similar accuracy.">
      <option value="kalign">Kalign2</option>
      <option value="mafft">MAFFT</option>
    </param>
    <param name="phone" type="boolean" truevalue="--phone-home" falsevalue="" checked="False" label="Send usage data" help="Report helpful usage data to the developer, to better understand the use cases and performance of the tool. The only data which will be recorded is the name and version of the tool, the size of the input data, the number of processes used, the time and memory taken to process it, the alignment algorithm selected, and the IP address of the machine running it. Also, if the tool fails, it will report the name of the exception thrown and the line of code it occurred in. The names of the input and output datasets are not sent. All the reporting and recording code is available at https://github.com/NickSto/ET."/>
  </inputs>
  <outputs>
    <data name="output" format="tabular"/>
  </outputs>
  <tests>
    <test>
      <param name="input" value="smoke.families.tsv"/>
      <output name="output" file="smoke.families.aligned.tsv"/>
    </test>
    <test>
      <param name="input" value="families.sort.tsv"/>
      <output name="output" file="families.msa.tsv"/>
    </test>
  </tests>
  <help>

**What it does**

This is for processing duplex sequencing data. It does a multiple sequence alignment on each (single-stranded) family of reads.

-----

**Input**

This expects the output format of the "Make families" tool.

-----

**Output**

The output is a tabular file where each line corresponds to a (single) read.

The columns are::

  1: barcode (both tags)
  2: tag order in barcode ("ab" or "ba")
  3: read mate ("1" or "2")
  4: read name
  5: read sequence, aligned ("-" for gaps)
  6: read quality scores, aligned (" " for gaps)

-----

**Alignments**

The alignments are done using MAFFT, specifically the command
::

  $ mafft --nuc --quiet family.fa &gt; family.aligned.fa

  </help>
  <citations>
    <citation type="bibtex">@article{Stoler2016,
      author = {Stoler, Nicholas and Arbeithuber, Barbara and Guiblet, Wilfried and Makova, Kateryna D and Nekrutenko, Anton},
      doi = {10.1186/s13059-016-1039-4},
      issn = {1474-760X},
      journal = {Genome biology},
      number = {1},
      pages = {180},
      pmid = {27566673},
      publisher = {Genome Biology},
      title = {{Streamlined analysis of duplex sequencing data with Du Novo.}},
      url = {http://www.ncbi.nlm.nih.gov/pubmed/27566673},
      volume = {17},
      year = {2016}
    }</citation>
  </citations>
</tool>
author	nick
date	Thu, 02 Nov 2017 16:24:48 -0400
parents	000969829a5d
children	e7b88ffb8294