Mercurial > repos > nick > dunovo

<?xml version="1.0"?>
<tool id="align_families" name="Du Novo: Align families" version="2.15">
  <description>of duplex sequencing reads</description>
  <requirements>
    <requirement type="package" version="7.221">mafft</requirement>
    <requirement type="package" version="2.15">dunovo</requirement>
    <!-- TODO: require Python 2.7 -->
  </requirements>
  <version_command>align-families.py --version</version_command>
  <command detect_errors="exit_code">align-families.py $check_ids --aligner $aligner --galaxy $phone --processes \${GALAXY_SLOTS:-1} '$input' &gt; '$output'
  </command>
  <inputs>
    <param name="input" type="data" format="tabular" label="Input reads" help="with barcodes, grouped by family"/>
    <param name="aligner" type="select" value="mafft" label="Multiple sequence aligner" help="MAFFT is the original aligner Du Novo was published with in 2016. Kalign is much faster and has similar accuracy.">
      <option value="kalign">Kalign2</option>
      <option value="mafft">MAFFT</option>
    </param>
    <param name="check_ids" type="boolean" truevalue="" falsevalue="--no-check-ids" checked="True" label="Check read names" help="Make sure reads are properly paired up. The job will fail if there is a pair of reads where their ids aren't identical (minus any ending /1 or /2)."/>
    <param name="phone" type="boolean" truevalue="--phone-home" falsevalue="" checked="False" label="Send usage data" help="Report helpful usage data to the developer, to better understand the use cases and performance of the tool. The only data which will be recorded is the name and version of the tool, the size of the input data, the number of processes used, the time and memory taken to process it, the alignment algorithm selected, and the IP address of the machine running it. Also, if the tool fails, it will report the name of the exception thrown and the line of code it occurred in. The names of the input and output datasets are not sent. All the reporting and recording code is available at https://github.com/NickSto/ET."/>
  </inputs>
  <outputs>
    <data name="output" format="tabular"/>
  </outputs>
  <tests>
    <test>
      <param name="input" value="smoke.families.tsv"/>
      <output name="output" file="smoke.families.aligned.tsv"/>
    </test>
    <test>
      <param name="input" value="families.sort.tsv"/>
      <output name="output" file="families.msa.tsv"/>
    </test>
  </tests>
  <help>

**What it does**

This is for processing duplex sequencing data. It does a multiple sequence alignment on each (single-stranded) family of reads.

-----

**Input**

This expects the output format of the "Make families" tool.

-----

**Output**

The output is a tabular file where each line corresponds to a (single) read.

The columns are::

  1: barcode (both tags)
  2: tag order in barcode ("ab" or "ba")
  3: read mate ("1" or "2")
  4: read name
  5: read sequence, aligned ("-" for gaps)
  6: read quality scores, aligned (" " for gaps)

-----

**Alignments**

When "MAFFT" is selected as the multiple sequence aligner, the alignments are done with the command
::

  $ mafft --nuc --quiet family.fa &gt; family.aligned.fa

  </help>
  <citations>
    <citation type="bibtex">@article{Stoler2016,
      author = {Stoler, Nicholas and Arbeithuber, Barbara and Guiblet, Wilfried and Makova, Kateryna D and Nekrutenko, Anton},
      doi = {10.1186/s13059-016-1039-4},
      issn = {1474-760X},
      journal = {Genome biology},
      number = {1},
      pages = {180},
      pmid = {27566673},
      publisher = {Genome Biology},
      title = {{Streamlined analysis of duplex sequencing data with Du Novo.}},
      url = {http://www.ncbi.nlm.nih.gov/pubmed/27566673},
      volume = {17},
      year = {2016}
    }</citation>
  </citations>
</tool>
author	nick
date	Fri, 01 Jun 2018 17:55:23 -0400
parents	fa563fa9b330
children	0f8e0dc73d1d