Mercurial > repos > nick > dunovo

<?xml version="1.0"?>
<tool id="align_families" name="Du Novo: Align families" version="3.0.2">
  <description>of duplex sequencing reads</description>
  <requirements>
    <requirement type="package" version="3.0.2">dunovo</requirement>
  </requirements>
  <version_command>align-families.py --version</version_command>
  <command detect_errors="exit_code">align-families.py $check_ids --aligner $aligner --galaxy $phone --processes \${GALAXY_SLOTS:-1} '$input' &gt; '$output'
  </command>
  <inputs>
    <param name="input" type="data" format="tabular" label="Input reads" help="with barcodes, grouped by family"/>
    <param name="aligner" type="select" value="mafft" label="Multiple sequence aligner" help="MAFFT is the original aligner Du Novo was published with in 2016. Kalign is much faster and has similar accuracy.">
      <option value="kalign">Kalign2</option>
      <option value="mafft">MAFFT</option>
    </param>
    <param name="check_ids" type="boolean" truevalue="" falsevalue="--no-check-ids" checked="True" label="Check read names" help="Make sure reads are properly paired up. The job will fail if there is a pair of reads where their ids aren't identical (minus any ending /1 or /2)."/>
    <param name="phone" type="boolean" truevalue="--phone-home" falsevalue="" checked="False" label="Send usage data" help="Report helpful usage data to the developer, to better understand the use cases and performance of the tool. The only data which will be recorded is the name and version of the tool, the size of the input data, the number of processes used, the time and memory taken to process it, the alignment algorithm selected, and the IP address of the machine running it. Also, if the tool fails, it will report the name of the exception thrown and the line of code it occurred in. The names of the input and output datasets are not sent. All the reporting and recording code is available at https://github.com/NickSto/ET."/>
  </inputs>
  <outputs>
    <data name="output" format="tabular"/>
  </outputs>
  <tests>
    <test>
      <param name="input" value="smoke.families.tsv"/>
      <output name="output" file="smoke.families.aligned.tsv"/>
    </test>
    <test>
      <param name="input" value="families.sort.tsv"/>
      <param name="check_ids" value="--no-check-ids"/>
      <output name="output" file="families.msa.tsv"/>
    </test>
  </tests>
  <help>

**What it does**

This is for processing duplex sequencing data. It does a multiple sequence alignment on each (single-stranded) family of reads.

-----

**Input**

This expects the output format of the "Make families" tool.

-----

**Output**

The output is a tabular file where each line corresponds to a (single) read.

The columns are::

  1: barcode (both tags)
  2: tag order in barcode ("ab" or "ba")
  3: read mate ("1" or "2")
  4: read name
  5: read sequence, aligned ("-" for gaps)
  6: read quality scores, aligned (" " for gaps)

-----

**Alignments**

When "MAFFT" is selected as the multiple sequence aligner, the alignments are done with the command
::

  $ mafft --nuc --quiet family.fa &gt; family.aligned.fa

  </help>
  <citations>
    <citation type="bibtex">@article{Stoler2016,
      author = {Stoler, Nicholas and Arbeithuber, Barbara and Guiblet, Wilfried and Makova, Kateryna D and Nekrutenko, Anton},
      doi = {10.1186/s13059-016-1039-4},
      issn = {1474-760X},
      journal = {Genome biology},
      number = {1},
      pages = {180},
      pmid = {27566673},
      publisher = {Genome Biology},
      title = {{Streamlined analysis of duplex sequencing data with Du Novo.}},
      url = {http://www.ncbi.nlm.nih.gov/pubmed/27566673},
      volume = {17},
      year = {2016}
    }</citation>
  </citations>
</tool>
author	nick
date	Wed, 16 Feb 2022 22:59:00 +0000
parents	9dc43bf7d1db
children