Mercurial > repos > brenninc > data_manager_tagdust_architecture

<tool id="tagdust_architecture_manager" name="tagdust architecture manager" tool_type="manage_data" version="0.0.1">
    <description>architecture creator</description>
    <command interpreter="python">
        tagdust_architecture_data_manager.py
            --data_table_name "tagdust_architecture"
            --json_output_file "${json_output_file}"
    </command>
    <inputs>
        <repeat name="hmms" title="HMM Building Blocks">
            <param name="block" type="text" size="25" label="Next HMM Building block" />
        </repeat>
        <param name="name" type="text" value="" label="name field for the entry. Defaults to a contactenation of hmm values if left blank." />
        <param name="value" type="text" value="" label="value field for the entry.  Defaults to name if left blank." />
        <param name="dbkey" type="text" value="" label="dbkey field for the entry.  Defaults to value if left blank." />
    </inputs>
    <outputs>
        <data name="json_output_file" format="data_manager_json"/>
    </outputs>

    <help>
Adds a path to the tagdust references.

The tool will check the path exists but NOT check that it holds the expected data type.

If name is not provided a concatenation of hmm values is used.

If value is not provided, the name will be used (or its default)

If dbkey is not provided, the value will be used (or its default)

====

Taken from The TagDust2 Manual http://tagdust.sourceforge.net (part of Version 2_31 download)

Raw sequences produced by next generation sequencing (NGS) machines can contain adapter, linker,
barcode and fingerprint sequences. TagDust2 is a program to extract and correctly label the sequences
to be mapped in downstream pipelines.
TagDust allows users to specify the expected architecture of a read and converts it into a hidden
Markov model. The latter can assign sequences to a particular barcode (or index) even in the presence
of sequencing errors. Sequences not matching the architecture (primer dimers, contaminants etc.) are
automatically discarded

TagDust requires an input file containing sequences and a user defined HMM architecture used to ex-
tract the reads. The architecture is composed of a selection of pre-defined building blocks representing
indices, barcodes, spacers and other sequences one might encounter in the raw output of a sequenced
sample.

HMM Building Blocks

TagDust comes with a set of pre-defined HMM building blocks. Each includes a silent state at the
beginning and end used to link blocks together. Each block is specified by a unique letter following
by a colon and some information about the sequence.

Read
Segment modeling the read.
Code: R:N

Optional
Segment modeling an optional single or short stretch of nucleotides.
Code: O:N

G addition
Segment modeling the occasional addition of guanines to the reads.
(89.3% chance of a single G , 19.5% chance of 2 Gs..).
Code: G:G

Barcode or Index
Segment modeling a set of barcode sequences. For each sequence a separate HMM is created. The
barcode sequences must be given as a comma separated list. A null model of the same length as the
barcode is automatically added and initialized to the background nucleotide frequencies.
Code: B:GTA,AAC

Fingerprint or Unique Molecular Identifier - UMI
Segment modeling a fingerprint (or unique molecular identifiers). Insertions and deletions are by
default not allowed within a fingerprint segment.
Code: F:NNN

Spacer
Segment modeling a pre-defined sequence.
Code: S:GTA

Partial
This segment is used to model sequences that may only be partially present at the 5‘ or 3‘ end of
the read. The transition probabilities (orange and blue) are set automatically based on the length
distribution of exactly matching adapters.
Code: P:CCTTAA


    </help>
    <citations>
    </citations>

</tool>
author	brenninc
date	Sun, 08 May 2016 04:44:17 -0400
parents
children