comparison data_manager/tagdust_architecture_data_manager.xml @ 0:e3b3261e5498 draft default tip

Uploaded
author brenninc
date Sun, 08 May 2016 04:44:17 -0400
-1:000000000000 0:e3b3261e5498
1 <tool id="tagdust_architecture_manager" name="tagdust architecture manager" tool_type="manage_data" version="0.0.1">
2 <description>architecture creator</description>
3 <command interpreter="python">
4 tagdust_architecture_data_manager.py
5 --data_table_name "tagdust_architecture"
6 --json_output_file "${json_output_file}"
7 </command>
8 <inputs>
9 <repeat name="hmms" title="HMM Building Blocks">
10 <param name="block" type="text" size="25" label="Next HMM building block" />
11 </repeat>
12 <param name="name" type="text" value="" label="name field for the entry. Defaults to a concatenation of hmm values if left blank." />
13 <param name="value" type="text" value="" label="value field for the entry. Defaults to name if left blank." />
14 <param name="dbkey" type="text" value="" label="dbkey field for the entry. Defaults to value if left blank." />
15 </inputs>
16 <outputs>
17 <data name="json_output_file" format="data_manager_json"/>
18 </outputs>
19
20 <help>
21 Adds a path to the tagdust references.
22
23 The tool will check the path exists but NOT check that it holds the expected data type.
24
25 If name is not provided, a concatenation of the hmm values is used.
26
27 If value is not provided, the name (or its default) is used.
28
29 If dbkey is not provided, the value (or its default) is used.
30
31 ====
32
33 Taken from the TagDust2 manual, http://tagdust.sourceforge.net (part of the Version 2_31 download).
34
35 Raw sequences produced by next generation sequencing (NGS) machines can contain adapter, linker,
36 barcode and fingerprint sequences. TagDust2 is a program to extract and correctly label the sequences
37 to be mapped in downstream pipelines.
38 TagDust allows users to specify the expected architecture of a read and converts it into a hidden
39 Markov model. The latter can assign sequences to a particular barcode (or index) even in the presence
40 of sequencing errors. Sequences not matching the architecture (primer dimers, contaminants etc.) are
41 automatically discarded.
42
43 TagDust requires an input file containing sequences and a user-defined HMM architecture used to
44 extract the reads. The architecture is composed of a selection of pre-defined building blocks representing
45 indices, barcodes, spacers and other sequences one might encounter in the raw output of a sequenced
46 sample.
47
48 HMM Building Blocks
49
50 TagDust comes with a set of pre-defined HMM building blocks. Each includes a silent state at the
51 beginning and end used to link blocks together. Each block is specified by a unique letter followed
52 by a colon and some information about the sequence.
53
54 Read
55 Segment modeling the read.
56 Code: R:N
57
58 Optional
59 Segment modeling an optional single or short stretch of nucleotides.
60 Code: O:N
61
62 G addition
63 Segment modeling the occasional addition of guanines to the reads.
64 (89.3% chance of a single G, 19.5% chance of 2 Gs, ...).
65 Code: G:G
66
67 Barcode or Index
68 Segment modeling a set of barcode sequences. For each sequence a separate HMM is created. The
69 barcode sequences must be given as a comma separated list. A null model of the same length as the
70 barcode is automatically added and initialized to the background nucleotide frequencies.
71 Code: B:GTA,AAC
72
73 Fingerprint or Unique Molecular Identifier - UMI
74 Segment modeling a fingerprint (or unique molecular identifier). Insertions and deletions are by
75 default not allowed within a fingerprint segment.
76 Code: F:NNN
77
78 Spacer
79 Segment modeling a pre-defined sequence.
80 Code: S:GTA
81
82 Partial
83 This segment is used to model sequences that may only be partially present at the 5' or 3' end of
84 the read. The transition probabilities are set automatically based on the length
85 distribution of exactly matching adapters.
86 Code: P:CCTTAA
87
88
89 </help>
90 <citations>
91 </citations>
92
93 </tool>
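
The fallback rules in the help text (name defaults to a concatenation of the HMM blocks, value to name, dbkey to value) can be sketched in Python, the interpreter named in the tool's command. This is an illustrative sketch only, not the contents of tagdust_architecture_data_manager.py, which is not shown in this changeset. The function names `validate_block`, `build_entry` and `to_data_manager_json`, the single-space separator used for the concatenation, and the JSON layout are all assumptions.

```python
import json

# Block letters listed in the help text: Read, Optional, G addition,
# Barcode, Fingerprint, Spacer, Partial.
KNOWN_BLOCK_LETTERS = set("ROGBFSP")


def validate_block(block):
    """Return the block unchanged if it looks like LETTER:SEQUENCE, else raise."""
    letter, sep, sequence = block.partition(":")
    if sep != ":" or letter not in KNOWN_BLOCK_LETTERS or not sequence:
        raise ValueError("not a recognised HMM building block: %r" % block)
    return block


def build_entry(blocks, name="", value="", dbkey=""):
    """Apply the fallbacks: name <- joined blocks, value <- name, dbkey <- value."""
    blocks = [validate_block(b.strip()) for b in blocks]
    # Assumption: blocks are joined with a single space; the real script may differ.
    name = name or " ".join(blocks)
    value = value or name
    dbkey = dbkey or value
    return {"name": name, "value": value, "dbkey": dbkey}


def to_data_manager_json(table_name, entry):
    """Wrap one entry in a data-manager-style JSON document (layout assumed)."""
    return json.dumps({"data_tables": {table_name: [entry]}}, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical architecture: barcode, spacer, then the read itself.
    entry = build_entry(["B:GTA,AAC", "S:GTA", "R:N"], name="barcoded reads")
    print(to_data_manager_json("tagdust_architecture", entry))
```

Leaving all three text fields blank yields an entry whose name, value and dbkey are all the joined block string. Supplying only a name propagates that name to value and dbkey.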