comparison profilegenerator.xml @ 0:07588b899c13 draft

Uploaded
author arkarachai-fungtammasan
date Wed, 01 Apr 2015 17:05:51 -0400
parents
children d5ed5c2e25c3
-1:000000000000 0:07588b899c13
1 <tool id="Profilegenerator" name="Generate all possible combination of read profile" version="2.0.0">
2 <description> of the consecutive allele from given error profile </description>
3 <command interpreter="python2.7">profilegenerator.py $error_profile $MOTIF $Maxdepth $minprob > $output </command>
4
5 <inputs>
6 <param name="error_profile" type="data" label="Select error profile" />
7 <param name="MOTIF" type="text" value="A" label="Type in a motif of interest (e.g. AGC)" />
8 <param name="Maxdepth" type="integer" value="30" label="Maximum read depth of interest" />
9 <param name="minprob" type="float" value="0.00000001" label="Minimum error rate to be considered" />
10
11 </inputs>
12 <outputs>
13 <data name="output" format="tabular" />
14 </outputs>
15 <tests>
16 <!-- Test data with valid values -->
17 <test>
18 <param name="error_profile" value="sampleprofilegenerator_in"/>
19 <param name="MOTIF" value="A"/>
20 <param name="Maxdepth" value="3"/>
21 <param name="minprob" value="0.00000001"/>
22 <output name="output" file="sampleprofilegenerator_out"/>
23 </test>
24
25 </tests>
26 <help>
27
28
29 .. class:: infomark
30
31 **What it does**
32
33 This tool generates all possible combinations of observed read profiles for consecutive alleles from a given error profile. The range of observed read lengths can be restricted to those that occur frequently by using the "Minimum error rate to be considered" parameter.
34
35 The program collects the lists of valid observed length profiles (those that pass the "Minimum error rate to be considered" threshold) from combinations of consecutive allele lengths. Lists that are equivalent to, or a subset of, another list are removed. For each depth and each list, length profiles are generated as combinations with replacement, in a way that is compatible with Python 2.7. Redundant profiles may be generated from different lists if more than one combination of alleles is produced because the ranges of observed microsatellite lengths overlap. The user needs to remove these duplicates, which can be done easily with the **sort | uniq** command in Unix.
36
37
38 **Citation**
39
40 When you use this tool, please cite **Fungtammasan A, Ananda G, Hile SE, Su MS, Sun C, Harris R, Medvedev P, Eckert K, Makova KD. 2015. Accurate Typing of Short Tandem Repeats from Genome-wide Sequencing Data and its Applications, Genome Research**
41
42 **Input**
43
44 - The error profile needs to contain these three columns.
45 - Column 1 = Correct microsatellite length
46 - Column 2 = Observed microsatellite length
47 - Column 3 = Number of observations
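
The three-column input can be read and filtered roughly as follows. This is only a sketch, not the code in profilegenerator.py: the function name ``read_error_profile`` is hypothetical, and it assumes that the "error rate" of an observed length is its observation count divided by the total count for the same correct length.

```python
from collections import defaultdict


def read_error_profile(lines, min_rate=0.00000001):
    """Map each correct length to the sorted observed lengths that pass min_rate.

    Assumption: rate = observations / total observations for that correct length.
    """
    counts = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed rows
        correct, observed, n = (int(f) for f in fields)
        counts[correct][observed] = counts[correct].get(observed, 0) + n

    valid = {}
    for correct, observed_counts in counts.items():
        total = float(sum(observed_counts.values()))
        valid[correct] = sorted(
            obs for obs, n in observed_counts.items() if n / total >= min_rate
        )
    return valid
```

Under that assumption, raising ``min_rate`` drops rarely observed lengths from each list before any profiles are generated.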
48
49 **Output**
50
51 - Column 1 = Placeholder for the location of the microsatellite locus (always "chr").
52 - Column 2 = Length profile (the length of the microsatellite in each read that mapped to this location, comma-separated).
53 - Column 3 = Motif of the microsatellite at this locus.
54
55 **Example**
56
57 - Suppose that we provide the following error profile ::
58
59 9 9 100000
60 10 10 91456
61 10 9 1259
62 11 11 39657
63 11 10 1211
64 11 12 514
65
66
67 - Using the default minimum error rate and motif = A, all observed read lengths are valid. The program will generate lists of observed length profiles from consecutive allele lengths. ::
68
69 9:10 = [9,10]
70 10:11 = [9,10,11,12]
71
72 - Lists that are subsets of other lists will be removed. Thus, [9,10] will not be considered.
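
The list-building and subset-removal steps can be sketched as below. The function names are hypothetical, and the sketch assumes that "consecutive" means correct lengths that differ by one, as in the 9:10 and 10:11 pairs of this example; the actual script may define it differently for longer motifs.

```python
def consecutive_allele_lists(valid):
    """Union the observed lengths of each pair of consecutive correct lengths.

    `valid` maps a correct length to its list of valid observed lengths.
    Assumption: consecutive alleles differ in length by exactly one.
    """
    lists = []
    for length in sorted(valid):
        if length + 1 in valid:
            merged = sorted(set(valid[length]) | set(valid[length + 1]))
            lists.append(merged)
    return lists


def drop_subsets(lists):
    """Remove duplicate lists and lists fully contained in another list."""
    unique = []
    for candidate in lists:
        if candidate not in unique:
            unique.append(candidate)
    return [
        a for a in unique
        if not any(a != b and set(a).issubset(b) for b in unique)
    ]
```

For the example profile, the first function yields ``[9, 10]`` and ``[9, 10, 11, 12]``, and the second keeps only ``[9, 10, 11, 12]``.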
73
74 - Then the program will generate all combinations with replacement for each depth from each list. Using **maximum read depth = 3**, we will get the following output. ::
75
76
77 chr 9,9 A
78 chr 9,10 A
79 chr 9,11 A
80 chr 9,12 A
81 chr 10,10 A
82 chr 10,11 A
83 chr 10,12 A
84 chr 11,11 A
85 chr 11,12 A
86 chr 12,12 A
87 chr 9,9,9 A
88 chr 9,9,10 A
89 chr 9,9,11 A
90 chr 9,9,12 A
91 chr 9,10,10 A
92 chr 9,10,11 A
93 chr 9,10,12 A
94 chr 9,11,11 A
95 chr 9,11,12 A
96 chr 9,12,12 A
97 chr 10,10,10 A
98 chr 10,10,11 A
99 chr 10,10,12 A
100 chr 10,11,11 A
101 chr 10,11,12 A
102 chr 10,12,12 A
103 chr 11,11,11 A
104 chr 11,11,12 A
105 chr 11,12,12 A
106 chr 12,12,12 A
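
The output above can be reproduced with a short sketch built on ``itertools.combinations_with_replacement``. The function name ``length_profiles`` is hypothetical. Two details are assumptions inferred from this example rather than from the script: depths start at 2, and columns are tab-separated because the output format is tabular.

```python
from itertools import combinations_with_replacement


def length_profiles(lengths, max_depth, motif, min_depth=2):
    """Return one output row per combination with replacement at each depth.

    Assumptions: depths run from min_depth (2, as in the example) to max_depth,
    and columns are joined with tabs.
    """
    rows = []
    for depth in range(min_depth, max_depth + 1):
        for combo in combinations_with_replacement(lengths, depth):
            profile = ",".join(str(x) for x in combo)
            rows.append("chr\t" + profile + "\t" + motif)
    return rows


if __name__ == "__main__":
    for row in length_profiles([9, 10, 11, 12], 3, "A"):
        print(row)
```

With four valid lengths this gives 10 rows at depth 2 and 20 rows at depth 3, matching the 30 rows listed in the example.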
107
108
109 </help>
110 </tool>