annotate GenotypingSTR.xml @ 6:d75894f5d61b draft

Uploaded
author arkarachai-fungtammasan
date Sat, 22 Aug 2015 12:13:34 -0400
parents d5ed5c2e25c3
children
<tool id="GenotypeSTR" name="Correct genotype for STR errors" version="2.0.0">
<description> that occur during sequencing and library prep </description>
<command interpreter="python2.7">GenotypeTRcorrection.py $microsat_raw $microsat_error_profile $microsat_corrected $expectedminorallele </command>

<inputs>
<param name="microsat_raw" type="data" label="Select the microsatellite length profile whose genotyping needs to be refined" />
<param name="microsat_error_profile" type="data" label="Select the microsatellite error profile that corresponds to this dataset" />
<param name="expectedminorallele" type="float" value="0.5" label="Expected contribution of minor allele when present (0.5 for genotyping)" />

</inputs>
<outputs>
<data name="microsat_corrected" format="tabular" />
</outputs>
<tests>
<!-- Test data with valid values -->
<test>
<param name="microsat_raw" value="sampleTRprofile_C.txt"/>
<param name="microsat_error_profile" value="PCRinclude.allrate.bymajorallele"/>
<param name="expectedminorallele" value="0.5"/>
<output name="microsat_corrected" file="sampleTRgenotypingcorrection"/>
</test>

</tests>
<help>


.. class:: infomark

**What it does**

- This tool corrects STR sequencing and library preparation errors using error rates estimated from hemizygous male X chromosomes (https://usegalaxy.org/u/guru%40psu.edu/h/error-rates-files) or rates provided by the user. The STR length profile of each locus is processed independently.
- First, the tool finds the three most common STR lengths in the input STR length profile. If the profile contains only one STR length, the length one motif longer than the observed length is used as the second most common STR length.
- Second, it calculates the probability of each of the three homozygous forms and keeps the form with the highest probability. It does the same for heterozygotes.
- Third, the tool calculates log10 of the ratio of the probability of the homozygote to the probability of the heterozygote. If this value is greater than 0, the locus is predicted to be a homozygote. If it is less than 0, the locus is predicted to be a heterozygote. If it is 0, the read profile at this locus is discarded.
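
The first step above can be sketched in Python as follows. This is a minimal illustration, not the code of GenotypeTRcorrection.py: the function names ``top_three_lengths`` and ``candidate_genotypes`` are hypothetical, and the tie-breaking rule (the shorter length wins when counts are equal) is an assumption. It shows how the three most common lengths, including the single-length fallback, give the candidate homozygous and heterozygous forms.

```python
from collections import Counter
from itertools import combinations


def top_three_lengths(profile, motif):
    """Return up to three most common STR lengths, most frequent first."""
    counts = Counter(profile)
    # Rank by descending count; ties go to the shorter length (an assumption).
    ranked = sorted(counts, key=lambda length: (-counts[length], length))
    top = ranked[:3]
    if len(top) == 1:
        # Only one observed length: add the length one motif longer.
        top.append(top[0] + len(motif))
    return top


def candidate_genotypes(lengths):
    """Enumerate homozygous and heterozygous forms over the given lengths."""
    homozygotes = [(a, a) for a in lengths]
    heterozygotes = list(combinations(sorted(lengths), 2))
    return homozygotes, heterozygotes


lengths = top_three_lengths([5, 6, 6, 6, 6, 7, 7, 8, 8], "A")
homozygotes, heterozygotes = candidate_genotypes(lengths)
print(lengths)
print(homozygotes)
print(heterozygotes)
```

The probability of each candidate form would then be computed from the error profile, which this sketch does not attempt to model.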

**Citation**

When you use this tool, please cite **Fungtammasan A, Ananda G, Hile SE, Su MS, Sun C, Harris R, Medvedev P, Eckert K, Makova KD. 2015. Accurate Typing of Short Tandem Repeats from Genome-wide Sequencing Data and its Applications, Genome Research**

**Input**

- The input files need to contain at least three columns.
- Column 1 = location of the STR locus.
- Column 2 = length profile (the STR length in each read that mapped to this location, in comma-separated format).
- Column 3 = motif of the STR at this locus. The input file can contain more than three columns.
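
A row in this format could be read as sketched below. The sketch assumes tab-separated columns, as is usual for Galaxy tabular data; ``parse_profile_line`` is a hypothetical helper and not part of the tool.

```python
def parse_profile_line(line):
    """Split one input row into (locus, lengths, motif, extra_columns)."""
    fields = line.rstrip("\n").split("\t")
    if 3 > len(fields):
        raise ValueError("expected at least three tab-separated columns")
    locus, raw_profile, motif = fields[0], fields[1], fields[2]
    # int() tolerates the spaces that may follow each comma.
    lengths = [int(item) for item in raw_profile.split(",")]
    # Any further columns are passed through untouched.
    return locus, lengths, motif, fields[3:]


example = "chr1_100_106\t5, 6, 6, 6, 6, 7, 7, 8, 8\tA"
locus, lengths, motif, extra = parse_profile_line(example)
print(locus, lengths, motif, extra)
```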

**Output**

The output contains the original three (or more) columns of the input, followed by these additional columns.

- Additional column 1 = homozygote/heterozygote label.
- Additional column 2 = log base 10 of (the probability of homozygote/the probability of heterozygote).
- Additional column 3 = Allele for most probable homozygote.
- Additional column 4 = Allele 1 for most probable heterozygote.
- Additional column 5 = Allele 2 for most probable heterozygote.
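
The decision rule behind additional columns 1 and 2 can be illustrated as follows. This is a sketch under assumptions: ``call_genotype`` and its two probability arguments are hypothetical, the label ``hetero`` is the one shown in the example output of this help, and ``homo`` is an assumed counterpart.

```python
import math


def call_genotype(p_homozygote, p_heterozygote):
    """Return (label, log10 ratio); the label is None when the ratio is 0."""
    log_ratio = math.log10(p_homozygote / p_heterozygote)
    if log_ratio > 0:
        return "homo", log_ratio
    if 0 > log_ratio:
        return "hetero", log_ratio
    # A ratio of exactly 0 means no call; the tool discards such loci.
    return None, log_ratio


# Made-up probabilities, for illustration only.
label, log_ratio = call_genotype(1e-20, 1e-5)
print(label, log_ratio)
```

For very small probabilities, working with log probabilities and subtracting them would avoid floating-point underflow; the direct ratio is kept here for readability.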

**Example**

- Suppose that we sequence an STR locus with NGS. This locus has an **A** motif and the following STR length (bp) profile. ::

    chr1_100_106 5, 6, 6, 6, 6, 7, 7, 8, 8 A

- We want to determine whether this locus is a homozygote or a heterozygote, and the corresponding allele(s). Therefore, we use this tool to refine the genotype.
- The tool calculates the probability that homozygote A6A6, A7A7, or A8A8 generated the observed STR length profile. Among these, A7A7 has the highest probability, so it is used as the representative homozygote.
- Then, the tool calculates the probability that heterozygote A6A7, A7A8, or A6A8 generated the observed STR length profile. Among these, A6A8 has the highest probability, so it is used as the representative heterozygote.
- Finally, it compares the representative homozygous and heterozygous forms. A6A8 has a higher probability than A7A7. Therefore, the program reports this locus as a heterozygous locus of form A6A8. ::

    chr1 5,6,6,6,6,7,7,8,8 A hetero -14.8744881854 7 6 8
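
To read such an output row programmatically, one might split it into the columns described under Output. The snippet below is only an illustration: it splits on any whitespace (an assumption that holds for this example row), and the variable names are hypothetical. It also converts the log10 ratio back into odds in favour of the heterozygote, which here are about 10 to the power 14.87.

```python
row = "chr1 5,6,6,6,6,7,7,8,8 A hetero -14.8744881854 7 6 8"
fields = row.split()

locus = fields[0]
lengths = [int(item) for item in fields[1].split(",")]
motif = fields[2]
label = fields[3]                      # additional column 1
log_ratio = float(fields[4])           # additional column 2
homozygote_allele = int(fields[5])     # additional column 3
heterozygote_alleles = (int(fields[6]), int(fields[7]))  # columns 4 and 5

# Odds in favour of the heterozygote, recovered from the log10 ratio.
odds_for_heterozygote = 10 ** (-log_ratio)

print(label, homozygote_allele, heterozygote_alleles)
print("%.3g" % odds_for_heterozygote)
```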
</help>
</tool>