comparison GenotypingSTR.xml @ 2:d5ed5c2e25c3 draft

Uploaded
author arkarachai-fungtammasan
date Wed, 22 Apr 2015 12:48:40 -0400
parents 07588b899c13
children
1:f2bab38e3cbd 2:d5ed5c2e25c3
1 <tool id="GenotypeSTR" name="Correct genotype for microsatellite errors" version="2.0.0"> 1 <tool id="GenotypeSTR" name="Correct genotype for STR errors" version="2.0.0">
2 <description> during sequencing and library prep </description> 2 <description> that occur during sequencing and library prep </description>
3 <command interpreter="python2.7">GenotypeTRcorrection.py $microsat_raw $microsat_error_profile $microsat_corrected $expectedminorallele </command> 3 <command interpreter="python2.7">GenotypeTRcorrection.py $microsat_raw $microsat_error_profile $microsat_corrected $expectedminorallele </command>
4 4
5 <inputs> 5 <inputs>
6 <param name="microsat_raw" type="data" label="Select the microsatellite length profile that needs genotype refinement" /> 6 <param name="microsat_raw" type="data" label="Select the microsatellite length profile that needs genotype refinement" />
7 <param name="microsat_error_profile" type="data" label="Select the microsatellite error profile that corresponds to this dataset" /> 7 <param name="microsat_error_profile" type="data" label="Select the microsatellite error profile that corresponds to this dataset" />
26 26
27 .. class:: infomark 27 .. class:: infomark
28 28
29 **What it does** 29 **What it does**
30 30
31 - This tool will correct for microsatellite sequencing and library preparation errors using error rates estimated from hemizygous male X chromosome or any rates provided by user. The read profile for each locus will be processed independently. 31 - This tool corrects for STR sequencing and library preparation errors using error rates estimated from the hemizygous male X chromosome (https://usegalaxy.org/u/guru%40psu.edu/h/error-rates-files) or rates provided by the user. The STR length profile for each locus is processed independently.
32 - First, this tool will find three most common read lengths from input read length profile. If the read profile has only one length of TR, the length of one motif longer than the observed length will be used as the second most common read length. 32 - First, this tool finds the three most common STR lengths in the input STR length profile. If the profile contains only one STR length, the length one motif longer than the observed length is used as the second most common STR length.
33 - Second, it will calculate probability of three forms of homozygous and use the form which give the highest probability. The same goes for heterozygous. 33 - Second, it calculates the probability of each of the three homozygous forms and keeps the form with the highest probability. It does the same for the heterozygous forms.
34 - Third, this tools will calculate log based 10 of (the probability of homozygous/the probability of heterozygous). If this value is more than 0, it will predict this locus to homozygous. If this value is less than 0, it will predict this locus to heterozygous. If this value is 0, read profile at this locus will be discard. 34 - Third, this tool calculates the log10 of the ratio of the probability of the most probable homozygote to that of the most probable heterozygote. If this value is greater than 0, the locus is predicted to be homozygous; if it is less than 0, the locus is predicted to be heterozygous. If the value is exactly 0, the read profile at this locus is discarded.
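The three steps above can be sketched in Python, the tool's own language. This is a minimal sketch under loud assumptions and not the logic of GenotypeTRcorrection.py. The error model is a hypothetical toy dictionary, `TOY_ERROR_PROFILE`, that maps the offset between observed and true length to a probability. Offsets missing from it get an assumed floor probability. Each read from a heterozygote is assumed to come from either allele with probability 0.5. All names here are illustrative.

```python
import math
from collections import Counter
from itertools import combinations

# HYPOTHETICAL toy error profile: probability that a read reports a length
# differing from the true allele by the given offset (bp). The real tool
# takes its rates from the error-profile dataset instead.
TOY_ERROR_PROFILE = {-1: 0.05, 0: 0.90, 1: 0.05}
FLOOR = 1e-6  # assumed probability for offsets missing from the profile


def read_prob(observed, allele, profile):
    """P(observed length | true allele) under the assumed profile."""
    return profile.get(observed - allele, FLOOR)


def log10_likelihood(lengths, allele1, allele2, profile):
    """Sum of per-read log10 probabilities; each read is assumed to come from
    either allele with probability 0.5 (allele1 == allele2 for a homozygote)."""
    total = 0.0
    for obs in lengths:
        p = 0.5 * read_prob(obs, allele1, profile) + 0.5 * read_prob(obs, allele2, profile)
        total += math.log10(p)
    return total


def candidate_lengths(lengths, motif):
    """Step 1: up to three most common lengths (ties keep first-seen order);
    a single observed length is padded with the length one motif longer."""
    top = [length for length, _ in Counter(lengths).most_common(3)]
    if len(top) == 1:
        top.append(top[0] + len(motif))
    return top


def genotype(lengths, motif, profile=TOY_ERROR_PROFILE):
    """Steps 2 and 3: best homozygote, best heterozygote, log10 ratio, label."""
    cands = candidate_lengths(lengths, motif)
    homo_ll, homo_allele = max(
        (log10_likelihood(lengths, a, a, profile), a) for a in cands)
    hetero_ll, hetero_pair = max(
        (log10_likelihood(lengths, a, b, profile), (a, b))
        for a, b in combinations(sorted(cands), 2))
    log_ratio = homo_ll - hetero_ll  # log10(P(homozygote) / P(heterozygote))
    if log_ratio > 0:
        label = "homo"    # assumed label string
    elif log_ratio < 0:
        label = "hetero"  # label string seen in the example output
    else:
        label = None      # tie: the locus is discarded
    return label, log_ratio, homo_allele, hetero_pair
```

With this toy profile, `genotype([5, 6, 6, 6, 6, 7, 7, 8, 8], "A")` selects A7A7 as the best homozygote and A6A8 as the best heterozygote. It returns a negative log10 ratio, and therefore `"hetero"`. The ratio itself will not match the tool's output, because the real error rates differ. Summing log10 probabilities instead of multiplying raw probabilities keeps long read profiles from underflowing.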
35 35
36 **Citation** 36 **Citation**
37 37
38 When you use this tool, please cite **Fungtammasan A, Ananda G, Hile SE, Su MS, Sun C, Harris R, Medvedev P, Eckert K, Makova KD. 2015. Accurate Typing of Short Tandem Repeats from Genome-wide Sequencing Data and its Applications, Genome Research** 38 When you use this tool, please cite **Fungtammasan A, Ananda G, Hile SE, Su MS, Sun C, Harris R, Medvedev P, Eckert K, Makova KD. 2015. Accurate Typing of Short Tandem Repeats from Genome-wide Sequencing Data and its Applications, Genome Research**
39 39
40 **Input** 40 **Input**
41 41
42 - The input files need to contain at least three columns. 42 - The input files need to contain at least three columns.
43 - Column 1 = location of microsatellite locus. 43 - Column 1 = location of STR locus.
44 - Column 2 = length profile (length of microsatellite in each read that mapped to this location in comma separated format). 44 - Column 2 = length profile (comma-separated lengths of the STR in each read mapped to this location).
45 - Column 3 = motif of microsatellite in this locus. The input file can contain more than three column. 45 - Column 3 = motif of STR in this locus. The input file can contain more than three columns.
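Reading one input row could look like the sketch below. It assumes tab-separated columns, the usual layout of Galaxy tabular data, which the help text does not state. `parse_str_row` is a hypothetical helper. Spaces after the commas, as in the example profile, are tolerated because `int()` ignores surrounding whitespace.

```python
def parse_str_row(line):
    """Split one input row into (locus, lengths, motif, extra_columns).

    Assumes tab-separated columns (not stated in the help text).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("expected at least three columns, got %d" % len(fields))
    locus, profile, motif = fields[0], fields[1], fields[2]
    # Column 2: comma-separated STR lengths, one per read at this locus.
    lengths = [int(x) for x in profile.split(",") if x.strip()]
    # Columns beyond the third are kept untouched for the output.
    return locus, lengths, motif, fields[3:]
```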
46 46
47 **Output** 47 **Output**
48 48
49 The output will be contain original three (or more) column as the input. However, it will also have these following columns. 49 The output contains the same three (or more) columns as the input, followed by these additional columns.
50 50
51 - Additional column 1 = homozygous/heterozygous label. 51 - Additional column 1 = homozygote/heterozygote label.
52 - Additional column 2 = log based 10 of (the probability of homozygous/the probability of heterozygous) 52 - Additional column 2 = log10 of the ratio of the probability of the most probable homozygote to that of the most probable heterozygote.
53 - Additional column 3 = Allele for most probable homozygous form. 53 - Additional column 3 = Allele of the most probable homozygote.
54 - Additional column 4 = Allele 1 for most probable heterozygous form. 54 - Additional column 4 = Allele 1 of the most probable heterozygote.
55 - Additional column 5 = Allele 2 for most probable heterozygous form. 55 - Additional column 5 = Allele 2 of the most probable heterozygote.
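One way to assemble an output row from the original columns and the five additional ones is sketched below. It again assumes tab-separated output. `format_output_row` is a hypothetical helper, and the float formatting is an assumption: Python 3 `str()` happens to reproduce the value in the example row.

```python
def format_output_row(input_fields, label, log_ratio, homo_allele, hetero_pair):
    """Append the five genotype columns to the original input columns.

    Assumes tab-separated output; names and formatting are illustrative.
    """
    columns = list(input_fields) + [
        label,                # additional column 1: homozygote/heterozygote label
        str(log_ratio),       # additional column 2: log10(P(homozygote) / P(heterozygote))
        str(homo_allele),     # additional column 3: allele of the best homozygote
        str(hetero_pair[0]),  # additional column 4: allele 1 of the best heterozygote
        str(hetero_pair[1]),  # additional column 5: allele 2 of the best heterozygote
    ]
    return "\t".join(columns)
```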
56 56
57 **Example** 57 **Example**
58 58
59 - Suppose that we sequence one locus of microsatellite with NGS. This locus has **A** motif and the following length (bp) profile. :: 59 - Suppose that we sequence an STR locus with NGS. This locus has an **A** motif and the following STR length (bp) profile. ::
60 60
61 chr1_100_106 5, 6, 6, 6, 6, 7, 7, 8, 8 A 61 chr1_100_106 5, 6, 6, 6, 6, 7, 7, 8, 8 A
62 62
63 - We want to figure out if this locus is a homolozygous or heterozygous and the corresponding allele(s). Therefore, we use this tool to refine genotype. 63 - We want to determine whether this locus is a homozygote or a heterozygote and which allele(s) it carries, so we use this tool to refine the genotype.
64 - This tool will calculate the probability of homozygous A6A6, A7A7, and A8A8 to generate observed length profile. Among this A7A7 has the highest probability. Therefore, we use this form as the representative for homozygous. 64 - This tool calculates the probability that each of the homozygotes A6A6, A7A7, and A8A8 generates the observed STR length profile. Among these, A7A7 has the highest probability, so it is used as the representative homozygote.
65 - Then, this tool will calculate the probability of heterozygous A6A7, A7A8, and A6A8 to generate observed length profile. Among this A6A8 has the highest probability. Therefore, we use this form as the representative for heterozygous. 65 - Then, it calculates the probability that each of the heterozygotes A6A7, A7A8, and A6A8 generates the observed STR length profile. Among these, A6A8 has the highest probability, so it is used as the representative heterozygote.
66 - The A6A7 has higher probability than A7A7. Therefore, the program will report that this locus is a heterozygous locus. :: 66 - Finally, it compares the representative homozygous and heterozygous forms. A6A8 has a higher probability than A7A7, so the program reports this locus as heterozygous with genotype A6A8. ::
67 67
68 chr1 5,6,6,6,6,7,7,8,8 A hetero -14.8744881854 7 6 8 68 chr1 5,6,6,6,6,7,7,8,8 A hetero -14.8744881854 7 6 8
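To read the example output back, the appended columns can be mapped to named fields and checked against the decision rule in step three of "What it does". This is a minimal sketch under two assumptions. Columns are taken to be tab-separated. The homozygote label is taken to be `"homo"`, since only `"hetero"` appears in the example. Both helper names are hypothetical.

```python
def parse_genotype_columns(line, n_input_columns=3):
    """Return the five appended genotype columns of one output row as a dict.

    Assumes tab-separated columns and exactly n_input_columns original columns.
    """
    fields = line.rstrip("\n").split("\t")
    label, log_ratio, homo, het1, het2 = fields[n_input_columns:n_input_columns + 5]
    return {
        "label": label,
        "log10_ratio": float(log_ratio),
        "homozygote": (int(homo), int(homo)),
        "heterozygote": (int(het1), int(het2)),
    }


def label_matches_ratio(row):
    """Decision rule: a negative log10 ratio means heterozygote, a positive one
    homozygote ("homo" is an assumed label string)."""
    ratio = row["log10_ratio"]
    return (ratio < 0 and row["label"] == "hetero") or (ratio > 0 and row["label"] == "homo")
```

For the example row, this yields a homozygote of (7, 7), a heterozygote of (6, 8), and a label consistent with the negative ratio.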
69 69
70 70
71 </help> 71 </help>