comparison allele-counts.xml @ 4:898eb3daab43

Complete documentation
author nick
date Tue, 04 Jun 2013 00:16:29 -0400
parents 933a9435939c
children 31361191d2d2
3:933a9435939c 4:898eb3daab43
15 <exit_code range=":-1" err_level="fatal"/> 15 <exit_code range=":-1" err_level="fatal"/>
16 </stdio> 16 </stdio>
17 17
18 <help> 18 <help>
19 19
20 .. class:: infomark
21
22 **What it does**
23
24 This tool parses variant counts from a special VCF file (normally the output of the **Naive Variant Detector** tool). It counts simple (ACGT) variants, calculates numbers of alleles, and calculates minor allele frequency. It applies filters based on coverage, strand bias, and minor allele frequency cutoffs.
25
26 -----
27
20 .. class:: warningmark 28 .. class:: warningmark
21 29
22 **Note** 30 **Note**
23 31
24 This will only process a special type of VCF file. The VCF must have a special genotype field in the sample columns, giving the number of each type of variant. Also, the variant data **must be stranded**. 32 The VCF must have a certain genotype field in the sample columns, giving the read count of each type of variant. Also, the variant data **must be stranded**. The **Naive Variant Detector** tool produces this type of VCF.
25
26 The Naive Variant Detector tool produces this VCF format, and is the normal upstream tool from this one.
27
28 -----
29
30 .. class:: infomark
31
32 **What it does**
33
34 This tool parses variant counts from a special VCF file (normally the output of the Naive Variant Detector tool). It counts simple (ACGT) variants, calculates numbers of alleles, and calculates minor allele frequency. It applies filters based on coverage, strand bias, and minor allele frequency cutoffs.
35 33
36 ----- 34 -----
37 35
38 .. class:: infomark 36 .. class:: infomark
39 37
52 9. ALLELES - Number of qualifying alleles 50 9. ALLELES - Number of qualifying alleles
53 10. MAJOR - Major allele base 51 10. MAJOR - Major allele base
54 11. MINOR - Minor allele base (2nd most prevalent variant) 52 11. MINOR - Minor allele base (2nd most prevalent variant)
55 12. MINOR.FREQ.PERC. - Frequency of minor allele 53 12. MINOR.FREQ.PERC. - Frequency of minor allele
56 54
57 **Example**:: 55 **Example**
58 SAMPLE CHR POS A C G T CVRG ALLELES MAJOR MINOR MINOR.FREQ.PERC. 56
59 BLOOD_3 chr7 10980 11 88 1 0 100 2 C A 0.11 57 This is the header line, followed by some example data lines. Note that some samples and/or sites will not be included in the output, if they fall below the coverage threshold::
58
59 #SAMPLE CHR POS A C G T CVRG ALLELES MAJOR MINOR MINOR.FREQ.PERC.
60 BLOOD_1 chr20 99 0 101 1 2 104 1 C T 0.01923
61 BLOOD_2 chr20 99 82 44 0 1 127 2 A C 0.34646
62 BLOOD_3 chr20 99 0 110 1 0 111 1 C G 0.009
63 BLOOD_1 chr20 100 3 5 100 0 108 1 G C 0.0463
64 BLOOD_3 chr20 100 1 118 11 0 130 0 C G 0.08462
60 65
61 ----- 66 -----
62 67
63 .. class:: warningmark 68 .. class:: warningmark
64 69
66 71
67 Each line is printed only when the site is covered by the threshold number of reads **on each strand**. If coverage of either strand is below the threshold, the line (sample + site combination) is omitted. 72 Each line is printed only when the site is covered by the threshold number of reads **on each strand**. If coverage of either strand is below the threshold, the line (sample + site combination) is omitted.
68 73
69 **N.B.**: This means the total coverage for each printed site will be at least twice the number you give in the "coverage threshold" option. 74 **N.B.**: This means the total coverage for each printed site will be at least twice the number you give in the "coverage threshold" option.
70 75
71 Also, reads supporting a variant outside the canonical 4 nucleotides will not count towards the coverage requirement. For instance, a site covered by 150 reads on each strand, the majority of which support an indel variant, will not be printed. 76 Also, reads supporting a variant outside the canonical 4 nucleotides will not count towards the coverage requirement. For instance, a site/sample line with 100x coverage, all of which support a deletion variant, will not be printed.
72 77
73 Alleles are only counted in the column 9 tally if they meet or exceed the minor allele frequency threshold. In addition, the alleles passing the threshold on each strand have to match (though not in order). Otherwise, the allele count will be 0. The reported minor allele and minor allele frequency, though, will always be reported. 78 Alleles are only counted (in column 9) if they meet or exceed the minor allele frequency threshold. So a site/sample line with types of variants, 96% A, 3.3% C, and 0.7% G, will count as 2 alleles (at 1% threshold).
74 79
75 However, if there is a tie for the minor allele (between the 2nd and 3rd most common alleles), the minor allele will be reported as 'N', and the frequency as 0. 80 Strand bias: the alleles passing the threshold on each strand have to match (though not in order). Otherwise, the allele count will be 0. So a site/sample line whose + strand shows 70% A, 27% C, and 3% G, and - strand shows 70% A and 30% C will have an allele count of 0. The minor allele and minor allele frequency, though, will always be reported\*.
76 81
77 ----- 82 But in this version, there is no requirement that the strands show similar allele frequencies, as long as they both pass the threshold.
78 83
79 .. class:: infomark 84 \*One specific case will actually affect the reported minor allele identity and frequency. If there is a tie for the minor allele (between the 2nd and 3rd most common alleles), the minor allele will be reported as 'N', and the frequency as 0.0.
80
81 **Additional notes**
82
83
84 85
85 </help> 86 </help>
86 87
87 </tool> 88 </tool>
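
The filtering rules described in the revised help text (per-strand coverage, per-strand allele matching, and the minor-allele tie case) can be illustrated with a minimal Python sketch. This is not the tool's actual code: the function names `passing_alleles` and `summarize`, the dictionary layout of the per-strand counts, the default thresholds, and the choice to express the frequency threshold as a fraction are all assumptions made for illustration.

```python
BASES = "ACGT"  # only the canonical bases count towards coverage


def passing_alleles(counts, freq_thres):
    """Return the set of bases on one strand whose frequency meets the threshold."""
    total = sum(counts.get(b, 0) for b in BASES)
    if total == 0:
        return set()
    return {b for b in BASES
            if counts.get(b, 0) > 0 and counts.get(b, 0) / total >= freq_thres}


def summarize(plus, minus, covg_thres=10, freq_thres=0.01):
    """Hypothetical summary of one sample/site from per-strand base counts.

    Returns None when either strand is below the coverage threshold,
    mirroring the "line is omitted" rule in the help text.
    """
    plus_cov = sum(plus.get(b, 0) for b in BASES)
    minus_cov = sum(minus.get(b, 0) for b in BASES)
    if plus_cov < covg_thres or minus_cov < covg_thres:
        return None

    total = {b: plus.get(b, 0) + minus.get(b, 0) for b in BASES}
    coverage = plus_cov + minus_cov

    # Strand bias rule: the passing alleles must be the same set on both
    # strands (order does not matter); otherwise the allele count is 0.
    plus_set = passing_alleles(plus, freq_thres)
    minus_set = passing_alleles(minus, freq_thres)
    alleles = len(plus_set) if plus_set == minus_set else 0

    # Major and minor alleles are always reported, ranked by combined counts.
    ranked = sorted(BASES, key=lambda b: total[b], reverse=True)
    major, second, third = ranked[0], ranked[1], ranked[2]
    if total[second] == total[third]:
        # Tie for the minor allele: report 'N' with frequency 0.0.
        # (Assumption: a tie at zero reads is treated the same way.)
        minor, minor_freq = "N", 0.0
    else:
        minor, minor_freq = second, total[second] / coverage

    return {"coverage": coverage, "alleles": alleles,
            "major": major, "minor": minor, "minor_freq": minor_freq}


# The strand-bias example from the help text: + strand 70% A, 27% C, 3% G;
# - strand 70% A, 30% C. The passing sets differ, so the allele count is 0.
biased = summarize({"A": 70, "C": 27, "G": 3}, {"A": 70, "C": 30})
```

Under these assumptions, `biased["alleles"]` is 0 while the major and minor alleles (A and C) are still reported, which matches the behaviour the help text describes.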