annotate allele-counts.xml @ 3:933a9435939c

Current xml - includes documentation
author nick
date Fri, 31 May 2013 12:34:28 -0400
parents 28c40f4b7d2b
children 898eb3daab43
rev   line source
0
28c40f4b7d2b Uploaded xml description
nick
parents:
diff changeset
1 <tool id="allele_counts_1" version="1.0" name="Count alleles">
2 <description>and minor allele frequencies</description>
3 <command interpreter="python">allele-counts.py -i $input -o $output -f $freq -c $covg $header</command>
4 <inputs>
5 <param name="input" type="data" format="vcf" label="Input variants from Naive Variant Detector"/>
3
933a9435939c Current xml
nick
parents: 0
diff changeset
6 <param name="freq" type="float" value="1.0" min="0" max="100" label="Minor allele frequency threshold (in percent)"/>
0
7 <param name="covg" type="integer" value="10" min="0" label="Coverage threshold (per strand)"/>
3
8 <param name="header" type="boolean" truevalue="-H" falsevalue="" checked="True" label="Write header line" />
0
9 </inputs>
10 <outputs>
11 <data name="output" format="tabular"/>
12 </outputs>
13 <stdio>
14 <exit_code range="1:" err_level="fatal"/>
15 <exit_code range=":-1" err_level="fatal"/>
16 </stdio>
17
18 <help>
3
19
20 .. class:: warningmark
21
22 **Note**
23
24 This will only process a special type of VCF file. The VCF must have a special genotype field in the sample columns, giving the number of each type of variant. Also, the variant data **must be stranded**.
25
26 The Naive Variant Detector tool produces this VCF format and is the usual tool to run upstream of this one.
27
28 -----
29
30 .. class:: infomark
31
32 **What it does**
33
34 This tool parses variant counts from a special VCF file (normally the output of the Naive Variant Detector tool). It counts simple (ACGT) variants, calculates numbers of alleles, and calculates minor allele frequency. It applies filters based on coverage, strand bias, and minor allele frequency cutoffs.
35
36 -----
37
38 .. class:: infomark
39
40 **Output columns**
41
42 Each row represents one site in one sample. 12 fields give information about that site::
0
43
3
44 1. SAMPLE - Sample name (from VCF sample column labels)
45 2. CHR - Chromosome of the site
46 3. POS - Chromosomal coordinate of the site
47 4. A - Number of reads supporting an 'A'
48 5. C - ditto, for 'C'
49 6. G - ditto, for 'G'
50 7. T - ditto, for 'T'
51 8. CVRG - Total coverage (number of reads supporting one of the four bases above)
52 9. ALLELES - Number of qualifying alleles
53 10. MAJOR - Major allele base
54 11. MINOR - Minor allele base (2nd most prevalent variant)
55 12. MINOR.FREQ.PERC. - Frequency of minor allele
56
57 **Example**::
58 SAMPLE CHR POS A C G T CVRG ALLELES MAJOR MINOR MINOR.FREQ.PERC.
59 BLOOD_3 chr7 10980 11 88 1 0 100 2 C A 0.11
60
61 -----
62
63 .. class:: warningmark
64
65 **Site printing and allele tallying requirements**
66
67 Each line is printed only when the site is covered by the threshold number of reads **on each strand**. If coverage of either strand is below the threshold, the line (sample + site combination) is omitted.
68
69 **N.B.**: This means the total coverage for each printed site will be at least twice the number you give in the "coverage threshold" option.
70
71 Also, reads supporting a variant outside the canonical 4 nucleotides will not count towards the coverage requirement. For instance, a site covered by 150 reads on each strand, the majority of which support an indel variant, will not be printed.
72
73 Alleles are only counted in the column 9 tally (ALLELES) if they meet or exceed the minor allele frequency threshold. In addition, the set of alleles passing the threshold must be the same on both strands (though not necessarily in the same order). Otherwise, the allele count will be 0. The minor allele and its frequency, though, are always reported.
74
75 However, if there is a tie for the minor allele (between the 2nd and 3rd most common alleles), the minor allele will be reported as 'N', and the frequency as 0.
76
77 -----
78
79 .. class:: infomark
80
81 **Additional notes**
82
83
84
0
85 </help>
86
87 </tool>
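
The filtering and tallying rules described in the help text can be illustrated with a short Python sketch. This is not the code of allele-counts.py, which is not shown in this changeset; it is one possible reading of the documented rules. The function names (`passing_alleles`, `summarize_site`), the dict-per-strand input, and the returned keys are all hypothetical. It also assumes that the frequency threshold is applied per strand, that a base with zero reads never passes, and that the minor allele frequency is reported as a fraction, as the example row suggests.

```python
BASES = "ACGT"  # only the four canonical bases count towards coverage


def passing_alleles(counts, freq_thres):
    """Bases on one strand whose frequency (in percent) meets the threshold.

    Assumption: a base with zero reads never passes, even at threshold 0.
    """
    covg = sum(counts.get(b, 0) for b in BASES)
    if covg == 0:
        return set()
    return {
        b for b in BASES
        if counts.get(b, 0) > 0 and 100.0 * counts.get(b, 0) / covg >= freq_thres
    }


def summarize_site(plus, minus, freq_thres=1.0, covg_thres=10):
    """Summarize one sample + site, or return None if it would be omitted.

    plus, minus: hypothetical dicts mapping a base to its read count on
    the forward and reverse strand.
    """
    covg_plus = sum(plus.get(b, 0) for b in BASES)
    covg_minus = sum(minus.get(b, 0) for b in BASES)
    # The site is printed only if BOTH strands reach the coverage threshold.
    if covg_plus < covg_thres or covg_minus < covg_thres:
        return None
    total = covg_plus + covg_minus
    if total == 0:  # only reachable with covg_thres=0; nothing to report
        return None

    totals = {b: plus.get(b, 0) + minus.get(b, 0) for b in BASES}

    # Alleles are tallied only if the same set passes on both strands.
    passed_plus = passing_alleles(plus, freq_thres)
    passed_minus = passing_alleles(minus, freq_thres)
    alleles = len(passed_plus) if passed_plus == passed_minus else 0

    # Rank bases by combined count. How a tie for the major allele is
    # broken is not documented; this sketch falls back to A, C, G, T order.
    ranked = sorted(BASES, key=lambda b: totals[b], reverse=True)
    major, second, third = ranked[0], ranked[1], ranked[2]
    if totals[second] == totals[third]:
        minor, minor_freq = "N", 0.0  # tie for the minor allele
    else:
        minor, minor_freq = second, totals[second] / total

    return {
        "counts": totals, "covg": total, "alleles": alleles,
        "major": major, "minor": minor, "minor_freq": minor_freq,
    }


# Hypothetical per-strand counts, with a 5% frequency threshold.
site = summarize_site({"A": 6, "C": 44}, {"A": 5, "C": 44, "G": 1},
                      freq_thres=5.0, covg_thres=10)
print(site)
```

Returning `None` for an omitted site keeps the "not printed" case distinct from a printed site whose allele tally is 0, which the help text treats as two separate outcomes.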