annotate allele-counts.xml @ 8:411adeff1eec draft

Handle "." sample columns, update tests to work with BIAS column.
author nick
date Tue, 23 Aug 2016 02:30:56 -0400
parents a72277535a2c
children 6cc488e11544
rev   line source
6
df3b28364cd2 allele-counts.{py,xml}: Add strand bias, documentation updates.
nicksto <nmapsy@gmail.com>
parents: 5
diff changeset
1 <tool id="allele_counts_1" version="1.2" name="Variant Annotator">
5
31361191d2d2 Uploaded tarball.
nick
parents: 4
diff changeset
2 <description> process variant counts</description>
7
a72277535a2c allele-counts.xml: Fix bug causing crash when no seed is given.
nicksto <nmapsy@gmail.com>
parents: 6
diff changeset
3 <command interpreter="python">allele-counts.py -i $input -o $output -f $freq -c $covg $header $stranded $nofilt
4 #if $seed:
5 -r $seed
6 #end if
7 </command>
0
28c40f4b7d2b Uploaded xml description
nick
parents:
diff changeset
8 <inputs>
9 <param name="input" type="data" format="vcf" label="Input variants from Naive Variant Detector"/>
6
10 <param name="freq" type="float" value="1.0" min="0" max="100" label="Minor allele frequency threshold" help="in percent"/>
11 <param name="covg" type="integer" value="10" min="0" label="Coverage threshold" help="in reads (per strand)"/>
5
12 <param name="nofilt" type="boolean" truevalue="-n" falsevalue="" checked="False" label="Do not filter sites or alleles" />
13 <param name="stranded" type="boolean" truevalue="-s" falsevalue="" checked="False" label="Output stranded base counts" />
3
933a9435939c Current xml
nick
parents: 0
diff changeset
14 <param name="header" type="boolean" truevalue="-H" falsevalue="" checked="True" label="Write header line" />
6
15 <param name="seed" type="text" value="" label="PRNG seed" />
0
16 </inputs>
17 <outputs>
18 <data name="output" format="tabular"/>
19 </outputs>
20 <stdio>
21 <exit_code range="1:" err_level="fatal"/>
22 <exit_code range=":-1" err_level="fatal"/>
23 </stdio>
24
6
25 <tests>
26 <test>
27 <param name="input" value="tests/artificial.vcf.in" />
28 <param name="freq" value="10" />
29 <param name="covg" value="10" />
30 <param name="seed" value="1" />
31 <output name="output" file="tests/artificial.csv.out" />
32 </test>
33 </tests>
34
0
35 <help>
3
36
4
898eb3daab43 Complete documentation
nick
parents: 3
diff changeset
37 .. class:: infomark
38
39 **What it does**
40
5
41 This tool parses variant counts from a VCF file that reports read counts per strand for each variant. It counts single-nucleotide variants, tallies the alleles at each site, and calculates the minor allele frequency. It can also filter sites and alleles using coverage, strand bias, and minor allele frequency cutoffs.
4
42
43 -----
44
5
45 .. class:: infomark
46
47 **Input Format**
48
3
49 .. class:: warningmark
50
5
51 **Note:** variants that are not A/C/G/T SNVs will be ignored!
3
52
5
53 The input VCF should follow the format produced by the **Naive Variant Detector** tool when run with the stranded option. The sample column(s) must give the read count for each variant **on each strand**. Below is an example of a valid sample column entry (the part that matters is the field after the last colon)::
54
55 0/0:1:0.02:+T=27,+G=1,-T=22,
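As an illustration of that format, the Python sketch below splits a sample column entry into per-strand base counts. It is a minimal sketch and not the code in allele-counts.py: the function name ``parse_sample_column`` is hypothetical, and it assumes every item in the last field has the form strand sign, allele, ``=``, count. Items that are not single A/C/G/T bases are skipped, and a ``.`` entry yields empty counts.

```python
def parse_sample_column(entry):
    """Return {'+': {base: count}, '-': {base: count}} for one sample entry."""
    counts = {"+": {}, "-": {}}
    if entry == ".":
        return counts  # no data for this sample at this site
    # Assumption: the stranded counts are the last colon-delimited field.
    last_field = entry.split(":")[-1]
    for item in last_field.split(","):
        variant, sep, count = item.partition("=")
        # Skip empty items (trailing comma) and anything that is not
        # a strand sign followed by exactly one base.
        if not sep or len(variant) != 2:
            continue
        strand, allele = variant[0], variant[1]
        if strand in counts and allele in "ACGT" and count.isdigit():
            counts[strand][allele] = int(count)
    return counts


print(parse_sample_column("0/0:1:0.02:+T=27,+G=1,-T=22,"))
```

For the entry shown above, this gives 27 T reads and 1 G read on the plus strand and 22 T reads on the minus strand.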
3
56
57 -----
58
59 .. class:: infomark
60
5
61 **Output**
3
62
6
63 Each row represents one site in one sample. For **unstranded** output, 13 fields give information about that site::
0
64
5
65 1. SAMPLE - Sample name (from VCF sample column labels)
3
66 2. CHR - Chromosome of the site
67 3. POS - Chromosomal coordinate of the site
68 4. A - Number of reads supporting an 'A'
5
69 5. C - 'C' reads
70 6. G - 'G' reads
71 7. T - 'T' reads
3
72 8. CVRG - Total (number of reads supporting one of the four bases above)
73 9. ALLELES - Number of qualifying alleles
5
74 10. MAJOR - Major allele
75 11. MINOR - Minor allele (2nd most prevalent variant)
6
76 12. MAF - Frequency of minor allele
77 13. BIAS - Strand bias measure
3
78
5
79 For stranded output, instead of using 4 columns to report read counts per base, 8 are used to report the stranded counts per base::
80
6
81 1      2   3   4  5  6  7  8  9  10 11 12   13      14    15    16  17
82 SAMPLE CHR POS +A +C +G +T -A -C -G -T CVRG ALLELES MAJOR MINOR MAF BIAS
5
83
4
84 **Example**
85
5
86 Below is a header line followed by some example data lines. Because the input contained three samples, the data for each site is reported on up to three consecutive lines, one per sample. If a sample falls below the coverage threshold at a site, its line is omitted::
4
87
6
88 #SAMPLE CHR   POS A  C   G   T CVRG ALLELES MAJOR MINOR MAF     BIAS
89 BLOOD_1 chr20 99  0  101 1   2 104  1       C     T     0.01923 0.33657
90 BLOOD_2 chr20 99  82 44  0   1 127  2       A     C     0.34646 0.07823
91 BLOOD_3 chr20 99  0  110 1   0 111  1       C     G     0.009   1.00909
92 BLOOD_1 chr20 100 3  5   100 0 108  1       G     C     0.0463  0.15986
93 BLOOD_3 chr20 100 1  118 11  0 130  0       C     G     0.08462 0.04154
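To show how the CVRG, MAJOR, MINOR, and MAF columns relate to the four base counts, here is a minimal Python sketch. The function name ``summarize`` is hypothetical and this is not the tool's own code. It breaks ties by A, C, G, T order, whereas the tool picks a tied minor allele at random, and rounding the frequency to 5 decimal places for display is an assumption.

```python
def summarize(counts):
    """Return (coverage, major, minor, maf) for a dict of base -> read count."""
    bases = "ACGT"
    coverage = sum(counts.get(base, 0) for base in bases)
    # Rank bases by descending read count. sorted() is stable, so ties keep
    # A, C, G, T order here (the tool chooses randomly instead).
    ranked = sorted(bases, key=lambda base: counts.get(base, 0), reverse=True)
    major, minor = ranked[0], ranked[1]
    # Minor allele frequency as a fraction of all A/C/G/T reads.
    maf = counts.get(minor, 0) / coverage if coverage else 0.0
    return coverage, major, minor, maf


# The BLOOD_2 row at chr20:99 from the example above.
coverage, major, minor, maf = summarize({"A": 82, "C": 44, "G": 0, "T": 1})
print(coverage, major, minor, round(maf, 5))  # 127 A C 0.34646
```

Note that MAF in the output is a fraction, while the frequency threshold option is given in percent.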
3
94
95 -----
96
97 .. class:: warningmark
98
99 **Site printing and allele tallying requirements**
100
5
101 Coverage threshold:
3
102
5
103 If a coverage threshold is used, the number of reads **on each strand** must be at or above the threshold. If either strand is below the threshold, the line is omitted. **N.B.** this means the total coverage of every printed site is at least twice the value of the "Coverage threshold" option. Also, since only simple variants are counted, a site with 100 reads that all support a deletion variant would not be printed.
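The per-strand rule can be sketched in Python as follows. This is only an illustration under stated assumptions: ``passes_coverage`` is a hypothetical name, and the input is assumed to be a dict of A/C/G/T read counts for each strand, so reads supporting other variants never contribute.

```python
def passes_coverage(stranded, threshold):
    """True if the read total on BOTH strands is at or above the threshold."""
    plus_total = sum(stranded["+"].values())
    minus_total = sum(stranded["-"].values())
    return plus_total >= threshold and minus_total >= threshold


# Counts from a sample entry +T=27,+G=1,-T=22 (28 plus reads, 22 minus reads).
site = {"+": {"T": 27, "G": 1}, "-": {"T": 22}}
print(passes_coverage(site, 10))  # True
print(passes_coverage(site, 25))  # False: the minus strand has only 22 reads
```

With a threshold of 25 the site has 50 reads in total, yet it is still dropped because one strand falls short.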
3
104
5
105 Frequency threshold:
3
106
5
107 If a frequency threshold is used, an allele is counted (in the ALLELES column) only if its frequency meets or exceeds the minor allele frequency threshold, which is given in percent.
3
108
5
109 Strand bias:
3
110
5
111 The set of alleles passing the threshold must be the same on both strands (order does not matter), or the allele count will be 0. For example, a site with A, C, G on the plus strand and A, G on the minus strand gets an allele count of zero, though the (strand-independent) major allele, minor allele, and minor allele frequency are still reported. If there is a tie for the minor allele, one is chosen at random.
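The matching rule can be illustrated with a short Python sketch. It is a hedged example, not the tool's implementation: ``count_alleles`` is a hypothetical name, and it assumes the frequency of an allele is its read count divided by all A/C/G/T reads on the same strand, which the text above does not spell out.

```python
def count_alleles(stranded, freq_percent):
    """Return the ALLELES value: 0 if the passing alleles differ by strand."""
    passing = {}
    for strand in ("+", "-"):
        total = sum(stranded[strand].values())
        # Alleles with at least one read whose per-strand frequency
        # (in percent) is at or above the threshold.
        passing[strand] = {
            base
            for base, reads in stranded[strand].items()
            if reads > 0 and 100.0 * reads / total >= freq_percent
        }
    if passing["+"] != passing["-"]:
        return 0  # sets are compared, so order does not matter
    return len(passing["+"])


# A, C, G pass on the plus strand but only A, G on the minus strand.
mismatch = {"+": {"A": 50, "C": 20, "G": 30}, "-": {"A": 60, "G": 40}}
print(count_alleles(mismatch, 10))  # 0

# The single minus-strand C read is below 10 percent, so both sets are A, G.
match = {"+": {"A": 50, "G": 50}, "-": {"A": 60, "G": 40, "C": 1}}
print(count_alleles(match, 10))  # 2
```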
3
112
6
113 Additionally, a measure of strand bias is given in the last column. This is calculated using the method of Guo et al., 2012. A value of "." is given when there is no valid result of the calculation due to a zero denominator. This occurs when there are no reads on one of the strands, or when there is no minor allele.
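For orientation, here is a Python sketch of a strand bias score of this kind. It assumes the measure is ``SB = |b/(a+b) - d/(c+d)| / ((b+d)/(a+b+c+d))``, with ``a`` and ``c`` the major allele reads on the plus and minus strands and ``b`` and ``d`` the minor allele reads. Both that formula and the mapping of its terms to major and minor alleles are assumptions here, so check them against the paper and allele-counts.py before relying on them. The function name ``strand_bias`` is hypothetical.

```python
def strand_bias(a, b, c, d):
    """Strand bias score, or "." when a denominator would be zero.

    a, c: major allele reads on the plus and minus strand (assumed mapping)
    b, d: minor allele reads on the plus and minus strand (assumed mapping)
    """
    plus, minus, minor = a + b, c + d, b + d
    # No reads on one strand, or no minor allele reads at all.
    if plus == 0 or minus == 0 or minor == 0:
        return "."
    return abs(b / plus - d / minus) / (minor / (a + b + c + d))


print(strand_bias(40, 10, 45, 5))  # minor allele on both strands, mild bias
print(strand_bias(50, 0, 50, 0))   # "." because there is no minor allele
```

The three zero-denominator cases in the sketch correspond to the situations listed above in which "." is reported.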
114
0
115 </help>
116
6
117 </tool>