comparison allele-counts.xml @ 6:df3b28364cd2
allele-counts.{py,xml}: Add strand bias, documentation updates.
author | nicksto <nmapsy@gmail.com> |
---|---|
date | Wed, 09 Dec 2015 11:20:51 -0500 |
parents | 31361191d2d2 |
children | a72277535a2c |
5:31361191d2d2 | 6:df3b28364cd2 |
---|---|
1 <tool id="allele_counts_1" version="1.1" name="Variant Annotator"> | 1 <tool id="allele_counts_1" version="1.2" name="Variant Annotator"> |
2 <description> process variant counts</description> | 2 <description> process variant counts</description> |
3 <command interpreter="python">allele-counts.py -i $input -o $output -f $freq -c $covg $header $stranded $nofilt</command> | 3 <command interpreter="python">allele-counts.py -i $input -o $output -f $freq -c $covg $header $stranded $nofilt -r $seed</command> |
4 <inputs> | 4 <inputs> |
5 <param name="input" type="data" format="vcf" label="Input variants from Naive Variants Detector"/> | 5 <param name="input" type="data" format="vcf" label="Input variants from Naive Variants Detector"/> |
6 <param name="freq" type="float" value="1.0" min="0" max="100" label="Minor allele frequency threshold (in percent)"/> | 6 <param name="freq" type="float" value="1.0" min="0" max="100" label="Minor allele frequency threshold" help="in percent"/> |
7 <param name="covg" type="integer" value="10" min="0" label="Coverage threshold (in reads per strand)"/> | 7 <param name="covg" type="integer" value="10" min="0" label="Coverage threshold" help="in reads (per strand)"/> |
8 <param name="nofilt" type="boolean" truevalue="-n" falsevalue="" checked="False" label="Do not filter sites or alleles" /> | 8 <param name="nofilt" type="boolean" truevalue="-n" falsevalue="" checked="False" label="Do not filter sites or alleles" /> |
9 <param name="stranded" type="boolean" truevalue="-s" falsevalue="" checked="False" label="Output stranded base counts" /> | 9 <param name="stranded" type="boolean" truevalue="-s" falsevalue="" checked="False" label="Output stranded base counts" /> |
10 <param name="header" type="boolean" truevalue="-H" falsevalue="" checked="True" label="Write header line" /> | 10 <param name="header" type="boolean" truevalue="-H" falsevalue="" checked="True" label="Write header line" /> |
 | 11 <param name="seed" type="text" value="" label="PRNG seed" /> |
11 </inputs> | 12 </inputs> |
12 <outputs> | 13 <outputs> |
13 <data name="output" format="tabular"/> | 14 <data name="output" format="tabular"/> |
14 </outputs> | 15 </outputs> |
15 <stdio> | 16 <stdio> |
16 <exit_code range="1:" err_level="fatal"/> | 17 <exit_code range="1:" err_level="fatal"/> |
17 <exit_code range=":-1" err_level="fatal"/> | 18 <exit_code range=":-1" err_level="fatal"/> |
18 </stdio> | 19 </stdio> |
 | 20 |
 | 21 <tests> |
 | 22 <test> |
 | 23 <param name="input" value="tests/artificial.vcf.in" /> |
 | 24 <param name="freq" value="10" /> |
 | 25 <param name="covg" value="10" /> |
 | 26 <param name="seed" value="1" /> |
 | 27 <output name="output" file="tests/artificial.csv.out" /> |
 | 28 </test> |
 | 29 </tests> |
19 | 30 |
20 <help> | 31 <help> |
21 | 32 |
22 .. class:: infomark | 33 .. class:: infomark |
23 | 34 |
43 | 54 |
44 .. class:: infomark | 55 .. class:: infomark |
45 | 56 |
46 **Output** | 57 **Output** |
47 | 58 |
48 Each row represents one site in one sample. For unstranded output, 12 fields give information about that site:: | 59 Each row represents one site in one sample. For **unstranded** output, 13 fields give information about that site:: |
49 | 60 |
50 1. SAMPLE - Sample name (from VCF sample column labels) | 61 1. SAMPLE - Sample name (from VCF sample column labels) |
51 2. CHR - Chromosome of the site | 62 2. CHR - Chromosome of the site |
52 3. POS - Chromosomal coordinate of the site | 63 3. POS - Chromosomal coordinate of the site |
53 4. A - Number of reads supporting an 'A' | 64 4. A - Number of reads supporting an 'A' |
56 7. T - 'T' reads | 67 7. T - 'T' reads |
57 8. CVRG - Total (number of reads supporting one of the four bases above) | 68 8. CVRG - Total (number of reads supporting one of the four bases above) |
58 9. ALLELES - Number of qualifying alleles | 69 9. ALLELES - Number of qualifying alleles |
59 10. MAJOR - Major allele | 70 10. MAJOR - Major allele |
60 11. MINOR - Minor allele (2nd most prevalent variant) | 71 11. MINOR - Minor allele (2nd most prevalent variant) |
61 12. MINOR.FREQ.PERC. - Frequency of minor allele | 72 12. MAF - Frequency of minor allele |
 | 73 13. BIAS - Strand bias measure |
62 | 74 |
63 For stranded output, instead of using 4 columns to report read counts per base, 8 are used to report the stranded counts per base:: | 75 For stranded output, instead of using 4 columns to report read counts per base, 8 are used to report the stranded counts per base:: |
64 | 76 |
65 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | 77 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
66 SAMPLE CHR POS +A +C +G +T -A -C -G -T CVRG ALLELES MAJOR MINOR MINOR.FREQ.PERC. | 78 SAMPLE CHR POS +A +C +G +T -A -C -G -T CVRG ALLELES MAJOR MINOR MAF BIAS |
67 | 79 |
68 **Example** | 80 **Example** |
69 | 81 |
70 Below is a header line, followed by some example data lines. Since the input contained three samples, the data for each site is reported on three consecutive lines. However, if a sample fell below the coverage threshold at that site, the line will be omitted:: | 82 Below is a header line, followed by some example data lines. Since the input contained three samples, the data for each site is reported on three consecutive lines. However, if a sample fell below the coverage threshold at that site, the line will be omitted:: |
71 | 83 |
72 #SAMPLE CHR POS A C G T CVRG ALLELES MAJOR MINOR MINOR.FREQ.PERC. | 84 #SAMPLE CHR POS A C G T CVRG ALLELES MAJOR MINOR MAF BIAS |
73 BLOOD_1 chr20 99 0 101 1 2 104 1 C T 0.01923 | 85 BLOOD_1 chr20 99 0 101 1 2 104 1 C T 0.01923 0.33657 |
74 BLOOD_2 chr20 99 82 44 0 1 127 2 A C 0.34646 | 86 BLOOD_2 chr20 99 82 44 0 1 127 2 A C 0.34646 0.07823 |
75 BLOOD_3 chr20 99 0 110 1 0 111 1 C G 0.009 | 87 BLOOD_3 chr20 99 0 110 1 0 111 1 C G 0.009 1.00909 |
76 BLOOD_1 chr20 100 3 5 100 0 108 1 G C 0.0463 | 88 BLOOD_1 chr20 100 3 5 100 0 108 1 G C 0.0463 0.15986 |
77 BLOOD_3 chr20 100 1 118 11 0 130 0 C G 0.08462 | 89 BLOOD_3 chr20 100 1 118 11 0 130 0 C G 0.08462 0.04154 |
78 | 90 |
79 ----- | 91 ----- |
80 | 92 |
81 .. class:: warningmark | 93 .. class:: warningmark |
82 | 94 |
92 | 104 |
93 Strand bias: | 105 Strand bias: |
94 | 106 |
95 The alleles passing the threshold on each strand must match (though not in order), or the allele count will be 0. So a site with A, C, G on the plus strand and A, G on the minus strand will get an allele count of zero, though the (strand-independent) major allele, minor allele, and minor allele frequency will still be reported. If there is a tie for the minor allele, one will be randomly chosen. | 107 The alleles passing the threshold on each strand must match (though not in order), or the allele count will be 0. So a site with A, C, G on the plus strand and A, G on the minus strand will get an allele count of zero, though the (strand-independent) major allele, minor allele, and minor allele frequency will still be reported. If there is a tie for the minor allele, one will be randomly chosen. |
96 | 108 |
 | 109 Additionally, a measure of strand bias is given in the last column. This is calculated using the method of Guo et al., 2012. A value of "." is given when there is no valid result of the calculation due to a zero denominator. This occurs when there are no reads on one of the strands, or when there is no minor allele. |
 | 110 |
97 </help> | 111 </help> |
98 | 112 |
99 </tool> | 113 </tool> |
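
The help text added in this changeset cites Guo et al., 2012 for the BIAS column but does not spell out the calculation. The sketch below assumes the tool uses the SB score from that paper, |b/(a+b) - d/(c+d)| / ((b+d)/(a+b+c+d)), where a and c are the plus- and minus-strand counts of the major allele and b and d are those of the minor allele. The function and argument names are hypothetical and are not taken from allele-counts.py, and the script may count reads differently (for example, including reads for all four bases in the per-strand totals).

```python
def strand_bias(major_plus, minor_plus, major_minus, minor_minus):
    """Sketch of the SB score, assuming the Guo et al. (2012) definition.

    Returns None when any denominator is zero. The help text says the
    tool reports "." in that case; None stands in for that here.
    """
    a, b, c, d = major_plus, minor_plus, major_minus, minor_minus
    plus_total = a + b    # reads on the plus strand (major + minor only)
    minus_total = c + d   # reads on the minus strand (major + minor only)
    minor_total = b + d   # minor-allele reads on both strands

    # Zero denominators: no reads on one strand, or no minor allele.
    if plus_total == 0 or minus_total == 0 or minor_total == 0:
        return None

    minor_frac_plus = b / plus_total
    minor_frac_minus = d / minus_total
    minor_frac_all = minor_total / (plus_total + minus_total)
    return abs(minor_frac_plus - minor_frac_minus) / minor_frac_all


def format_bias(value):
    """Render the score the way the BIAS column is described: "." if undefined."""
    return "." if value is None else str(round(value, 5))
```

Under this definition a score of 0 means the minor allele makes up the same fraction of reads on both strands, and larger values mean the minor allele is concentrated on one strand. The three undefined cases in the guard match the two situations named in the help text: no reads on one of the strands, or no minor allele.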