diff allele-counts.xml @ 6:df3b28364cd2
allele-counts.{py,xml}: Add strand bias, documentation updates.
author | nicksto <nmapsy@gmail.com> |
---|---|
date | Wed, 09 Dec 2015 11:20:51 -0500 |
parents | 31361191d2d2 |
children | a72277535a2c |
```diff
--- a/allele-counts.xml Thu Sep 12 11:34:23 2013 -0400
+++ b/allele-counts.xml Wed Dec 09 11:20:51 2015 -0500
@@ -1,13 +1,14 @@
-<tool id="allele_counts_1" version="1.1" name="Variant Annotator">
+<tool id="allele_counts_1" version="1.2" name="Variant Annotator">
   <description> process variant counts</description>
-  <command interpreter="python">allele-counts.py -i $input -o $output -f $freq -c $covg $header $stranded $nofilt</command>
+  <command interpreter="python">allele-counts.py -i $input -o $output -f $freq -c $covg $header $stranded $nofilt -r $seed</command>
   <inputs>
     <param name="input" type="data" format="vcf" label="Input variants from Naive Variants Detector"/>
-    <param name="freq" type="float" value="1.0" min="0" max="100" label="Minor allele frequency threshold (in percent)"/>
-    <param name="covg" type="integer" value="10" min="0" label="Coverage threshold (in reads per strand)"/>
+    <param name="freq" type="float" value="1.0" min="0" max="100" label="Minor allele frequency threshold" help="in percent"/>
+    <param name="covg" type="integer" value="10" min="0" label="Coverage threshold" help="in reads (per strand)"/>
     <param name="nofilt" type="boolean" truevalue="-n" falsevalue="" checked="False" label="Do not filter sites or alleles" />
     <param name="stranded" type="boolean" truevalue="-s" falsevalue="" checked="False" label="Output stranded base counts" />
     <param name="header" type="boolean" truevalue="-H" falsevalue="" checked="True" label="Write header line" />
+    <param name="seed" type="text" value="" label="PRNG seed" />
   </inputs>
   <outputs>
     <data name="output" format="tabular"/>
@@ -17,6 +18,16 @@
     <exit_code range=":-1" err_level="fatal"/>
   </stdio>
 
+  <tests>
+    <test>
+      <param name="input" value="tests/artificial.vcf.in" />
+      <param name="freq" value="10" />
+      <param name="covg" value="10" />
+      <param name="seed" value="1" />
+      <output name="output" file="tests/artificial.csv.out" />
+    </test>
+  </tests>
+
   <help>
 
 .. class:: infomark
@@ -45,7 +56,7 @@
 
 **Output**
 
-Each row represents one site in one sample. For unstranded output, 12 fields give information about that site::
+Each row represents one site in one sample. For **unstranded** output, 13 fields give information about that site::
 
     1. SAMPLE - Sample name (from VCF sample column labels)
     2. CHR - Chromosome of the site
@@ -58,23 +69,24 @@
     9. ALLELES - Number of qualifying alleles
     10. MAJOR - Major allele
     11. MINOR - Minor allele (2nd most prevalent variant)
-    12. MINOR.FREQ.PERC. - Frequency of minor allele
+    12. MAF - Frequency of minor allele
+    13. BIAS - Strand bias measure
 
 For stranded output, instead of using 4 columns to report read counts per base, 8 are used to report the stranded counts per base::
 
-    1      2   3   4  5  6  7  8  9  10 11 12   13      14    15    16
-    SAMPLE CHR POS +A +C +G +T -A -C -G -T CVRG ALLELES MAJOR MINOR MINOR.FREQ.PERC.
+    1      2   3   4  5  6  7  8  9  10 11 12   13      14    15    16  17
+    SAMPLE CHR POS +A +C +G +T -A -C -G -T CVRG ALLELES MAJOR MINOR MAF BIAS
 
 **Example**
 
 Below is a header line, followed by some example data lines. Since the input contained three samples, the data for each site is reported on three consecutive lines. However, if a sample fell below the coverage threshold at that site, the line will be omitted::
 
-    #SAMPLE CHR   POS A  C   G   T CVRG ALLELES MAJOR MINOR MINOR.FREQ.PERC.
-    BLOOD_1 chr20 99  0  101 1   2 104  1       C     T     0.01923
-    BLOOD_2 chr20 99  82 44  0   1 127  2       A     C     0.34646
-    BLOOD_3 chr20 99  0  110 1   0 111  1       C     G     0.009
-    BLOOD_1 chr20 100 3  5   100 0 108  1       G     C     0.0463
-    BLOOD_3 chr20 100 1  118 11  0 130  0       C     G     0.08462
+    #SAMPLE CHR   POS A  C   G   T CVRG ALLELES MAJOR MINOR MAF     BIAS
+    BLOOD_1 chr20 99  0  101 1   2 104  1       C     T     0.01923 0.33657
+    BLOOD_2 chr20 99  82 44  0   1 127  2       A     C     0.34646 0.07823
+    BLOOD_3 chr20 99  0  110 1   0 111  1       C     G     0.009   1.00909
+    BLOOD_1 chr20 100 3  5   100 0 108  1       G     C     0.0463  0.15986
+    BLOOD_3 chr20 100 1  118 11  0 130  0       C     G     0.08462 0.04154
 
 -----
 
@@ -94,6 +106,8 @@
 
 The alleles passing the threshold on each strand must match (though not in order), or the allele count will be 0. So a site with A, C, G on the plus strand and A, G on the minus strand will get an allele count of zero, though the (strand-independent) major allele, minor allele, and minor allele frequency will still be reported. If there is a tie for the minor allele, one will be randomly chosen.
 
+Additionally, a measure of strand bias is given in the last column. This is calculated using the method of Guo et al., 2012. A value of "." is given when there is no valid result of the calculation due to a zero denominator. This occurs when there are no reads on one of the strands, or when there is no minor allele.
+
 </help>
 
-</tool>
\ No newline at end of file
+</tool>
```
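The MINOR and MAF columns can be illustrated with a small Python sketch, together with the new `-r $seed` option and the help text's rule that a tie for the minor allele is broken at random. The sketch makes three assumptions:

- MAF is the minor allele's read count divided by CVRG. This agrees with the example rows; for instance, 2 / 104 ≈ 0.01923 for BLOOD_1 at position 99.
- Minor-allele ties are broken by a PRNG seeded from the "PRNG seed" parameter.
- A site with no second allele has no minor allele.

The helper `major_minor_maf` and the use of `random.Random` are hypothetical. This is not the code in allele-counts.py.

```python
import random
from typing import Dict, Optional, Tuple

BASES = ("A", "C", "G", "T")


def major_minor_maf(
    counts: Dict[str, int], rng: random.Random
) -> Tuple[Optional[str], Optional[str], float]:
    """Return (major, minor, maf) from unstranded per-base read counts.

    Hypothetical helper: MAF is assumed to be minor reads / total coverage,
    and a site with no second allele is assumed to have no minor allele.
    """
    coverage = sum(counts.get(b, 0) for b in BASES)
    if coverage == 0:
        return None, None, 0.0
    # Rank bases by read count, highest first; the top one is the major allele.
    ranked = sorted(BASES, key=lambda b: counts.get(b, 0), reverse=True)
    major = ranked[0]
    runner_up = counts.get(ranked[1], 0)
    if runner_up == 0:
        return major, None, 0.0
    # Every base tied for second place is a minor-allele candidate; the
    # seeded PRNG picks one, so reruns with the same seed give the same answer.
    tied = [b for b in ranked[1:] if counts.get(b, 0) == runner_up]
    minor = rng.choice(tied)
    return major, minor, counts.get(minor, 0) / coverage


if __name__ == "__main__":
    rng = random.Random(1)  # assumed to stand in for the value passed via -r
    major, minor, maf = major_minor_maf({"A": 0, "C": 101, "G": 1, "T": 2}, rng)
    print(f"{major} {minor} {maf:.5f}")  # prints "C T 0.01923"
```

Seeding a private `random.Random` instance, rather than the module-level generator, keeps the tie-breaking reproducible without affecting any other code that uses `random`.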
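The help text's per-strand rule says that the alleles passing the threshold on each strand must match, though not in order, or ALLELES is 0. This amounts to comparing two sets. The sketch below assumes an allele passes on a strand when its share of that strand's reads is at least the frequency threshold given in percent. The help text does not state whether the tool uses `>=` or `>`, or exactly how it combines this with the coverage threshold. Treat `passing_alleles` and `allele_count` as hypothetical illustrations.

```python
from typing import Dict, FrozenSet


def passing_alleles(strand_counts: Dict[str, int], freq_pct: float) -> FrozenSet[str]:
    """Bases whose share of this strand's reads meets freq_pct (assumed >=)."""
    total = sum(strand_counts.values())
    if total == 0:
        return frozenset()
    return frozenset(
        base for base, n in strand_counts.items() if 100.0 * n / total >= freq_pct
    )


def allele_count(plus: Dict[str, int], minus: Dict[str, int], freq_pct: float) -> int:
    """Number of qualifying alleles, or 0 if the two strands disagree.

    Comparing sets ignores order, so {A, G} on both strands counts as a match.
    """
    plus_set = passing_alleles(plus, freq_pct)
    minus_set = passing_alleles(minus, freq_pct)
    return len(plus_set) if plus_set == minus_set else 0


if __name__ == "__main__":
    # The help text's case: A, C, G pass on the plus strand, only A, G on minus.
    plus = {"A": 50, "C": 30, "G": 20, "T": 0}
    minus = {"A": 60, "C": 0, "G": 40, "T": 0}
    print(allele_count(plus, minus, 10.0))  # prints 0
```

The count includes the major allele, consistent with the example rows where a site with a single passing base reports ALLELES = 1.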
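For the BIAS column, the help text cites Guo et al., 2012. The following is a minimal Python sketch of the strand bias (SB) score as we understand it from that paper. Let a and b be the major and minor allele reads on the plus strand, and c and d the same on the minus strand. Then SB = |b/(a+b) − d/(c+d)| / ((b+d)/(a+b+c+d)).

Returning "." for the three zero-denominator cases mirrors the help text. Two points are assumptions: which reads the tool actually passes in as a through d (for example, whether other bases count toward the total), and the name `strand_bias`, which is hypothetical.

```python
from typing import Union


def strand_bias(a: int, b: int, c: int, d: int) -> Union[float, str]:
    """SB score after Guo et al. (2012); hypothetical helper.

    a, b: major / minor allele reads on the plus strand
    c, d: major / minor allele reads on the minus strand
    Returns "." when any denominator is zero, as the help text describes.
    """
    plus, minus, minor = a + b, c + d, b + d
    if plus == 0 or minus == 0 or minor == 0:
        # No reads on one strand, or no minor allele: SB is undefined.
        return "."
    total = plus + minus
    # Difference in minor-allele fraction between the strands, normalized
    # by the minor-allele fraction over both strands combined.
    return abs(b / plus - d / minus) / (minor / total)


if __name__ == "__main__":
    print(strand_bias(50, 5, 50, 5))   # balanced strands, prints 0.0
    print(strand_bias(60, 10, 60, 0))  # minor allele seen on the plus strand only
    print(strand_bias(40, 0, 40, 0))   # no minor allele, prints .
```

Because the score is normalized by the overall minor-allele fraction rather than bounded, it can exceed 1 when the minor allele is concentrated on one strand.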