view allele-counts.xml @ 4:898eb3daab43
Complete documentation
author | nick |
---|---|
date | Tue, 04 Jun 2013 00:16:29 -0400 |
parents | 933a9435939c |
children | 31361191d2d2 |
<tool id="allele_counts_1" version="1.0" name="Count alleles">
  <description>and minor allele frequencies</description>
  <command interpreter="python">allele-counts.py -i $input -o $output -f $freq -c $covg $header</command>
  <inputs>
    <param name="input" type="data" format="vcf" label="Input variants from Naive Variant Detector"/>
    <param name="freq" type="float" value="1.0" min="0" max="100" label="Minor allele frequency threshold (in percent)"/>
    <param name="covg" type="integer" value="10" min="0" label="Coverage threshold (per strand)"/>
    <param name="header" type="boolean" truevalue="-H" falsevalue="" checked="True" label="Write header line" />
  </inputs>
  <outputs>
    <data name="output" format="tabular"/>
  </outputs>
  <stdio>
    <exit_code range="1:" err_level="fatal"/>
    <exit_code range=":-1" err_level="fatal"/>
  </stdio>

  <help>

.. class:: infomark

**What it does**

This tool parses variant counts from a special VCF file (normally the output of the **Naive Variant Detector** tool). It counts simple (ACGT) variants, calculates numbers of alleles, and calculates minor allele frequency. It applies filters based on coverage, strand bias, and minor allele frequency cutoffs.

-----

.. class:: warningmark

**Note**

The VCF must have a certain genotype field in the sample columns, giving the read count of each type of variant. Also, the variant data **must be stranded**. The **Naive Variant Detector** tool produces this type of VCF.

-----

.. class:: infomark

**Output columns**

Each row represents one site in one sample. 12 fields give information about that site::

    1.  SAMPLE            - Sample names (from VCF sample column labels)
    2.  CHR               - Chromosome of the site
    3.  POS               - Chromosomal coordinate of the site
    4.  A                 - Number of reads supporting an 'A'
    5.  C                 - ditto, for 'C'
    6.  G                 - ditto, for 'G'
    7.  T                 - ditto, for 'T'
    8.  CVRG              - Total (number of reads supporting one of the four bases above)
    9.  ALLELES           - Number of qualifying alleles
    10. MAJOR             - Major allele base
    11. MINOR             - Minor allele base (2nd most prevalent variant)
    12. MINOR.FREQ.PERC.  - Frequency of minor allele

**Example**

This is the header line, followed by some example data lines. Note that some samples and/or sites will not be included in the output if they fall below the coverage threshold::

    #SAMPLE  CHR    POS  A   C    G    T  CVRG  ALLELES  MAJOR  MINOR  MINOR.FREQ.PERC.
    BLOOD_1  chr20  99   0   101  1    2  104   1        C      T      0.01923
    BLOOD_2  chr20  99   82  44   0    1  127   2        A      C      0.34646
    BLOOD_3  chr20  99   0   110  1    0  111   1        C      G      0.009
    BLOOD_1  chr20  100  3   5    100  0  108   1        G      C      0.0463
    BLOOD_3  chr20  100  1   118  11   0  130   0        C      G      0.08462

-----

.. class:: warningmark

**Site printing and allele tallying requirements**

Each line is printed only when the site is covered by at least the threshold number of reads **on each strand**. If coverage of either strand is below the threshold, the line (sample + site combination) is omitted. **N.B.**: This means the total coverage for each printed site will be at least twice the number you give in the "coverage threshold" option. Also, reads supporting a variant outside the canonical 4 nucleotides will not count towards the coverage requirement. For instance, a site/sample line with 100x coverage, all of which support a deletion variant, will not be printed.

Alleles are only counted (in column 9) if they meet or exceed the minor allele frequency threshold. So a site/sample line with three types of variants (96% A, 3.3% C, and 0.7% G) will count as 2 alleles (at a 1% threshold).

Strand bias: the alleles passing the threshold on each strand have to match (though not in order). Otherwise, the allele count will be 0. So a site/sample line whose + strand shows 70% A, 27% C, and 3% G, and whose - strand shows 70% A and 30% C, will have an allele count of 0. The minor allele and minor allele frequency, though, will always be reported\*. But in this version, there is no requirement that the strands show similar allele frequencies, as long as they both pass the threshold.

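The printing and tallying rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in ``allele-counts.py``: the function names ``passing_alleles`` and ``count_alleles`` are hypothetical, and each strand is assumed to be given as a dict mapping a variant label to its read count.

```python
# Illustrative sketch only; names and data layout are assumptions,
# not taken from allele-counts.py.
BASES = "ACGT"


def passing_alleles(counts, freq_percent):
    """Return the canonical bases on one strand that meet the frequency threshold."""
    total = sum(counts.get(base, 0) for base in BASES)
    if total == 0:
        return set()
    passing = set()
    for base in BASES:
        reads = counts.get(base, 0)
        # Assumption: a base with zero reads never qualifies, even at a 0% threshold.
        if reads > 0 and 100.0 * reads / total >= freq_percent:
            passing.add(base)
    return passing


def count_alleles(plus, minus, freq_percent=1.0, covg=10):
    """Return None if the line would be omitted, else the allele count (column 9)."""
    # Only reads supporting A, C, G or T count towards coverage.
    plus_total = sum(plus.get(base, 0) for base in BASES)
    minus_total = sum(minus.get(base, 0) for base in BASES)
    # The coverage threshold applies to each strand separately.
    if not (plus_total >= covg and minus_total >= covg):
        return None
    plus_set = passing_alleles(plus, freq_percent)
    minus_set = passing_alleles(minus, freq_percent)
    # Strand bias: the sets of passing alleles must be identical, else 0.
    if plus_set != minus_set:
        return 0
    return len(plus_set)


# The strand-bias example from the text: G passes only on the + strand.
print(count_alleles({"A": 70, "C": 27, "G": 3}, {"A": 70, "C": 30}))  # 0
# 96% A, 3.3% C, 0.7% G on both strands: G falls below the 1% threshold.
site = {"A": 960, "C": 33, "G": 7}
print(count_alleles(site, site))  # 2
```

Comparing the two strands as sets reflects the "though not in order" wording: only the identity of the passing alleles matters, not their ranking or their exact frequencies.
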

\*One specific case will actually affect the reported minor allele identity and frequency. If there is a tie for the minor allele (between the 2nd and 3rd most common alleles), the minor allele will be reported as 'N', and the frequency as 0.0.

  </help>

</tool>