Author: Daniel Blankenberg <dan@bx.psu.edu>

Date: Thu, 17 Sep 2015 14:22:52 -0400

This repository contains the **Naive Variant Caller** tool.

------

**What it does**

This tool is a naive variant caller that processes aligned sequencing reads in BAM format and produces a VCF file containing per-position variant calls. Multiple BAM files can be provided as input, and read group information is used to make calls for individual samples.
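
Per-sample calls depend on the read group information in the BAM header. As a minimal sketch, assuming SAM-format header text in which each ``@RG`` line carries an ``ID`` tag and usually an ``SM`` (sample) tag, as defined by the SAM specification, read groups can be mapped to samples as follows. ``samples_by_read_group`` is a hypothetical helper, not part of this tool, and treating a read group without ``SM`` as its own sample is an assumption of the sketch.

```python
def samples_by_read_group(header_text):
    """Map each read group ID to its sample name from SAM header text.

    Hypothetical helper. Only @RG lines are read; every tab-separated field
    after the record type has the form TAG:value. Assumption for this sketch:
    a read group without an SM tag is treated as its own sample.
    """
    samples = {}
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
        samples[tags["ID"]] = tags.get("SM", tags["ID"])
    return samples


header = "\n".join([
    "@HD\tVN:1.6",
    "@RG\tID:rg1\tSM:sampleA",
    "@RG\tID:rg2\tSM:sampleA",
    "@RG\tID:rg3\tSM:sampleB",
])
print(samples_by_read_group(header))
# {'rg1': 'sampleA', 'rg2': 'sampleA', 'rg3': 'sampleB'}
```

Under this sketch, reads from ``rg1`` and ``rg2`` would be pooled into a single set of counts for ``sampleA``.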

User-configurable options allow filtering out reads that do not pass mapping or base quality thresholds and setting a minimum per-base read depth; users can also specify the ploidy and whether to consider each strand separately.

In addition to calling alternate alleles based on simple ratios of nucleotides at a position, per-base nucleotide counts are also provided. A custom tag, NC, is used within the Genotype fields. The NC field is a comma-separated list of nucleotide counts in the form <nucleotide>=<count>; if the strandedness option was specified, a plus or minus character is prepended to indicate the strand.
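
A minimal sketch of reading an NC value back into counts, handling both the plain and the strand-prefixed forms described above. ``parse_nc`` is a hypothetical helper; tolerating a trailing comma is an assumption based on the example output lines in the Outputs section.

```python
def parse_nc(nc_value):
    """Parse an NC genotype value into {(strand, nucleotide): count}.

    strand is "+" or "-" when counts were reported by strand, otherwise "".
    """
    counts = {}
    for entry in nc_value.split(","):
        if not entry:  # skip the empty item left by a trailing comma
            continue
        key, count = entry.split("=")
        strand = key[0] if key[0] in "+-" else ""
        counts[(strand, key[len(strand):])] = int(count)
    return counts


print(parse_nc("A=9,C=5,T=9629,G=15,"))
# {('', 'A'): 9, ('', 'C'): 5, ('', 'T'): 9629, ('', 'G'): 15}
print(parse_nc("+T=3972,-A=9,-C=5,-T=5657,-G=15,"))
# {('+', 'T'): 3972, ('-', 'A'): 9, ('-', 'C'): 5, ('-', 'T'): 5657, ('-', 'G'): 15}
```

Keying on ``(strand, nucleotide)`` keeps one representation for both modes: unstranded output simply uses an empty strand.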


------

**Inputs**

Accepts one or more BAM input files and a reference genome, either from the built-in list or from a FASTA file in your history.


**Outputs**

The output is in VCF format.

Example VCF output line, without reporting by strand:
``chrM 16029 . T G,A,C . . AC=15,9,5;AF=0.00155311658729,0.000931869952371,0.000517705529095 GT:AC:AF:NC 0/0:15,9,5:0.00155311658729,0.000931869952371,0.000517705529095:A=9,C=5,T=9629,G=15,``
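
In the example line above, each AF value equals the matching AC count divided by the total of the NC counts at that position (9 + 5 + 9629 + 15 = 9658 reads). This relationship is inferred from the example values rather than stated in this README. The sketch below recomputes AF from NC to make it explicit; ``recompute_af`` is a hypothetical helper, and splitting on any whitespace is an assumption that suits both this copy of the line and real tab-separated VCF columns.

```python
import math

# The unstranded example line; real VCF columns are tab-separated.
EXAMPLE = (
    "chrM 16029 . T G,A,C . . "
    "AC=15,9,5;AF=0.00155311658729,0.000931869952371,0.000517705529095 "
    "GT:AC:AF:NC "
    "0/0:15,9,5:0.00155311658729,0.000931869952371,0.000517705529095:A=9,C=5,T=9629,G=15,"
)


def recompute_af(vcf_line):
    """Return total NC depth and (alt, AF from NC, reported AF) per ALT allele."""
    fields = vcf_line.split()  # split() accepts tabs or the spaces shown here
    alts = fields[4].split(",")
    # Pair FORMAT keys (GT, AC, AF, NC) with the sample column's values.
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    counts = {}
    for entry in sample["NC"].split(","):
        if entry:  # skip the empty item after the trailing comma
            base, count = entry.split("=")
            counts[base] = int(count)
    depth = sum(counts.values())
    reported = [float(af) for af in sample["AF"].split(",")]
    return depth, [(alt, counts.get(alt, 0) / depth, af) for alt, af in zip(alts, reported)]


depth, rows = recompute_af(EXAMPLE)
print(depth)  # 9658
for alt, computed, reported in rows:
    print(alt, math.isclose(computed, reported, rel_tol=1e-9))  # True for G, A and C
```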

Example VCF output line, when reporting by strand:
``chrM 16029 . T G,A,C . . AC=15,9,5;AF=0.00155311658729,0.000931869952371,0.000517705529095 GT:AC:AF:NC 0/0:15,9,5:0.00155311658729,0.000931869952371,0.000517705529095:+T=3972,-A=9,-C=5,-T=5657,-G=15,``
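
The two example lines describe the same position, and summing the ``+`` and ``-`` counts for each base in the stranded NC value reproduces the unstranded one (for T, 3972 + 5657 = 9629). A minimal sketch of that check; ``collapse_strands`` is a hypothetical helper.

```python
from collections import Counter


def collapse_strands(nc_value):
    """Sum NC counts per base, ignoring any leading '+' or '-' strand marker."""
    totals = Counter()
    for entry in nc_value.split(","):
        if not entry:  # skip the empty item after the trailing comma
            continue
        key, count = entry.split("=")
        totals[key.lstrip("+-")] += int(count)
    return dict(totals)


stranded = "+T=3972,-A=9,-C=5,-T=5657,-G=15,"
unstranded = "A=9,C=5,T=9629,G=15,"
print(collapse_strands(stranded))  # {'T': 9629, 'A': 9, 'C': 5, 'G': 15}
print(collapse_strands(stranded) == collapse_strands(unstranded))  # True
```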

**Options**

Reference Genome:

Ensure that you have selected the correct reference genome, either from the list of built-in genomes or by selecting the corresponding FASTA file from your history.

Restrict to regions:

You can specify any number of regions for which you would like to receive results. Each region can be given as a chromosome name alone, a chromosome name and a start position, or a chromosome name with start and end positions.
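
A minimal sketch of how the three region forms could be tested against a position. The ``(chrom, start, end)`` tuple representation, the ``in_regions`` name, the 1-based inclusive coordinates, and treating an empty region list as "no restriction" are all assumptions for illustration, not this tool's documented behavior.

```python
def in_regions(chrom, pos, regions):
    """Return True if (chrom, pos) lies in any of the given regions.

    Each region is a hypothetical (chrom, start, end) tuple where start and/or
    end may be None: (chrom, None, None) covers the whole chromosome and
    (chrom, start, None) runs from start to the end of the chromosome.
    Coordinates are assumed to be 1-based and inclusive.
    """
    if not regions:  # assumption: no regions given means no restriction
        return True
    for r_chrom, start, end in regions:
        if r_chrom != chrom:
            continue
        if start is not None and pos < start:
            continue
        if end is not None and pos > end:
            continue
        return True
    return False


regions = [("chrM", None, None), ("chr1", 1000, None), ("chr2", 500, 900)]
print(in_regions("chrM", 16029, regions))  # True
print(in_regions("chr1", 999, regions))    # False
print(in_regions("chr2", 900, regions))    # True
```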

Minimum number of reads needed to consider a REF/ALT:

The minimum number of reads that must contain a particular base at a position for that allele to be listed and used in genotyping calls. Default is 0.
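
A minimal sketch of this threshold applied when listing ALT alleles; ``alt_alleles`` is a hypothetical helper. Two behaviors are assumptions, not documented here: nucleotides with zero reads are never listed (so the default of 0 does not list unobserved bases), and ALT alleles are ordered by decreasing count, which matches the G,A,C order in the unstranded example output line.

```python
def alt_alleles(counts, ref, min_reads=0):
    """Return the ALT alleles whose read count meets min_reads.

    counts maps nucleotide -> read count at one position. Assumptions for this
    sketch: zero-count nucleotides are never listed, and ALTs are ordered by
    decreasing count.
    """
    passing = [b for b, n in counts.items() if b != ref and n > 0 and n >= min_reads]
    return sorted(passing, key=lambda b: counts[b], reverse=True)


counts = {"A": 9, "C": 5, "T": 9629, "G": 15}
print(alt_alleles(counts, "T"))                # ['G', 'A', 'C']
print(alt_alleles(counts, "T", min_reads=10))  # ['G']
```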

Minimum base quality:

The minimum base quality score a base in a read must have to be used for nucleotide counts and genotyping. Default is no filter.

Minimum mapping quality:

The minimum mapping quality score a read must have to be considered for nucleotide counts and genotyping. Default is no filter.
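
The two quality thresholds act at different levels: mapping quality accepts or rejects a whole read, while base quality accepts or rejects a single base call within it. A minimal sketch of per-position counting under both filters, using hypothetical ``(base, base_quality, mapping_quality)`` tuples in place of real BAM records and treating a score equal to the threshold as passing (an assumption); ``count_bases`` is a hypothetical helper.

```python
from collections import Counter


def count_bases(observations, min_base_quality=None, min_mapping_quality=None):
    """Count nucleotides at one position, skipping low-quality observations.

    observations: iterable of hypothetical (base, base_quality, mapping_quality)
    tuples, one per read covering the position. A threshold of None means no
    filter, mirroring the "Default is no filter" options.
    """
    counts = Counter()
    for base, base_q, map_q in observations:
        if min_mapping_quality is not None and map_q < min_mapping_quality:
            continue  # the whole read is rejected on mapping quality
        if min_base_quality is not None and base_q < min_base_quality:
            continue  # only this base call is rejected on base quality
        counts[base] += 1
    return dict(counts)


obs = [("T", 30, 60), ("T", 12, 60), ("G", 35, 5), ("A", 40, 60)]
print(count_bases(obs))  # {'T': 2, 'G': 1, 'A': 1}
print(count_bases(obs, min_base_quality=20, min_mapping_quality=20))  # {'T': 1, 'A': 1}
```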

Ploidy:

The number of genotype calls to make at each reported position.
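
This README does not spell out how genotype calls are derived from nucleotide ratios. Purely to illustrate how ``ploidy`` sets the number of calls, the sketch below shares ``ploidy`` allele slots among alleles in proportion to their read counts using largest-remainder rounding. That technique is a stand-in chosen here, not the tool's documented rule, and ``genotype`` is a hypothetical helper. Alleles are indexed as in a VCF GT field (0 for REF, 1 and up for ALTs).

```python
def genotype(counts, alleles, ploidy=2):
    """Return a GT-style string with `ploidy` allele calls.

    alleles lists REF first, then ALTs, so list indices match VCF GT indices.
    Slots are shared out in proportion to read counts using largest-remainder
    rounding, a stand-in technique for illustration only.
    """
    total = sum(counts.get(a, 0) for a in alleles)
    if total == 0:
        return "/".join(["."] * ploidy)  # no reads: report missing calls
    shares = [counts.get(a, 0) * ploidy / total for a in alleles]
    copies = [int(s) for s in shares]  # whole slots each allele earns outright
    # Hand any leftover slots to the alleles with the largest fractional parts.
    leftover = ploidy - sum(copies)
    by_remainder = sorted(range(len(alleles)), key=lambda i: shares[i] - copies[i], reverse=True)
    for i in by_remainder[:leftover]:
        copies[i] += 1
    return "/".join(str(i) for i, c in enumerate(copies) for _ in range(c))


example = {"T": 9629, "G": 15, "A": 9, "C": 5}
print(genotype(example, ["T", "G", "A", "C"]))            # 0/0
print(genotype({"T": 50, "G": 50}, ["T", "G"]))           # 0/1
print(genotype(example, ["T", "G", "A", "C"], ploidy=1))  # 0
```

With the counts from the example output line and ``ploidy=2``, this sketch happens to reproduce the ``0/0`` call shown there.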

Only write out positions with possible alternate alleles:

When set, only positions that have at least one non-reference nucleotide passing the declared filters will be present in the output.
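
A minimal sketch of this output filter; ``variant_positions`` is a hypothetical helper. It assumes per-position counts have already been filtered on base and mapping quality, and that a non-reference nucleotide qualifies when it has at least one read and at least ``min_reads`` reads. The second input position is made-up toy data.

```python
def variant_positions(positions, min_reads=0):
    """Yield (chrom, pos) for positions with a qualifying non-reference nucleotide.

    positions: iterable of hypothetical (chrom, pos, ref, counts) tuples whose
    counts were already filtered on base and mapping quality. Assumption: a
    nucleotide qualifies when it has at least one read and at least min_reads.
    """
    for chrom, pos, ref, counts in positions:
        if any(n > 0 and n >= min_reads for base, n in counts.items() if base != ref):
            yield chrom, pos


# Toy input: the counts from the example line plus a made-up reference-only position.
positions = [
    ("chrM", 16029, "T", {"A": 9, "C": 5, "T": 9629, "G": 15}),
    ("chrM", 16030, "A", {"A": 9700}),
]
print(list(variant_positions(positions)))                # [('chrM', 16029)]
print(list(variant_positions(positions, min_reads=20)))  # []
```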

Report counts by strand:

When set, nucleotide counts (NC) are reported separately according to the strand to which each read aligned, in the form <strand><BASE>=<COUNT>.
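
A minimal sketch of rendering counts in the NC form, with and without strand prefixes; ``format_nc`` is a hypothetical helper. Appending a comma after every entry copies the example output lines. Omitting zero counts is an assumption inferred from the stranded example, which lists no ``+A``, ``+C`` or ``+G`` entries.

```python
def format_nc(counts, by_strand=False):
    """Render nucleotide counts as an NC value.

    counts maps base -> count, or (strand, base) -> count when by_strand is True.
    Each entry is followed by a comma, as in the example output lines.
    Assumption: zero counts are omitted.
    """
    entries = []
    for key, n in counts.items():
        if n == 0:
            continue
        label = "".join(key) if by_strand else key  # ("+", "T") -> "+T"
        entries.append(f"{label}={n},")
    return "".join(entries)


print(format_nc({"A": 9, "C": 5, "T": 9629, "G": 15}))
# A=9,C=5,T=9629,G=15,
print(format_nc({("+", "T"): 3972, ("+", "A"): 0, ("-", "A"): 9, ("-", "C"): 5,
                 ("-", "T"): 5657, ("-", "G"): 15}, by_strand=True))
# +T=3972,-A=9,-C=5,-T=5657,-G=15,
```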

Choose the dtype to use for storing coverage information:

This controls the maximum depth value that can be stored for each nucleotide/position/strand (when specified). Smaller types use less memory but have lower maximum values.

+--------+----------------------------+
| name   | maximum coverage value     |
+========+============================+
| uint8  | 255                        |
+--------+----------------------------+
| uint16 | 65,535                     |
+--------+----------------------------+
| uint32 | 4,294,967,295              |
+--------+----------------------------+
| uint64 | 18,446,744,073,709,551,615 |
+--------+----------------------------+
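
The names and limits in this table match NumPy's unsigned integer dtypes, and each step up doubles the bytes needed per stored count. A minimal sketch, assuming NumPy is available, that picks the smallest listed dtype for an expected maximum count; ``smallest_dtype`` is a hypothetical helper. The last lines show NumPy's general wraparound behavior for unsigned arrays; this README does not say how the tool itself handles counts above the chosen maximum.

```python
import numpy as np

DTYPES = ("uint8", "uint16", "uint32", "uint64")


def smallest_dtype(max_count):
    """Return the first (smallest) dtype in the table able to store max_count."""
    for name in DTYPES:
        if np.iinfo(name).max >= max_count:
            return name
    raise ValueError("count exceeds the uint64 maximum")


for name in DTYPES:
    # bytes per stored count, and the maximum value listed in the table
    print(name, np.dtype(name).itemsize, np.iinfo(name).max)

print(smallest_dtype(9629))  # uint16: the T count in the example exceeds 255

# NumPy unsigned integer arrays wrap around silently on overflow.
counts = np.array([255], dtype=np.uint8)
print(counts + 1)  # [0]
```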


------

**Citation**

If you use this tool, please cite Blankenberg D, et al. *In preparation.*