comparison README.rst @ 0:0fa83c466e9d

Create naive_variant_caller repository
author blankenberg
date Thu, 29 Aug 2013 09:57:14 -0400
parents
children 907b40517289
This repository contains the **Naive Variant Caller** tool.

------

**What it does**

This tool is a naive variant caller that processes aligned sequencing reads in BAM format and produces a VCF file containing per-position variant calls. Multiple BAM files can be provided as input, and read group information is used to make calls for individual samples.

User-configurable options allow filtering out reads that do not pass mapping quality or base quality thresholds and setting a minimum per-base read depth; users can also specify the ploidy and whether to consider each strand separately.

In addition to calling alternate alleles based upon simple ratios of nucleotides at a position, per-base nucleotide counts are also provided. A custom tag, NC, is used within the Genotype fields. The NC field is a comma-separated list of nucleotide counts in the form <nucleotide>=<count>, where a plus or minus character is prepended to indicate strand if the strandedness option was specified.
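
The NC layout described above can be read back with a few lines of Python. This is a minimal sketch, not part of the tool: the function name ``parse_nc`` is hypothetical, and it assumes only the ``<nucleotide>=<count>`` form with an optional leading ``+`` or ``-``.

```python
def parse_nc(nc_field):
    """Parse an NC value into a dict keyed by (strand, nucleotide).

    strand is '+' or '-' when counts were reported by strand,
    otherwise None. parse_nc is a hypothetical helper name.
    """
    counts = {}
    for item in nc_field.split(","):
        if not item:
            # Skip empty items, e.g. the one left by a trailing comma.
            continue
        name, _, count = item.partition("=")
        if name.startswith(("+", "-")):
            strand, nucleotide = name[0], name[1:]
        else:
            strand, nucleotide = None, name
        counts[(strand, nucleotide)] = int(count)
    return counts
```

For example, ``parse_nc("+T=3972,-A=9,")`` returns ``{('+', 'T'): 3972, ('-', 'A'): 9}``.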


------

**Inputs**

Accepts one or more BAM input files and a reference genome from the built-in list or from a FASTA file in your history.


**Outputs**

The output is in VCF format.

Example VCF output line, without reporting by strand:
``chrM 16029 . T G,A,C . . AC=15,9,5;AF=0.00155311658729,0.000931869952371,0.000517705529095 GT:AC:AF:NC 0/0:15,9,5:0.00155311658729,0.000931869952371,0.000517705529095:A=9,C=5,T=9629,G=15,``

Example VCF output line, when reporting by strand:
``chrM 16029 . T G,A,C . . AC=15,9,5;AF=0.00155311658729,0.000931869952371,0.000517705529095 GT:AC:AF:NC 0/0:15,9,5:0.00155311658729,0.000931869952371,0.000517705529095:+T=3972,-A=9,-C=5,-T=5657,-G=15,``
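
The numbers in the example lines can be cross-checked by hand. In this sketch the counts are copied from the unstranded NC field. The rule that AF equals an alternate allele's count divided by the total depth is an inference from these numbers, not something the tool states.

```python
# Nucleotide counts copied from the NC field of the unstranded example line.
counts = {"A": 9, "C": 5, "T": 9629, "G": 15}
alts = ["G", "A", "C"]  # ALT column order in the example line

depth = sum(counts.values())          # total reads counted at the position
ac = [counts[alt] for alt in alts]    # per-alternate allele counts
af = [count / depth for count in ac]  # assumed rule: AF = count / total depth

print(depth)  # 9658
print(ac)     # [15, 9, 5]
print(af)
```

Under that assumption the computed frequencies agree with the AF values in the example to the precision shown. The stranded line is consistent with the same totals: the ``+T`` and ``-T`` counts (3972 and 5657) sum to the unstranded ``T`` count of 9629.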

**Options**

Reference Genome:

Ensure that you have selected the correct reference genome, either from the list of built-in genomes or by selecting the corresponding FASTA file from your history.

Restrict to regions:

You can specify any number of regions for which you would like to receive results. Each region can be given as just a chromosome name, as a chromosome name and start position, or as a chromosome name with both start and end positions.

Minimum number of reads needed to consider a REF/ALT:

The minimum number of reads that must contain a particular base at a position for that allele to be listed and used in genotyping calls. Default is 0.

Minimum base quality:

The minimum base quality score needed for the position in a read to be used for nucleotide counts and genotyping. Default is no filter.

Minimum mapping quality:

The minimum mapping quality score needed to consider a read for nucleotide counts and genotyping. Default is no filter.

Ploidy:

The number of genotype calls to make at each reported position.

Only write out positions with possible alternate alleles:

When set, only positions that have at least one non-reference nucleotide passing the declared filters will be present in the output.
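
The interaction between the minimum-reads threshold and the alternate-alleles-only option can be sketched as follows. Both function names are hypothetical. The sketch assumes an allele passes when its count is non-zero and at least the minimum number of reads; the tool's actual rule may also involve other filters.

```python
def passing_alternates(counts, ref, min_reads=0):
    """Return non-reference nucleotides whose count meets min_reads.

    Assumption: an allele passes when its count is greater than zero
    and at least min_reads. This is an illustration, not the tool's code.
    """
    return sorted(
        base for base, n in counts.items()
        if base != ref and n > 0 and n >= min_reads
    )


def keep_position(counts, ref, min_reads=0):
    """True when at least one alternate allele survives the filter."""
    return bool(passing_alternates(counts, ref, min_reads))
```

With the counts from the example output (``A=9, C=5, T=9629, G=15``, reference ``T``), a threshold of 10 leaves only ``G`` as an alternate, and a threshold of 20 leaves none, so the position would be omitted when the option is set.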

Report counts by strand:

When set, nucleotide counts (NC) will be reported with reference to the aligned read's source strand. Reported as: <strand><BASE>=<COUNT>.

Choose the dtype to use for storing coverage information:

This controls the maximum depth value that can be stored for each nucleotide at each position (and for each strand, when strand reporting is enabled). Smaller dtypes require less memory but have lower maximum values.

+--------+----------------------------+
| name   | maximum coverage value     |
+========+============================+
| uint8  | 255                        |
+--------+----------------------------+
| uint16 | 65,535                     |
+--------+----------------------------+
| uint32 | 4,294,967,295              |
+--------+----------------------------+
| uint64 | 18,446,744,073,709,551,615 |
+--------+----------------------------+
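
The maximum values in the table follow from the bit width of each unsigned integer type: an n-bit unsigned integer can hold values up to 2^n - 1. The sketch below reproduces the table and adds a hypothetical helper, ``smallest_dtype``, for choosing the narrowest type that fits an expected depth. It is an illustration only and does not use the tool's own code.

```python
# Bit widths of the unsigned integer dtypes listed in the table.
DTYPE_BITS = {"uint8": 8, "uint16": 16, "uint32": 32, "uint64": 64}


def max_coverage(dtype):
    """Largest value an unsigned integer of this width can store."""
    return 2 ** DTYPE_BITS[dtype] - 1


def smallest_dtype(expected_depth):
    """Pick the narrowest dtype that can hold expected_depth.

    Hypothetical helper, not part of the tool.
    """
    for name in ("uint8", "uint16", "uint32", "uint64"):
        if expected_depth <= max_coverage(name):
            return name
    raise ValueError("depth exceeds the uint64 range")
```

For the example position shown under Outputs, where the reference base alone has a count of 9,629, ``smallest_dtype(9629)`` returns ``"uint16"``; ``uint8`` would be too small.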


------

**Citation**

If you use this tool, please cite Blankenberg D, et al. *In preparation.*