0
|
1 This repository contains the **Naive Variant Caller** tool.
|
|
2
|
|
3 ------
|
|
4
|
|
5 **What it does**
|
|
6
|
|
7 This tool is a naive variant caller that processes aligned sequencing reads in BAM format and produces a VCF file containing per-position variant calls. Multiple BAM files can be provided as input, and read group information is used to make calls for individual samples.
|
|
8
|
|
9 User-configurable options allow filtering out reads that do not pass mapping quality or base quality thresholds, and positions that do not meet a minimum per-base read depth; users can also specify the ploidy and whether to consider each strand separately.
|
|
10
|
|
11 In addition to calling alternate alleles based upon simple ratios of nucleotides at a position, per base nucleotide counts are also provided. A custom tag, NC, is used within the Genotype fields. The NC field is a comma-separated listing of nucleotide counts in the form of <nucleotide>=<count>, where a plus or minus character is prepended to indicate strand, if the strandedness option was specified.
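
As an illustration of the NC layout described above, the following minimal sketch splits an NC value into a dictionary of counts. The function name ``parse_nc`` is hypothetical and not part of the tool; the sample values are copied from the example output lines in the Outputs section, and the trailing comma they carry is simply skipped.

```python
def parse_nc(nc_field):
    """Parse an NC value such as 'A=9,C=5,T=9629,G=15,' into a dict of counts.

    Keys keep any leading '+' or '-' strand marker, so stranded and
    unstranded values can be handled by the same function.
    """
    counts = {}
    for item in nc_field.split(","):
        if not item:  # empty piece produced by the trailing comma
            continue
        key, _, value = item.partition("=")
        counts[key] = int(value)
    return counts


# Values taken from the example VCF lines in this document.
unstranded = parse_nc("A=9,C=5,T=9629,G=15,")
stranded = parse_nc("+T=3972,-A=9,-C=5,-T=5657,-G=15,")
```

Keeping the strand marker as part of the key is a design choice made for this sketch only; a consumer that does not care about strand could strip it and sum the counts.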
|
|
12
|
|
13
|
|
14 ------
|
|
15
|
|
16 **Inputs**
|
|
17
|
|
18 Accepts one or more BAM input files and a reference genome from the built-in list or from a FASTA file in your history.
|
|
19
|
|
20
|
|
21 **Outputs**
|
|
22
|
|
23 The output is in VCF format.
|
|
24
|
|
25 Example VCF output line, without reporting by strand:
|
|
26 ``chrM 16029 . T G,A,C . . AC=15,9,5;AF=0.00155311658729,0.000931869952371,0.000517705529095 GT:AC:AF:NC 0/0:15,9,5:0.00155311658729,0.000931869952371,0.000517705529095:A=9,C=5,T=9629,G=15,``
|
|
27
|
|
28 Example VCF output line, when reporting by strand:
|
|
29 ``chrM 16029 . T G,A,C . . AC=15,9,5;AF=0.00155311658729,0.000931869952371,0.000517705529095 GT:AC:AF:NC 0/0:15,9,5:0.00155311658729,0.000931869952371,0.000517705529095:+T=3972,-A=9,-C=5,-T=5657,-G=15,``
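
The AF values in the two example lines appear consistent with dividing each alternate allele's count by the total of all nucleotide counts at the position (for instance, 15 / 9658 for ``G``). The sketch below reproduces that arithmetic from the NC field. It is an observation drawn from the examples, not a description of the tool's internal code, and ``allele_frequencies`` is a hypothetical helper name.

```python
def allele_frequencies(nc_field, alts):
    """Return count / total depth for each alternate allele in `alts`.

    Strand markers ('+' or '-') are stripped and the two strands are
    summed, so stranded and unstranded NC values give the same result.
    """
    counts = {}
    for item in nc_field.split(","):
        if not item:  # skip the empty piece after the trailing comma
            continue
        base, _, n = item.partition("=")
        base = base.lstrip("+-")
        counts[base] = counts.get(base, 0) + int(n)
    depth = sum(counts.values())
    if depth == 0:
        return [0.0 for _ in alts]
    return [counts.get(alt, 0) / depth for alt in alts]


alts = ["G", "A", "C"]
af_unstranded = allele_frequencies("A=9,C=5,T=9629,G=15,", alts)
af_stranded = allele_frequencies("+T=3972,-A=9,-C=5,-T=5657,-G=15,", alts)
```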
|
|
30
|
|
31 **Options**
|
|
32
|
|
33 Reference Genome:
|
|
34
|
|
35 Ensure that you have selected the correct reference genome, either from the list of built-in genomes or by selecting the corresponding FASTA file from your history.
|
|
36
|
|
37 Restrict to regions:
|
|
38
|
|
39 You can specify any number of regions to which results should be restricted. Each region can be given as just a chromosome name, a chromosome name and start position, or a chromosome name with both start and end positions.
|
|
40
|
|
41 Minimum number of reads needed to consider a REF/ALT:
|
|
42
|
|
43 The minimum number of reads that must contain a particular base at a position for that allele to be listed and used in genotyping calls. Default is 0.
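
A minimal sketch of how such a threshold could be applied to per-base counts is shown here. It assumes the comparison is inclusive (a base with exactly the minimum number of reads is kept), which the text above does not state explicitly; ``alleles_passing_min_reads`` is a hypothetical name used for illustration only.

```python
def alleles_passing_min_reads(counts, min_reads=0):
    """Keep only the bases whose read count is at least `min_reads`.

    Assumption: the threshold is inclusive (count >= min_reads).
    """
    return {base: n for base, n in counts.items() if n >= min_reads}


# Counts taken from the unstranded example line in this document.
counts = {"A": 9, "C": 5, "T": 9629, "G": 15}
kept_default = alleles_passing_min_reads(counts)           # default of 0 keeps everything
kept_ten = alleles_passing_min_reads(counts, min_reads=10)
```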
|
|
44
|
|
45 Minimum base quality:
|
|
46
|
|
47 The minimum base quality score needed for the position in a read to be used for nucleotide counts and genotyping. Default is no filter.
|
|
48
|
|
49 Minimum mapping quality:
|
|
50
|
|
51 The minimum mapping quality score needed to consider a read for nucleotide counts and genotyping. Default is no filter.
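
The two quality filters can be pictured as a gate applied to each observed base before it is counted. The sketch below works on toy ``(base, base_quality, mapping_quality)`` tuples, a made-up record layout chosen for illustration; it does not read BAM files and is not the tool's implementation. A threshold of ``None`` stands for the "no filter" default.

```python
from collections import Counter


def count_bases(observations, min_base_quality=None, min_mapping_quality=None):
    """Count bases at one position, skipping observations below either threshold.

    `observations` is an iterable of (base, base_quality, mapping_quality)
    tuples, a hypothetical layout used only for this example.
    """
    counts = Counter()
    for base, base_q, map_q in observations:
        if min_base_quality is not None and base_q < min_base_quality:
            continue  # base call too uncertain
        if min_mapping_quality is not None and map_q < min_mapping_quality:
            continue  # read placement too uncertain
        counts[base] += 1
    return counts


observations = [("T", 38, 60), ("T", 12, 60), ("G", 35, 10), ("G", 30, 42)]
unfiltered = count_bases(observations)
filtered = count_bases(observations, min_base_quality=20, min_mapping_quality=20)
```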
|
|
52
|
|
53 Ploidy:
|
|
54
|
|
55 The number of genotype calls to make at each reported position.
|
|
56
|
|
57 Only write out positions with possible alternate alleles:
|
|
58
|
|
59 When set, only positions that have at least one non-reference nucleotide passing the declared filters will be present in the output.
|
|
60
|
|
61 Report counts by strand:
|
|
62
|
|
63 When set, nucleotide counts (NC) will be reported in reference to the aligned read's source strand. Reported as: <strand><BASE>=<COUNT>.
|
|
64
|
|
65 Choose the dtype to use for storing coverage information:
|
|
66
|
|
67 This controls the maximum depth value that can be stored for each nucleotide/position/strand (when strandedness is specified). Smaller dtypes require less memory but have lower maximum values.
|
|
68
|
|
+--------+----------------------------+
| name   | maximum coverage value     |
+========+============================+
| uint8  | 255                        |
+--------+----------------------------+
| uint16 | 65,535                     |
+--------+----------------------------+
| uint32 | 4,294,967,295              |
+--------+----------------------------+
| uint64 | 18,446,744,073,709,551,615 |
+--------+----------------------------+
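
The maxima in the table follow from the bit width of each unsigned integer type (2 to the power of the number of bits, minus 1). As a rough aid for choosing a dtype, the sketch below picks the smallest listed type that can hold an expected maximum depth. The helper names are hypothetical and the selection rule is only a suggestion, not something the tool does automatically.

```python
# Bit widths of the unsigned integer dtypes listed in the table.
DTYPE_BITS = {"uint8": 8, "uint16": 16, "uint32": 32, "uint64": 64}


def max_coverage(dtype_name):
    """Largest value an unsigned integer of this width can store."""
    return 2 ** DTYPE_BITS[dtype_name] - 1


def smallest_dtype_for(expected_max_depth):
    """Return the smallest listed dtype able to hold `expected_max_depth`."""
    for name in ("uint8", "uint16", "uint32", "uint64"):
        if expected_max_depth <= max_coverage(name):
            return name
    raise ValueError("depth exceeds the range of uint64")


# A depth of 9629, as in the example output, needs at least uint16.
choice = smallest_dtype_for(9629)
```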
|
|
80
|
|
81
|
|
82 ------
|
|
83
|
|
84 **Citation**
|
|
85
|
|
86 If you use this tool, please cite Blankenberg D, et al. *In preparation.*
|