annotate pyPRADA_1.2/tools/samtools-0.1.16/bcftools/README @ 0:acc2ca1a3ba4

Uploaded
author siyuan
date Thu, 20 Feb 2014 00:44:58 -0500
The view command of bcftools calls variants, tests Hardy-Weinberg
equilibrium (HWE), tests allele balances and estimates allele frequency.

This command calls a site as a potential variant if P(ref|D,F), the
posterior probability that the site is reference, is below 0.9
(controlled by the -p option), where D is the data and F is the prior
allele frequency spectrum (AFS).
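
The calling rule above can be sketched in a few lines of Python. This is
only an illustration, not the bcftools implementation: the function names
are hypothetical, and it assumes that the per-site likelihoods P(D|k) for
each non-reference allele count k are already available and that "ref"
corresponds to k = 0.

```python
def prob_ref(lik_by_count, prior_afs):
    """Return P(ref|D,F), taken here as the posterior probability of k = 0.

    lik_by_count[k] is assumed to hold P(D | k non-reference alleles);
    prior_afs[k] is the prior AFS F(k). Both inputs are hypothetical.
    """
    if len(lik_by_count) != len(prior_afs):
        raise ValueError("likelihoods and prior must have the same length")
    # Unnormalised posterior P(D|k) * F(k) for every allele count k.
    joint = [lik * f for lik, f in zip(lik_by_count, prior_afs)]
    total = sum(joint)
    if total <= 0.0:
        raise ValueError("posterior is undefined: all joint terms are zero")
    return joint[0] / total


def is_potential_variant(lik_by_count, prior_afs, threshold=0.9):
    """Call the site when P(ref|D,F) is below the threshold (cf. -p)."""
    return prob_ref(lik_by_count, prior_afs) < threshold
```

With a prior that strongly favours the reference state, a site is called
only when the data likelihoods shift enough posterior mass away from
k = 0.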

The view command performs two types of allele balance tests, both based
on Fisher's exact test for 2x2 contingency tables with the row variable
being reference allele or not. In the first table, the column variable
is strand, and the two-tailed P-value is taken: we test whether variant
bases tend to come from one strand. In the second table, the column
variable is whether a base falls within the first or last 11 bp of the
read, and the one-tailed P-value is taken: we test whether variant
bases tend to occur towards the ends of reads, which is usually an
indication of misalignment.
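
For readers unfamiliar with the test, here is a minimal pure-Python
sketch of Fisher's exact test on a 2x2 table. It is not the code used by
bcftools; the function name and the return layout are assumptions made
for this example.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for the table [[a, b], [c, d]].

    Returns (left, right, two_tail) P-values. "left" sums the tables whose
    top-left cell is <= a, "right" those whose top-left cell is >= a, and
    "two_tail" all tables that are no more probable than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    denom = comb(n, col1)
    # Hypergeometric probability of each possible top-left cell value,
    # with the row and column totals held fixed.
    prob = {
        x: comb(row1, x) * comb(n - row1, col1 - x) / denom
        for x in range(lo, hi + 1)
    }
    p_obs = prob[a]
    left = sum(p for x, p in prob.items() if x <= a)
    right = sum(p for x, p in prob.items() if x >= a)
    # Small relative tolerance so ties with p_obs are counted.
    two_tail = sum(p for p in prob.values() if p <= p_obs * (1 + 1e-7))
    return left, right, min(two_tail, 1.0)
```

Assuming the rows are (reference, non-reference), the strand test would
use columns (forward, reverse) and read the two-tailed value. The
read-position test would use columns (near a read end, elsewhere) and
read one tail; with this layout, an excess of variant bases near the
ends shows up in the left tail.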

Site allele frequency is estimated in two ways. In the first, the
frequency is estimated as \argmax_f P(D|f) under the assumption of
HWE; the prior AFS is not used. In the second, the frequency is
estimated as the posterior expectation of the allele count,
\sum_k kP(k|D,F), divided by the total number of haplotypes. HWE is not
assumed, but the estimate depends on the prior AFS. The two estimates
largely agree when the signal is strong, but may differ greatly at
sites with weak signal, where the prior plays an important role.
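
The two estimators can be contrasted with a short Python sketch. It is
illustrative only: the function names are hypothetical, samples are
assumed diploid, the first estimator uses a simple EM iteration as one
possible way to find the maximum, and the second assumes the per-site
likelihoods P(D|k) have already been computed.

```python
def freq_mle_hwe(genotype_liks, n_iter=100, tol=1e-10):
    """EM estimate of argmax_f P(D|f) assuming HWE; no prior AFS is used.

    genotype_liks[i] = (P(D_i|g=0), P(D_i|g=1), P(D_i|g=2)) for diploid
    sample i, where g counts non-reference alleles.
    """
    n = len(genotype_liks)
    f = 0.5  # arbitrary starting point
    for _ in range(n_iter):
        expected_alt = 0.0
        for l0, l1, l2 in genotype_liks:
            # Likelihood times the HWE genotype prior at frequency f.
            w = (l0 * (1 - f) ** 2, l1 * 2 * f * (1 - f), l2 * f ** 2)
            # Expected number of non-reference alleles in this sample.
            expected_alt += (w[1] + 2 * w[2]) / sum(w)
        f_new = expected_alt / (2 * n)
        if abs(f_new - f) < tol:
            return f_new
        f = f_new
    return f


def freq_posterior_mean(lik_by_count, prior_afs):
    """sum_k k P(k|D,F) divided by the number of haplotypes M."""
    joint = [lik * f for lik, f in zip(lik_by_count, prior_afs)]
    total = sum(joint)
    m = len(joint) - 1  # k ranges over 0..M
    return sum(k * j for k, j in enumerate(joint)) / (total * m)
```

When the likelihoods are flat (weak signal), freq_posterior_mean returns
essentially the mean of the prior, while freq_mle_hwe ignores the prior
entirely, which is why the two can disagree at such sites.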

To test HWE, we calculate the posterior distribution of genotypes
(ref-hom, het and alt-hom) and perform a chi-square test. It is worth
noting that the model used here is prior dependent and assumes HWE,
which is different from both models used for allele frequency
estimation. This model in fact yields a third estimate of the site
allele frequency.
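
As a rough illustration of the final step, the following Python sketch
runs a textbook chi-square test of HWE on genotype counts. It rests on
assumptions not stated above: the counts are taken to be the (possibly
fractional) expected counts obtained by summing genotype posteriors over
samples, the allele frequency is derived from those counts, and one
degree of freedom is used. The function name is hypothetical.

```python
from math import erfc, sqrt


def hwe_chi_square(n_rr, n_ra, n_aa):
    """Chi-square test of HWE from ref-hom, het and alt-hom counts.

    Returns (statistic, p_value), assuming 1 degree of freedom.
    """
    n = n_rr + n_ra + n_aa
    if n <= 0:
        raise ValueError("at least one genotype is required")
    q = (n_ra + 2 * n_aa) / (2 * n)  # non-reference allele frequency
    p = 1.0 - q
    observed = (n_rr, n_ra, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes whose expected count is zero (monomorphic sites).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 d.f.
    return chi2, erfc(sqrt(chi2 / 2))
```

Counts in exact HWE proportions, such as (25, 50, 25), give a statistic
of zero, while a complete absence of heterozygotes at intermediate
frequency gives a very small P-value.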

The estimated allele frequency spectrum is printed to stderr every 64k
sites. The estimate is in fact only the first round of an EM
procedure. The second model (not the model for HWE testing) is used to
estimate the AFS.
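
One round of such an EM update can be sketched as follows. This is an
assumption-laden illustration rather than the bcftools code: the
function name is hypothetical, and each site is assumed to come with
precomputed likelihoods P(D|k) for every allele count k.

```python
def afs_em_round(site_liks, prior_afs):
    """One EM round: average the per-site posteriors P(k|D_s,F).

    site_liks[s][k] is assumed to hold P(D_s | k) for site s;
    prior_afs[k] is the current AFS F(k).
    """
    acc = [0.0] * len(prior_afs)
    used = 0
    for lik in site_liks:
        joint = [l * f for l, f in zip(lik, prior_afs)]
        total = sum(joint)
        if total <= 0.0:
            continue  # site carries no usable information
        # E-step: accumulate the normalised posterior of this site.
        for k, j in enumerate(joint):
            acc[k] += j / total
        used += 1
    if used == 0:
        return list(prior_afs)
    # M-step: the updated AFS is the mean posterior over sites.
    return [a / used for a in acc]
```

Feeding the returned spectrum back in as prior_afs would give further
rounds; the text above states that only the first round is reported.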