PsiCLASS-1.0.2/samtools-0.1.19/bcftools/README @ 0:903fc43d6227
author: lsong10
date: Fri, 26 Mar 2021 16:52:45 +0000

The view command of bcftools calls variants, tests Hardy-Weinberg
equilibrium (HWE), tests allele balance and estimates allele frequency.

This command calls a site as a potential variant if P(ref|D,F) is below
0.9 (controlled by the -p option), where D is the data and F is the
prior allele frequency spectrum (AFS). P(ref|D,F) is thus the posterior
probability that the site carries only the reference allele.

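The calling rule can be sketched in Python as below. This is a minimal illustration, not the bcftools implementation: it assumes the site likelihoods P(D|k), for k = 0..2n alternative alleles among 2n haplotypes, have already been computed, and it takes P(ref|D,F) to be the posterior probability of k = 0. The function names `posterior_afs` and `is_candidate_variant` are hypothetical.

```python
def posterior_afs(lik, prior):
    """Return P(k|D,F) for k = 0..2n from P(D|k) and the prior AFS F(k)."""
    if len(lik) != len(prior):
        raise ValueError("lik and prior must have the same length")
    joint = [lk * p for lk, p in zip(lik, prior)]  # P(D|k) F(k)
    total = sum(joint)
    if total <= 0:
        raise ValueError("data have zero probability under this prior")
    return [j / total for j in joint]


def is_candidate_variant(lik, prior, p_threshold=0.9):
    """Call a potential variant when P(ref|D,F) = P(k=0|D,F) < p_threshold.

    p_threshold plays the role of the -p option (0.9 per this README).
    """
    return posterior_afs(lik, prior)[0] < p_threshold
```

With one diploid sample (2n = 2), likelihoods [0.01, 0.5, 0.2] and prior [0.9, 0.05, 0.05] give P(ref|D,F) = 0.009/0.044, about 0.20, so the site is called as a potential variant.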
The view command performs two types of allele balance tests, both based
on Fisher's exact test for 2x2 contingency tables whose row variable is
whether a base is the reference allele. In the first table, the column
variable is strand, and the two-tailed P-value is taken: we test whether
variant bases tend to come from one strand. In the second table, the
column variable is whether a base lies within the first or last 11 bp of
the read, and the one-tailed P-value is taken: we test whether variant
bases tend to occur towards the ends of reads, which usually indicates
misalignment.

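Both tests reduce to Fisher's exact test on a 2x2 table. The sketch below, a hypothetical helper `fisher_2x2` using only the Python standard library, assumes the layout [[ref in column 1, ref in column 2], [non-ref in column 1, non-ref in column 2]]; the exact table orientation and tail direction used by bcftools are assumptions here. The two-tailed P-value sums every table with the same margins that is no more likely than the observed one; the one-tailed ("less") P-value is P(X <= a), which is small when non-reference bases are over-represented in column 1 (for the second test, column 1 would be "within 11 bp of a read end").

```python
from math import comb


def _table_prob(x, r1, r2, c1):
    """Hypergeometric probability that the top-left cell equals x,
    given row sums r1, r2 and first-column sum c1."""
    return comb(r1, x) * comb(r2, c1 - x) / comb(r1 + r2, c1)


def fisher_2x2(a, b, c, d, tail="two"):
    """Fisher's exact test for the table [[a, b], [c, d]].

    Rows: reference / non-reference bases (assumed layout).
    Columns: e.g. forward / reverse strand, or near read end / elsewhere.
    """
    r1, r2, c1 = a + b, c + d, a + c
    lo, hi = max(0, c1 - r2), min(r1, c1)  # feasible top-left values
    probs = {x: _table_prob(x, r1, r2, c1) for x in range(lo, hi + 1)}
    p_obs = probs[a]
    if tail == "two":
        # Sum every table at most as likely as the observed one; the small
        # relative tolerance guards against floating-point ties.
        p = sum(q for q in probs.values() if q <= p_obs * (1 + 1e-7))
    elif tail == "less":
        # Few reference bases in column 1, i.e. non-reference enriched there.
        p = sum(q for x, q in probs.items() if x <= a)
    else:
        raise ValueError("tail must be 'two' or 'less'")
    return min(1.0, p)
```

For example, `fisher_2x2(1, 9, 11, 3)` returns about 0.00276 (two-tailed) and `fisher_2x2(1, 9, 11, 3, tail="less")` about 0.00138.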
Site allele frequency is estimated in two ways. In the first, the
frequency is estimated as \argmax_f P(D|f) under the assumption of HWE;
the prior AFS is not used. In the second, the frequency is estimated as
the posterior expectation of the allele count, \sum_k kP(k|D,F), divided
by the total number of haplotypes. HWE is not assumed, but the estimate
depends on the prior AFS. The two estimates largely agree when the
signal is strong, but may differ greatly at weak sites, where the prior
plays an important role.

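The two estimators can be contrasted in a short Python sketch. It assumes diploid samples, per-sample genotype likelihoods given as [P(D_i|ref-hom), P(D_i|het), P(D_i|alt-hom)], and, for the second estimator, precomputed site likelihoods P(D|k). The grid search in the hypothetical `ml_freq_hwe` is a simple stand-in for whatever optimizer bcftools actually uses; all function names are illustrative.

```python
import math


def hwe_genotype_probs(f):
    """Genotype frequencies (ref-hom, het, alt-hom) under HWE at alt frequency f."""
    return [(1 - f) ** 2, 2 * f * (1 - f), f ** 2]


def log_lik_hwe(f, gl):
    """log P(D|f): product over samples of sum_g P(D_i|g) P(g|f)."""
    total = 0.0
    for sample in gl:
        p = sum(lk * q for lk, q in zip(sample, hwe_genotype_probs(f)))
        if p <= 0:
            return float("-inf")
        total += math.log(p)
    return total


def ml_freq_hwe(gl, grid=1000):
    """First estimator: argmax_f P(D|f) under HWE; the prior AFS is not used.

    A plain grid search over [0, 1], used here only for illustration.
    """
    best_f, best_ll = 0.0, float("-inf")
    for i in range(grid + 1):
        f = i / grid
        ll = log_lik_hwe(f, gl)
        if ll > best_ll:
            best_f, best_ll = f, ll
    return best_f


def posterior_mean_freq(site_lik, prior):
    """Second estimator: sum_k k P(k|D,F) divided by the number of haplotypes.

    site_lik[k] = P(D|k) and prior[k] = F(k) for k = 0..2n.
    """
    joint = [lk * p for lk, p in zip(site_lik, prior)]
    total = sum(joint)
    n_hap = len(prior) - 1  # 2n haplotypes
    return sum(k * j for k, j in enumerate(joint)) / total / n_hap
```

For four samples known with certainty to be ref-hom, het, alt-hom and het, `ml_freq_hwe` returns 0.5 (4 alternative alleles out of 8 haplotypes).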
To test HWE, we calculate the posterior distribution of genotypes
(ref-hom, het and alt-hom) and perform a chi-square test. Note that the
model used here depends on the prior and assumes HWE, which makes it
different from both models used for allele frequency estimation. This
model in fact yields a third estimate of site allele frequency.

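A minimal Pearson chi-square sketch follows. It assumes the per-sample posterior genotype probabilities have already been computed; how bcftools derives them from its prior-dependent HWE model is not reproduced here, and the exact statistic it reports may differ. The hypothetical `hwe_chisq` sums the posteriors into expected genotype counts, derives the allele frequency implied by those counts, compares the counts with HWE proportions, and converts the statistic to a P-value with one degree of freedom.

```python
import math


def hwe_chisq(post):
    """Chi-square HWE test from posterior genotype probabilities.

    post: non-empty list of per-sample [P(ref-hom), P(het), P(alt-hom)].
    Returns (allele frequency, chi-square statistic, P-value with 1 df).
    """
    n0 = sum(p[0] for p in post)
    n1 = sum(p[1] for p in post)
    n2 = sum(p[2] for p in post)
    n = n0 + n1 + n2
    f = (n1 + 2 * n2) / (2 * n)  # allele frequency implied by these counts
    expected = [n * (1 - f) ** 2, n * 2 * f * (1 - f), n * f ** 2]
    observed = [n0, n1, n2]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return f, stat, p_value
```

A sample of 50 ref-hom and 50 alt-hom individuals with no heterozygotes gives f = 0.5 and a statistic of 100, a strong rejection of HWE.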
The estimated allele frequency spectrum is printed to stderr every 64k
sites. This estimate is in fact only the first round of an EM
procedure. The second model (not the model used for HWE testing) is
used to estimate the AFS.
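One EM update of the AFS can be sketched as follows, assuming each site contributes precomputed likelihoods P(D|k): the E-step computes each site's posterior P(k|D,F) under the current prior, and the M-step takes their average as the new prior. The periodic stderr report mirrors the description above, with 64k taken to mean 65536 sites (an assumption); `em_afs_round` is a hypothetical name.

```python
import sys


def em_afs_round(site_liks, prior, report_every=65536):
    """One EM round: return the average over sites of P(k|D,F).

    site_liks: per-site lists of P(D|k), k = 0..2n (assumed precomputed).
    prior: current AFS F(k), same length as each entry of site_liks.
    """
    acc = [0.0] * len(prior)
    n_sites = 0
    for lik in site_liks:
        # E-step for one site: posterior over the number of alt alleles.
        joint = [lk * p for lk, p in zip(lik, prior)]
        total = sum(joint)
        if total <= 0:
            continue  # skip sites with zero probability under the prior
        for k, j in enumerate(joint):
            acc[k] += j / total
        n_sites += 1
        if n_sites % report_every == 0:
            # Running AFS estimate, printed to stderr.
            print(" ".join(f"{a / n_sites:.4g}" for a in acc), file=sys.stderr)
    if n_sites == 0:
        return list(prior)
    # M-step: the new AFS is the mean posterior.
    return [a / n_sites for a in acc]
```

Calling it repeatedly, feeding each result back in as `prior`, would continue the EM iterations; the text above states that only the first round is performed.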