comparison average_fst.xml @ 21:d6b961721037

Miller Lab Devshed version 4c04e35b18f6
author Richard Burhans <burhans@bx.psu.edu>
date Mon, 05 Nov 2012 12:44:17 -0500
parents f04f40a36cc8
children 95a05c1ef5d5
20:8a4b8efbc82c 21:d6b961721037
13 "$arg" 13 "$arg"
14 #end for 14 #end for
15 </command> 15 </command>
16 16
17 <inputs> 17 <inputs>
18 <param name="input" type="data" format="gd_snp" label="SNP table" /> 18 <param name="input" type="data" format="gd_snp" label="SNP dataset" />
19 <param name="p1_input" type="data" format="gd_indivs" label="Population 1 individuals" /> 19 <param name="p1_input" type="data" format="gd_indivs" label="Population 1 individuals" />
20 <param name="p2_input" type="data" format="gd_indivs" label="Population 2 individuals" /> 20 <param name="p2_input" type="data" format="gd_indivs" label="Population 2 individuals" />
21 21
22 <conditional name="data_source"> 22 <conditional name="data_source">
23 <param name="ds_choice" type="select" format="integer" label="Data source"> 23 <param name="ds_choice" type="select" format="integer" label="Frequency metric">
24 <option value="0" selected="true">sequence coverage and ..</option> 24 <option value="0" selected="true">sequence coverage</option>
25 <option value="1">estimated genotype and ..</option> 25 <option value="1">estimated genotype</option>
26 </param> 26 </param>
27 <when value="0"> 27 <when value="0">
28 <param name="min_value" type="integer" min="1" value="1" label="Minimum total read count for a population" /> 28 <param name="min_value" type="integer" min="1" value="1" label="Minimum total read count for a population" />
29 </when> 29 </when>
30 <when value="1"> 30 <when value="1">
31 <param name="min_value" type="integer" min="1" value="1" label="Minimum individual genotype quality" /> 31 <param name="min_value" type="integer" min="1" value="1" label="Minimum individual genotype quality" />
32 </when> 32 </when>
33 </conditional> 33 </conditional>
34 34
35 <param name="discard_fixed" type="select" label="Apparently fixed SNPs"> 35 <param name="discard_fixed" type="select" label="For SNPs that appear to be fixed across both populations">
36 <option value="0">Retain SNPs that appear fixed in the two populations</option> 36 <option value="0">retain</option>
37 <option value="1" selected="true">Delete SNPs that appear fixed in the two populations</option> 37 <option value="1" selected="true">delete</option>
38 </param> 38 </param>
39 39
40 <conditional name="use_randomization"> 40 <conditional name="use_randomization">
41 <param name="ur_choice" type="select" format="integer" label="Use randomization"> 41 <param name="ur_choice" type="select" format="integer" label="Use randomization">
42 <option value="0" selected="true">No</option> 42 <option value="0" selected="true">no</option>
43 <option value="1">Yes</option> 43 <option value="1">yes</option>
44 </param> 44 </param>
45 <when value="0" /> 45 <when value="0" />
46 <when value="1"> 46 <when value="1">
47 <param name="shuffles" type="integer" min="0" value="0" label="Shuffles" /> 47 <param name="shuffles" type="integer" min="0" value="0" label="Shuffles" />
48 <param name="p0_input" type="data" format="gd_indivs" label="Individuals for randomization" /> 48 <param name="p0_input" type="data" format="gd_indivs" label="Individuals for randomization" />
67 </test> 67 </test>
68 </tests> 68 </tests>
69 69
70 <help> 70 <help>
71 71
72 **Dataset formats**
73
74 The input datasets are in gd_snp_ and gd_indivs_ formats.
75 The output dataset is in text_ format. (`Dataset missing?`_)
76
77 .. _gd_snp: ./static/formatHelp.html#gd_snp
78 .. _gd_indivs: ./static/formatHelp.html#gd_indivs
79 .. _text: ./static/formatHelp.html#text
80 .. _Dataset missing?: ./static/formatHelp.html
81
82 -----
83
72 **What it does** 84 **What it does**
73 85
74 The user specifies a SNP table and two "populations" of individuals, both previously defined using the Galaxy tool to specify individuals from a SNP table. No individual can be in both populations. Other choices are as follows. 86 The user specifies a SNP table and two "populations" of individuals, both previously defined using the Galaxy tool to specify individuals from a SNP table. No individual can be in both populations. Other choices are as follows.
75 87
76 Data source. The allele frequencies of a SNP in the two populations can be estimated either by the total number of reads of each allele, or by adding the frequencies inferred from genotypes of individuals in the populations. 88 Frequency metric. The allele frequencies of a SNP in the two populations can be estimated either by the total number of reads of each allele, or by adding the frequencies inferred from genotypes of individuals in the populations.
77 89
78 After specifying the data source, the user sets lower bounds on amount of data required at a SNP. For estimating the FST using read counts, the bound is the minimum count of reads of the two alleles in a population. For estimations based on genotype, the bound is the minimum reported genotype quality per individual. SMPs not meeting these lower bounds are ignored. 90 After specifying the frequency metric, the user sets lower bounds on amount of data required at a SNP. For estimating the FST using read counts, the bound is the minimum count of reads of the two alleles in a population. For estimations based on genotype, the bound is the minimum reported genotype quality per individual. SNPs not meeting these lower bounds are ignored.
79 91
80 The user specifies whether SNPs where both populations appear to be fixed for the same allele should be retained or discarded. 92 The user specifies whether SNPs where both populations appear to be fixed for the same allele should be retained or discarded.
81 93
82 Finally, the user decides whether to use randomizations. If so, then the user specifies how many randomly generated population pairs (retaining the numbers of individuals of the originals) to generate, as well as the "population" of additional individuals (not in the first two populations) that can be used in the randomization process. 94 Finally, the user decides whether to use randomizations. If so, then the user specifies how many randomly generated population pairs (retaining the numbers of individuals of the originals) to generate, as well as the "population" of additional individuals (not in the first two populations) that can be used in the randomization process.
83 95
84 The program prints the following measures of FST for the two populations. 96 The program prints the following measures of FST for the two populations.
97
85 1. The formulation by Sewall Wright (average over FSTs for all SNPs). 98 1. The formulation by Sewall Wright (average over FSTs for all SNPs).
86 2. The Weir-Cockerham estimator (average over FSTs for all SNPs). 99 2. The Weir-Cockerham estimator (average over FSTs for all SNPs).
87 3. The Reich-Patterson estimator (average over FSTs for all SNPs). 100 3. The Reich-Patterson estimator (average over FSTs for all SNPs).
88 4. The population-based Reich-Patterson estimator. 101 4. The population-based Reich-Patterson estimator.
89 102
91 104
92 References: 105 References:
93 106
94 Sewall Wright (1951) The genetical structure of populations. Ann Eugen 15:323-354. 107 Sewall Wright (1951) The genetical structure of populations. Ann Eugen 15:323-354.
95 108
96 B. S. Weir and C. Clark Cockerham (1984) Estimating F-statistics for the analysis of population structure. Evolution 38:1358-1370. 109 Weir, B.S. and Cockerham, C. Clark (1984) Estimating F-statistics for the analysis of population structure. Evolution 38:1358-1370.
97 110
98 Weir, B.S. 1996. Population substructure. Genetic data analysis II, pp. 161-173. Sinauer Associates, Sundand, MA. 111 Weir, B.S. 1996. Population substructure. Genetic data analysis II, pp. 161-173. Sinauer Associates, Sundand, MA.
99 112
100 David Reich, Kumarasamy Thangaraj, Nick Patterson, Alkes L. Price, and Lalji Singh (2009) Reconstructing Indian population history. Nature 461:489-494, especially Supplement 2. 113 David Reich, Kumarasamy Thangaraj, Nick Patterson, Alkes L. Price, and Lalji Singh (2009) Reconstructing Indian population history. Nature 461:489-494, especially Supplement 2.
101 114
102 Their effectiveness for computing FSTs when there are many SNPs but few individuals is discussed in the followoing paper. 115 Their effectiveness for computing FSTs when there are many SNPs but few individuals is discussed in the following paper.
103 116
104 Eva-Maria Willing, Christine Dreyer, Cock van Oosterhout (2012) Estimates of genetic differentiation measured by FST do not necessarily require large sample sizes when using many SNP markers. PLoS One 7:e42649. 117 Eva-Maria Willing, Christine Dreyer, Cock van Oosterhout (2012) Estimates of genetic differentiation measured by FST do not necessarily require large sample sizes when using many SNP markers. PLoS One 7:e42649.
118
119 -----
120
121 **Example**
122
123 - output::
124
125 Using 37847 SNPs, we compute:
126 Average Wright FST is 0.22810.
127 Average Weir-Cockerham FST is 0.30813.
128 Average Reich-Patterson FST is 0.31012.
129 The population-based Reich-Patterson Fst is 0.33625.
130
105 </help> 131 </help>
106 </tool> 132 </tool>
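
The first measure and the two filtering options described in the help text can be illustrated with a small Python sketch. This is not the tool's code. It assumes the textbook two-population form of Wright's FST, (HT - HS) / HT with equal population weights, estimates allele frequencies from read counts, and treats a retained fixed SNP as FST = 0. The names `wright_fst`, `average_fst`, `min_reads`, and `discard_fixed` are hypothetical and chosen only for this illustration.

```python
from typing import List, Optional, Tuple

# Read counts (first allele, second allele) for one population at one SNP.
Counts = Tuple[int, int]


def wright_fst(p1: float, p2: float) -> Optional[float]:
    """Textbook Wright FST for two equally weighted populations.

    Returns None when both populations are fixed for the same allele,
    because total heterozygosity is zero and the ratio is undefined.
    """
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0.0:
        return None
    # Mean expected heterozygosity within the two populations.
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_t - h_s) / h_t


def average_fst(
    snps: List[Tuple[Counts, Counts]],
    min_reads: int = 1,
    discard_fixed: bool = True,
) -> Optional[float]:
    """Average per-SNP FST over SNPs that pass the read-count bound."""
    bound = max(min_reads, 1)  # a population with no reads has no frequency
    values = []
    for (a1, b1), (a2, b2) in snps:
        if a1 + b1 < bound or a2 + b2 < bound:
            continue  # too little data in at least one population
        fst = wright_fst(a1 / (a1 + b1), a2 / (a2 + b2))
        if fst is None:
            if discard_fixed:
                continue
            fst = 0.0  # assumption: a retained fixed SNP contributes 0
        values.append(fst)
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    example = [
        ((10, 0), (0, 10)),  # fixed for different alleles
        ((5, 5), (5, 5)),    # identical frequencies
        ((10, 0), (10, 0)),  # apparently fixed in both populations
        ((1, 0), (0, 1)),    # only one read per population
    ]
    print(average_fst(example, min_reads=5, discard_fixed=True))
    print(average_fst(example, min_reads=5, discard_fixed=False))
```

Retaining apparently fixed SNPs adds zero-valued terms to the average and so lowers it, which is why the "retain"/"delete" choice matters when many sites are monomorphic in the sampled individuals.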