comparison average_fst.xml @ 18:f04f40a36cc8

Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
author Richard Burhans <burhans@bx.psu.edu>
date Tue, 23 Oct 2012 12:41:52 -0400
parents 8ae67e9fb6ff
children d6b961721037
17:a3af29edcce2 18:f04f40a36cc8
1 <tool id="gd_average_fst" name="Overall FST" version="1.0.0"> 1 <tool id="gd_average_fst" name="Overall FST" version="1.1.0">
2 <description>: Estimate the relative fixation index between two populations</description> 2 <description>: Estimate the relative fixation index between two populations</description>
3 3
4 <command interpreter="python"> 4 <command interpreter="python">
5 average_fst.py "$input" "$p1_input" "$p2_input" "$data_source.ds_choice" "$data_source.min_value" "$discard_fixed" "$biased" "$output" 5 average_fst.py "$input" "$p1_input" "$p2_input" "$data_source.ds_choice" "$data_source.min_value" "$discard_fixed" "$output"
6 #if $use_randomization.ur_choice == '1' 6 #if $use_randomization.ur_choice == '1'
7 "$use_randomization.shuffles" "$use_randomization.p0_input" 7 "$use_randomization.shuffles" "$use_randomization.p0_input"
8 #else 8 #else
9 "0" "/dev/null" 9 "0" "/dev/null"
10 #end if 10 #end if
35 <param name="discard_fixed" type="select" label="Apparently fixed SNPs"> 35 <param name="discard_fixed" type="select" label="Apparently fixed SNPs">
36 <option value="0">Retain SNPs that appear fixed in the two populations</option> 36 <option value="0">Retain SNPs that appear fixed in the two populations</option>
37 <option value="1" selected="true">Delete SNPs that appear fixed in the two populations</option> 37 <option value="1" selected="true">Delete SNPs that appear fixed in the two populations</option>
38 </param> 38 </param>
39 39
40 <param name="biased" type="select" label="FST estimator">
41 <option value="0" selected="true">Wright's original definition</option>
42 <option value="1">Weir's unbiased estimator</option>
43 </param>
44
45 <conditional name="use_randomization"> 40 <conditional name="use_randomization">
46 <param name="ur_choice" type="select" format="integer" label="Use randomization"> 41 <param name="ur_choice" type="select" format="integer" label="Use randomization">
47 <option value="0" selected="true">No</option> 42 <option value="0" selected="true">No</option>
48 <option value="1">Yes</option> 43 <option value="1">Yes</option>
49 </param> 44 </param>
65 <param name="p1_input" value="test_in/a.gd_indivs" ftype="gd_indivs" /> 60 <param name="p1_input" value="test_in/a.gd_indivs" ftype="gd_indivs" />
66 <param name="p2_input" value="test_in/b.gd_indivs" ftype="gd_indivs" /> 61 <param name="p2_input" value="test_in/b.gd_indivs" ftype="gd_indivs" />
67 <param name="ds_choice" value="0" /> 62 <param name="ds_choice" value="0" />
68 <param name="min_value" value="3" /> 63 <param name="min_value" value="3" />
69 <param name="discard_fixed" value="1" /> 64 <param name="discard_fixed" value="1" />
70 <param name="biased" value="0" />
71 <param name="ur_choice" value="0" /> 65 <param name="ur_choice" value="0" />
72 <output name="output" file="test_out/average_fst/average_fst.txt" /> 66 <output name="output" file="test_out/average_fst/average_fst.txt" />
73 </test> 67 </test>
74 </tests> 68 </tests>
75 69
76 <help> 70 <help>
77 71
78 **What it does** 72 **What it does**
79 73
80 The user specifies a SNP table and two "populations" of individuals, 74 The user specifies a SNP table and two "populations" of individuals, both previously defined using the Galaxy tool to specify individuals from a SNP table. No individual can be in both populations. Other choices are as follows.
81 both previously defined using the Specify Individuals tool.
82 No individual can be in both populations. Other choices are as follows.
83 75
84 Data source. The allele frequencies of a SNP in the two populations can be 76 Data source. The allele frequencies of a SNP in the two populations can be estimated either by the total number of reads of each allele, or by adding the frequencies inferred from genotypes of individuals in the populations.
85 estimated either by the total number of reads of each allele, or by adding
86 the frequencies inferred from genotypes of individuals in the populations.
87 77
88 After specifying the data source, the user sets lower bounds on the amount 78 After specifying the data source, the user sets lower bounds on the amount of data required at a SNP. For estimating the FST using read counts, the bound is the minimum count of reads of the two alleles in a population. For estimations based on genotype, the bound is the minimum reported genotype quality per individual. SNPs not meeting these lower bounds are ignored.
89 of data required at a SNP. For estimating the Fst using read counts,
90 the bound is the minimum count of reads of the two alleles in a population.
91 For estimations based on genotype, the bound is the minimum reported genotype
92 quality per individual. SNPs not meeting these lower bounds are ignored.
93 79
94 The user specifies whether SNPs where both populations appear to be fixed 80 The user specifies whether SNPs where both populations appear to be fixed for the same allele should be retained or discarded.
95 for the same allele should be retained or discarded.
96 81
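The data-source choice, the lower bounds, and the fixed-SNP option together decide which SNPs contribute to the estimate. The sketch below shows one way to express those three filters. It is not the code in `average_fst.py`. The record layouts (`Reads`, `Genotype`) and all function names are hypothetical. It also assumes that the read-count bound applies to the combined reads of both alleles summed over a population, which is one reading of the text above.

```python
from typing import List, Optional, Tuple

# Hypothetical record types (not the gd_snp column layout):
#   Reads:    (reads of allele A, reads of allele B) for one individual
#   Genotype: (copies of allele A in {0, 1, 2}, genotype quality) for one individual
Reads = Tuple[int, int]
Genotype = Tuple[int, int]


def freq_from_reads(pop: List[Reads], min_reads: int) -> Optional[float]:
    """Frequency of allele A from reads pooled over a population, or None if too sparse.

    Assumption: the lower bound applies to the combined reads of both alleles.
    """
    a = sum(r[0] for r in pop)
    b = sum(r[1] for r in pop)
    if a + b == 0 or a + b < min_reads:
        return None  # SNP is ignored for this population
    return a / (a + b)


def freq_from_genotypes(pop: List[Genotype], min_quality: int) -> Optional[float]:
    """Average of per-individual allele frequencies, skipping low-quality genotypes."""
    kept = [copies for copies, quality in pop if quality >= min_quality]
    if not kept:
        return None
    return sum(copies / 2 for copies in kept) / len(kept)


def apparently_fixed(p1: float, p2: float) -> bool:
    """True when both populations appear fixed for the same allele."""
    return (p1 == 0.0 and p2 == 0.0) or (p1 == 1.0 and p2 == 1.0)
```

Under these assumptions a SNP is skipped when either population's frequency is `None`. When the "Delete SNPs that appear fixed" option is selected, it is also skipped when `apparently_fixed` returns true.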
97 The user chooses which definition of Fst to use: Wright's original definition 82 Finally, the user decides whether to use randomizations. If so, then the user specifies how many randomly generated population pairs (retaining the numbers of individuals of the originals) to generate, as well as the "population" of additional individuals (not in the first two populations) that can be used in the randomization process.
98 or Weir's unbiased estimator.
99 83
100 Finally, the user decides whether to use randomizations. If so, then the 84 The program prints the following measures of FST for the two populations.
101 user specifies how many randomly generated population pairs (retaining 85 1. The formulation by Sewall Wright (average over FSTs for all SNPs).
102 the numbers of individuals of the originals) to generate, as well as the 86 2. The Weir-Cockerham estimator (average over FSTs for all SNPs).
103 "population" of additional individuals (not in the first two populations) 87 3. The Reich-Patterson estimator (average over FSTs for all SNPs).
104 that can be used in the randomization process. 88 4. The population-based Reich-Patterson estimator.
105 89
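The list above separates per-SNP averages (measures 1 to 3) from a population-based estimate (measure 4). The sketch below illustrates that difference for Wright's definition and the Reich-Patterson estimator only; the Weir-Cockerham estimator is omitted. It assumes each SNP is given as hypothetical allele counts `(a1, n1, a2, n2)`, with `n_i >= 2` sampled chromosomes. It also assumes that the population-based value is the ratio of summed numerators to summed denominators, following Reich et al. (2009). This is an illustration, and the tool's own implementation may differ in detail.

```python
from typing import Dict, List, Tuple

# One SNP as hypothetical allele counts (a1, n1, a2, n2): a_i of the n_i
# sampled chromosomes in population i carry allele A (n_i >= 2).
Snp = Tuple[int, int, int, int]


def wright_fst(p1: float, p2: float) -> float:
    """Wright's FST = (H_T - H_S) / H_T for two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)        # heterozygosity of the pooled population
    h_s = p1 * (1 - p1) + p2 * (1 - p2)  # mean of 2p(1 - p) over the two populations
    return (h_t - h_s) / h_t


def reich_patterson_terms(a1: int, n1: int, a2: int, n2: int) -> Tuple[float, float]:
    """Numerator N and denominator D of the Reich-Patterson estimator at one SNP."""
    p1, p2 = a1 / n1, a2 / n2
    h1 = a1 * (n1 - a1) / (n1 * (n1 - 1))  # unbiased estimate of p1 * (1 - p1)
    h2 = a2 * (n2 - a2) / (n2 * (n2 - 1))
    num = (p1 - p2) ** 2 - h1 / n1 - h2 / n2  # sampling-corrected (p1 - p2)^2
    return num, num + h1 + h2


def fst_summary(snps: List[Snp]) -> Dict[str, float]:
    wright: List[float] = []
    per_snp_rp: List[float] = []
    nums: List[float] = []
    dens: List[float] = []
    for a1, n1, a2, n2 in snps:
        num, den = reich_patterson_terms(a1, n1, a2, n2)
        if den == 0:  # both samples fixed for the same allele: FST is undefined
            continue
        wright.append(wright_fst(a1 / n1, a2 / n2))
        per_snp_rp.append(num / den)
        nums.append(num)
        dens.append(den)
    if not dens:
        raise ValueError("no informative SNPs")
    return {
        "wright": sum(wright) / len(wright),                   # average over SNPs
        "reich_patterson": sum(per_snp_rp) / len(per_snp_rp),  # average over SNPs
        "population_based": sum(nums) / sum(dens),             # ratio of sums
    }
```

The per-SNP Reich-Patterson value can be negative at a SNP where the two samples barely differ. Averaging N and D separately before dividing gives more weight to SNPs with larger denominators, which is why the population-based value generally differs from the per-SNP average.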
106 The program prints the average Fst for the original populations and the 90 If randomizations were requested, it prints a summary for each of the four definitions of FST that includes the maximum and average value, and the highest-scoring population pair (if any scored higher than the two user-specified populations).
107 number of SNPs used to compute it. If randomizations were requested,
108 it prints the average Fst for each randomly generated population pair,
109 ending with a summary that includes the maximum and average value, and the
110 highest-scoring population pair.
111 91
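The randomization summary described above (maximum, average, and a highest-scoring pair only when it beats the user-specified populations) can be sketched as follows. The sketch assumes that random pairs are drawn without replacement from the two original populations together with the additional individuals, so each generated pair is disjoint and keeps the original sizes. The FST measure is passed in as a function. The names `randomization_summary` and `fst` are hypothetical and do not come from `average_fst.py`.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[Tuple[str, ...], Tuple[str, ...]]


def randomization_summary(
    pop1: Sequence[str],
    pop2: Sequence[str],
    pop0: Sequence[str],
    shuffles: int,
    fst: Callable[[Sequence[str], Sequence[str]], float],
    seed: int = 0,
) -> Tuple[float, float, Optional[Pair]]:
    """Return (max, mean, best pair or None) over randomly generated population pairs."""
    if shuffles < 1:
        raise ValueError("shuffles must be at least 1")
    rng = random.Random(seed)
    # Assumption: the pool is both original populations plus the extra individuals.
    pool = list(pop1) + list(pop2) + list(pop0)
    best_score = fst(pop1, pop2)  # the user-specified pair sets the bar
    best: Optional[Pair] = None
    scores: List[float] = []
    for _ in range(shuffles):
        # Sampling without replacement keeps the two new populations disjoint.
        picked = rng.sample(pool, len(pop1) + len(pop2))
        new1, new2 = tuple(picked[: len(pop1)]), tuple(picked[len(pop1):])
        score = fst(new1, new2)
        scores.append(score)
        if score > best_score:  # report only pairs that beat the original populations
            best_score, best = score, (new1, new2)
    return max(scores), sum(scores) / len(scores), best
```

A fixed `seed` makes the sketch reproducible. In this sketch the function would be called once per FST measure to produce one summary for each of the four definitions.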
92 References:
93
94 Sewall Wright (1951) The genetical structure of populations. Ann Eugen 15:323-354.
95
96 B. S. Weir and C. Clark Cockerham (1984) Estimating F-statistics for the analysis of population structure. Evolution 38:1358-1370.
97
98 B. S. Weir (1996) Population substructure. In: Genetic Data Analysis II, pp. 161-173. Sinauer Associates, Sunderland, MA.
99
100 David Reich, Kumarasamy Thangaraj, Nick Patterson, Alkes L. Price, and Lalji Singh (2009) Reconstructing Indian population history. Nature 461:489-494, especially Supplement 2.
101
102 The effectiveness of these estimators for computing FSTs when there are many SNPs but few individuals is discussed in the following paper.
103
104 Eva-Maria Willing, Christine Dreyer, Cock van Oosterhout (2012) Estimates of genetic differentiation measured by FST do not necessarily require large sample sizes when using many SNP markers. PLoS One 7:e42649.
112 </help> 105 </help>
113 </tool> 106 </tool>