comparison add_fst_column.xml @ 21:d6b961721037

Miller Lab Devshed version 4c04e35b18f6
author Richard Burhans <burhans@bx.psu.edu>
date Mon, 05 Nov 2012 12:44:17 -0500
parents f04f40a36cc8
children 95a05c1ef5d5
20:8a4b8efbc82c 21:d6b961721037
8 "$arg" 8 "$arg"
9 #end for 9 #end for
10 </command> 10 </command>
11 11
12 <inputs> 12 <inputs>
13 <param name="input" type="data" format="gd_snp" label="SNP table" /> 13 <param name="input" type="data" format="gd_snp" label="SNP dataset" />
14 <param name="p1_input" type="data" format="gd_indivs" label="Population 1 individuals" /> 14 <param name="p1_input" type="data" format="gd_indivs" label="Population 1 individuals" />
15 <param name="p2_input" type="data" format="gd_indivs" label="Population 2 individuals" /> 15 <param name="p2_input" type="data" format="gd_indivs" label="Population 2 individuals" />
16 16
17 <param name="data_source" type="select" format="integer" label="Data source"> 17 <param name="data_source" type="select" format="integer" label="Frequency metric">
18 <option value="0" selected="true">sequence coverage</option> 18 <option value="0" selected="true">sequence coverage</option>
19 <option value="1">estimated genotype</option> 19 <option value="1">estimated genotype</option>
20 </param> 20 </param>
21 21
22 <param name="min_reads" type="integer" min="0" value="0" label="Minimum total read count for a population" /> 22 <param name="min_reads" type="integer" min="0" value="0" label="Minimum total read count for a population" />
23 <param name="min_qual" type="integer" min="0" value="0" label="Minimum individual genotype quality" /> 23 <param name="min_qual" type="integer" min="0" value="0" label="Minimum individual genotype quality" />
24 24
25 <param name="retain" type="select" label="Special treatment"> 25 <param name="retain" type="select" label="If a SNP is below minimum">
26 <option value="0" selected="true">Skip row</option> 26 <option value="0" selected="true">skip SNP</option>
27 <option value="1">Set FST = -1</option> 27 <option value="1">set FST = -1</option>
28 </param> 28 </param>
29 29
30 <param name="discard_fixed" type="select" label="Apparently fixed SNPs"> 30 <param name="discard_fixed" type="select" label="For SNPs that appear to be fixed across both populations">
31 <option value="0">Retain SNPs that appear fixed in the two populations</option> 31 <option value="0">retain</option>
32 <option value="1" selected="true">Delete SNPs that appear fixed in the two populations</option> 32 <option value="1" selected="true">delete</option>
33 </param> 33 </param>
34 34
35 <param name="biased" type="select" label="FST estimator"> 35 <param name="biased" type="select" label="FST estimator">
36 <option value="0" selected="true">Wright's original definition</option> 36 <option value="0" selected="true">Wright's original definition</option>
37 <option value="1">The Weir-Cockerham estimator</option> 37 <option value="1">the Weir-Cockerham estimator</option>
38 <option value="2">The Reich-Patterson estimator</option> 38 <option value="2">the Reich-Patterson estimator</option>
39 </param> 39 </param>
40 40
41 </inputs> 41 </inputs>
42 42
43 <outputs> 43 <outputs>
59 </test> 59 </test>
60 </tests> 60 </tests>
61 61
62 <help> 62 <help>
63 63
64 **Dataset formats**
65
66 The input datasets are in gd_snp_ and gd_indivs_ formats.
67 The output dataset is in gd_snp_ format. (`Dataset missing?`_)
68
69 .. _gd_snp: ./static/formatHelp.html#gd_snp
70 .. _gd_indivs: ./static/formatHelp.html#gd_indivs
71 .. _Dataset missing?: ./static/formatHelp.html
72
73 -----
74
64 **What it does** 75 **What it does**
65 76
66 The user specifies a SNP table and two "populations" of individuals, both previously defined using the Galaxy tool to specify individuals from a SNP table. No individual can be in both populations. Other choices are as follows. 77 The user specifies a SNP table and two "populations" of individuals, both previously defined using the Galaxy tool to specify individuals from a SNP table. No individual can be in both populations. Other choices are as follows.
67 78
68 Data source. The allele frequencies of a SNP in the two populations can be estimated either by the total number of reads of each allele, or by adding the frequencies inferred from genotypes of individuals in the populations. 79 Frequency metric. The allele frequencies of a SNP in the two populations can be estimated either by the total number of reads of each allele, or by adding the frequencies inferred from genotypes of individuals in the populations.
69 80
70 After specifying the data source, the user sets lower bounds on amount of data required at a SNP. For estimating the Fst using read counts, the bound is the minimum count of reads of the two alleles in a population. For estimations based on genotype, the bound is the minimum reported genotype quality per individual. 81 After specifying the frequency metric, the user sets lower bounds on amount of data required at a SNP. For estimating the Fst using read counts, the bound is the minimum count of reads of the two alleles in a population. For estimations based on genotype, the bound is the minimum reported genotype quality per individual.
71 82
72 The user specifies whether the SNPs that violate the lower bound should be ignored or the Fst set to -1. 83 The user specifies whether the SNPs that violate the lower bound should be ignored or the Fst set to -1.
73 84
74 The user specifies whether SNPs where both populations appear to be fixed for the same allele should be retained or discarded. 85 The user specifies whether SNPs where both populations appear to be fixed for the same allele should be retained or discarded.
75 86
79 90
80 References: 91 References:
81 92
82 Sewall Wright (1951) The genetical structure of populations. Ann Eugen 15:323-354. 93 Sewall Wright (1951) The genetical structure of populations. Ann Eugen 15:323-354.
83 94
84 B. S. Weir and C. Clark Cockerham (1984) Estimating F-statistics for the analysis of population structure. Evolution 38:1358-1370. 95 Weir, B.S. and Cockerham, C. Clark (1984) Estimating F-statistics for the analysis of population structure. Evolution 38:1358-1370.
85 96
86 Weir, B.S. 1996. Population substructure. Genetic data analysis II, pp. 161-173. Sinauer Associates, Sunderland, MA. 97 Weir, B.S. 1996. Population substructure. Genetic data analysis II, pp. 161-173. Sinauer Associates, Sunderland, MA.
87 98
88 David Reich, Kumarasamy Thangaraj, Nick Patterson, Alkes L. Price, and Lalji Singh (2009) Reconstructing Indian population history. Nature 461:489-494, especially Supplement 2. 99 David Reich, Kumarasamy Thangaraj, Nick Patterson, Alkes L. Price, and Lalji Singh (2009) Reconstructing Indian population history. Nature 461:489-494, especially Supplement 2.
89 100
90 Their effectiveness for computing FSTs when there are many SNPs but few individuals is discussed in the followoing paper. 101 Their effectiveness for computing FSTs when there are many SNPs but few individuals is discussed in the following paper.
91 102
92 Eva-Maria Willing, Christine Dreyer, Cock van Oosterhout (2012) Estimates of genetic differentiation measured by FST do not necessarily require large sample sizes when using many SNP markers. PLoS One 7:e42649. 103 Eva-Maria Willing, Christine Dreyer, Cock van Oosterhout (2012) Estimates of genetic differentiation measured by FST do not necessarily require large sample sizes when using many SNP markers. PLoS One 7:e42649.
93 104
105 -----
106
107 **Example**
108
109 - input, SNP table::
110
111 #{"column_names":["scaf","pos","A","B","qual","ref","rpos","rnuc","1A","1B","1G","1Q","2A","2B","2G","2Q","3A","3B","3G","3Q","4A","4B","4G","4Q",
112 #"5A","5B","5G","5Q","6A","6B","6G","6Q","pair","dist","prim","rflp"],"dbkey":"canFam2",
113 #"individuals":[["PB1",9],["PB2",13],["PB3",17],["PB4",21],["PB6",25],["PB8",29]],
114 #"pos":2,"rPos":7,"ref":6,"scaffold":1,"species":"bear"}
115 Contig161_chr1_4641264_4641879 115 C T 73.5 chr1 4641382 C 6 0 2 45 8 0 2 51 15 0 2 72 5 0 2 42 6 0 2 45 10 0 2 57 Y 54 0.323 0
116 Contig113_chr5_11052263_11052603 28 C T 38.2 chr5 11052280 C 1 2 1 12 3 2 1 10 5 0 2 42 2 1 2 13 3 0 2 36 8 0 2 51 Y 161 +99. 0
117 Contig215_chr5_70946445_70947428 363 T G 28.2 chr5 70946809 C 4 0 2 39 0 5 0 12 9 0 2 54 6 0 2 45 3 3 2 1 9 0 2 54 N 43 0.153 0
118 etc.
119
120 - input, Population 1 individuals::
121
122 9 PB1
123 13 PB2
124
125 - input, Population 2 individuals::
126
127 17 PB3
128 21 PB4
129
130 - output (minimum read count of 3, discard fixed)::
131
132 Contig113_chr5_11052263_11052603 28 C T 38.2 chr5 11052280 C 1 2 1 12 3 2 1 10 5 0 2 42 2 1 2 13 3 0 2 36 8 0 2 51 Y 161 +99. 0 0.1636
133 Contig215_chr5_70946445_70947428 363 T G 28.2 chr5 70946809 C 4 0 2 39 0 5 0 12 9 0 2 54 6 0 2 45 3 3 2 1 9 0 2 54 N 43 0.153 0 0.3846
134 etc.
135
94 </help> 136 </help>
95 </tool> 137 </tool>
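
The worked example added to the help text (Population 1 = PB1 and PB2, Population 2 = PB3 and PB4, minimum read count of 3, fixed SNPs discarded) can be checked by hand. The following is a minimal Python sketch, not the tool's actual code. It assumes that the "sequence coverage" metric pools the A and B read counts of all individuals in a population, and that "Wright's original definition" means FST = (H_T - H_S) / H_T with an unweighted mean of the two population allele frequencies. The function names `allele_counts` and `wright_fst` are hypothetical. Under those assumptions the sketch reproduces the values 0.1636 and 0.3846 shown in the example output.

```python
from typing import List, Optional, Sequence, Tuple


def allele_counts(fields: Sequence[str], columns: Sequence[int]) -> Tuple[int, int]:
    """Pool read counts over one population.

    `columns` holds the 1-based column of each individual's A-allele count,
    as listed in a gd_indivs dataset (e.g. 9 for PB1). The B-allele count
    is assumed to sit in the next column.
    """
    a = sum(int(fields[c - 1]) for c in columns)
    b = sum(int(fields[c]) for c in columns)
    return a, b


def wright_fst(a1: int, b1: int, a2: int, b2: int,
               min_reads: int = 0, retain: bool = False) -> Optional[float]:
    """Assumed form of Wright's FST from pooled read counts.

    Returns None when the SNP is skipped. A SNP below the minimum read
    count yields -1.0 if `retain` is true, otherwise None. For simplicity
    this sketch always discards SNPs that look fixed in both populations.
    """
    n1, n2 = a1 + b1, a2 + b2
    if n1 == 0 or n2 == 0 or n1 < min_reads or n2 < min_reads:
        return -1.0 if retain else None
    p1, p2 = a1 / n1, a2 / n2
    p_bar = (p1 + p2) / 2                     # unweighted mean frequency
    h_t = 2 * p_bar * (1 - p_bar)             # total expected heterozygosity
    if h_t == 0:                              # fixed for the same allele
        return None
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_t - h_s) / h_t


# The three SNP rows from the example, split on whitespace.
ROWS: List[List[str]] = [
    "Contig161_chr1_4641264_4641879 115 C T 73.5 chr1 4641382 C "
    "6 0 2 45 8 0 2 51 15 0 2 72 5 0 2 42 6 0 2 45 10 0 2 57 Y 54 0.323 0".split(),
    "Contig113_chr5_11052263_11052603 28 C T 38.2 chr5 11052280 C "
    "1 2 1 12 3 2 1 10 5 0 2 42 2 1 2 13 3 0 2 36 8 0 2 51 Y 161 +99. 0".split(),
    "Contig215_chr5_70946445_70947428 363 T G 28.2 chr5 70946809 C "
    "4 0 2 39 0 5 0 12 9 0 2 54 6 0 2 45 3 3 2 1 9 0 2 54 N 43 0.153 0".split(),
]
POP1 = [9, 13]    # PB1, PB2
POP2 = [17, 21]   # PB3, PB4


def fst_for_rows(rows: Sequence[Sequence[str]], min_reads: int = 3) -> List[Optional[float]]:
    """Compute the sketch FST for each row; None marks a skipped SNP."""
    results = []
    for fields in rows:
        a1, b1 = allele_counts(fields, POP1)
        a2, b2 = allele_counts(fields, POP2)
        results.append(wright_fst(a1, b1, a2, b2, min_reads=min_reads))
    return results


if __name__ == "__main__":
    for fields, fst in zip(ROWS, fst_for_rows(ROWS)):
        label = "skipped" if fst is None else f"{fst:.4f}"
        print(fields[0], label)
```

The first SNP has only A-allele reads in both populations, so the sketch drops it, which matches its absence from the example output. The genotype-based metric and the Weir-Cockerham and Reich-Patterson estimators are not covered here.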