comparison evaluate_population_numbers.xml @ 17:a3af29edcce2
Uploaded Miller Lab Devshed version a51c894f5bed
author | miller-lab |
---|---|
date | Fri, 28 Sep 2012 11:57:18 -0400 |
parents | 8ae67e9fb6ff |
children |
16:be0e2223c531 | 17:a3af29edcce2 |
---|---|
1 <tool id="gd_evaluate_population_numbers" name="Population Complexity" version="1.0.0"> | |
2 <description>: Evaluate possible numbers of ancestral populations</description> | |
3 | |
4 <command interpreter="bash"> | |
5 evaluate_population_numbers.bash "${input.extra_files_path}/admix.ped" "$output" "$max_populations" | |
6 </command> | |
7 | |
8 <inputs> | |
9 <param name="input" type="data" format="gd_ped" label="Dataset" /> | |
10 <param name="max_populations" type="integer" min="1" value="5" label="Maximum number of populations" /> | |
11 </inputs> | |
12 | |
13 <outputs> | |
14 <data name="output" format="txt" /> | |
15 </outputs> | |
16 | |
17 <!-- | |
18 <tests> | |
19 <test> | |
20 <param name="input" value="fake" ftype="gd_ped" > | |
21 <metadata name="base_name" value="admix" /> | |
22 <composite_data value="test_out/prepare_population_structure/prepare_population_structure.html" /> | |
23 <composite_data value="test_out/prepare_population_structure/admix.ped" /> | |
24 <composite_data value="test_out/prepare_population_structure/admix.map" /> | |
25 <edit_attributes type="name" value="fake" /> | |
26 </param> | |
27 <param name="max_populations" value="2" /> | |
28 | |
29 <output name="output" file="test_out/evaluate_population_numbers/evaluate_population_numbers.txt" /> | |
30 </test> | |
31 </tests> | |
32 --> | |
33 | |
34 <help> | |
35 | |
36 **Dataset formats** | |
37 | |
38 The input dataset is in gd_ped_ format. | |
39 The output dataset is text. (`Dataset missing?`_) | |
40 | |
41 .. _gd_ped: ./static/formatHelp.html#gd_ped | |
42 .. _Dataset missing?: ./static/formatHelp.html | |
43 | |
44 ----- | |
45 | |
46 **What it does** | |
47 | |
48 The input is a gd_ped dataset generated by the Prepare Input tool. | |
49 For each candidate number K of ancestral populations, from 1 up to the | |
50 user-specified maximum, this tool reports a value that indicates how well | |
51 the data can be explained as genotypes of individuals derived from K | |
52 ancestral populations. These values are computed by 5-fold cross-validation, | |
53 so a good choice of K has a low cross-validation error (CVE) relative to | |
54 the other values of K tried. | |
55 | |
56 ----- | |
57 | |
58 **Acknowledgments** | |
59 | |
60 We use the program "Admixture", downloaded from | |
61 | |
62 http://www.genetics.ucla.edu/software/admixture/ | |
63 | |
64 and described in the paper "Fast model-based estimation of ancestry in | |
65 unrelated individuals" by David H. Alexander, John Novembre and Kenneth Lange, | |
66 Genome Research 19 (2009), pp. 1655-1664. Admixture is called with the "--cv" | |
67 flag to produce these values. | |
68 | |
69 ----- | |
70 | |
71 **Example** | |
72 | |
73 - Output with "Maximum number of populations" set to 6:: | |
74 | |
75 CVE (K=1): 1.10120 | |
76 CVE (K=2): 1.34683 | |
77 CVE (K=3): 1.80611 | |
78 CVE (K=4): 1.96339 | |
79 CVE (K=5): 1.21522 | |
80 CVE (K=6): 0.51501 | |
81 | |
82 </help> | |
83 </tool> |
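The help text says a good K is one with a low CVE relative to the other settings. As a minimal sketch of how a user might read the tool's text output, the Python below parses lines in the `CVE (K=n): value` form shown in the help example and reports the K with the lowest error. The function names `parse_cve` and `best_k` are hypothetical and not part of the tool; the sketch also assumes the output contains only lines in that form.

```python
import re

# Matches lines such as "CVE (K=3): 1.80611", the form shown in the help example.
CVE_LINE = re.compile(r"CVE \(K=(\d+)\):\s*([0-9.]+)")


def parse_cve(text):
    """Return a dict mapping each K to its cross-validation error."""
    return {int(k): float(v) for k, v in CVE_LINE.findall(text)}


def best_k(text):
    """Return (K, CVE) for the K with the lowest cross-validation error."""
    errors = parse_cve(text)
    if not errors:
        raise ValueError("no CVE lines found")
    k = min(errors, key=errors.get)
    return k, errors[k]


# Values copied from the Example section of the tool help.
example = """\
CVE (K=1): 1.10120
CVE (K=2): 1.34683
CVE (K=3): 1.80611
CVE (K=4): 1.96339
CVE (K=5): 1.21522
CVE (K=6): 0.51501
"""

if __name__ == "__main__":
    k, cve = best_k(example)
    print(f"lowest CVE at K={k}: {cve}")
```

For the example values this selects K=6. Picking the minimum is only a starting point: when the lowest CVE falls at the maximum K that was tried, as it does here, it may be worth rerunning with a larger maximum before settling on a value.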