genome_diversity: evaluate_population

comparison evaluate_population_numbers.xml @ 17:a3af29edcce2

Uploaded Miller Lab Devshed version a51c894f5bed

author	miller-lab
date	Fri, 28 Sep 2012 11:57:18 -0400
parents	8ae67e9fb6ff
children

-:be0e2223c531
+:a3af29edcce2
+<tool id="gd_evaluate_population_numbers" name="Population Complexity" version="1.0.0">
+<description>: Evaluate possible numbers of ancestral populations</description>
+<command interpreter="bash">
+evaluate_population_numbers.bash "${input.extra_files_path}/admix.ped" "$output" "$max_populations"
+</command>
+<inputs>
+<param name="input" type="data" format="gd_ped" label="Dataset" />
+<param name="max_populations" type="integer" min="1" value="5" label="Maximum number of populations" />
+</inputs>
+<outputs>
+<data name="output" format="txt" />
+</outputs>
+<!--
+<tests>
+<test>
+<param name="input" value="fake" ftype="gd_ped" >
+<metadata name="base_name" value="admix" />
+<composite_data value="test_out/prepare_population_structure/prepare_population_structure.html" />
+<composite_data value="test_out/prepare_population_structure/admix.ped" />
+<composite_data value="test_out/prepare_population_structure/admix.map" />
+<edit_attributes type="name" value="fake" />
+</param>
+<param name="max_populations" value="2" />
+<output name="output" file="test_out/evaluate_population_numbers/evaluate_population_numbers.txt" />
+</test>
+</tests>
+-->
+<help>
+**Dataset formats**
+The input dataset is in gd_ped_ format.
+The output dataset is text.  (`Dataset missing?`_)
+.. _gd_ped: ./static/formatHelp.html#gd_ped
+.. _Dataset missing?: ./static/formatHelp.html
+-----
+**What it does**
+The user selects a gd_ped dataset generated by the Prepare Input tool.
+For each candidate number K of ancestral populations, from 1 up to a
+user-specified maximum, this tool reports a value that indicates how well the
+data can be explained as genotypes of individuals derived from K ancestral
+populations.  These values are computed by 5-fold cross-validation, so a good
+choice for K is one with a low cross-validation error (CVE) relative to the
+other values of K.
+-----
+**Acknowledgments**
+We use the program "Admixture", downloaded from
+http://www.genetics.ucla.edu/software/admixture/
+and described in the paper "Fast model-based estimation of ancestry in
+unrelated individuals" by David H. Alexander, John Novembre and Kenneth Lange,
+Genome Research 19 (2009), pp. 1655-1664. Admixture is called with the "--cv"
+flag to produce these values.
+-----
+**Example**
+- output with a maximum of 6 populations::
+CVE (K=1): 1.10120
+CVE (K=2): 1.34683
+CVE (K=3): 1.80611
+CVE (K=4): 1.96339
+CVE (K=5): 1.21522
+CVE (K=6): 0.51501
+</help>
+</tool>
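
The help text above says a good K is one with a low CVE compared with the other settings. As a minimal sketch of how a reader might apply that rule to the tool's text output, the Python below parses lines in the format shown in the Example section and returns the K with the lowest CVE. It is not part of this changeset or of the wrapper script; the names `parse_cve` and `best_k` are hypothetical, and the line format is assumed from the help example only.

```python
import re

# Assumed line format, copied from the tool's help example: "CVE (K=3): 1.80611".
CVE_LINE = re.compile(r"^CVE \(K=(\d+)\):\s*([0-9.]+)$")


def parse_cve(text):
    """Return a dict mapping K to its cross-validation error.

    Lines that do not match the assumed format are ignored.
    """
    errors = {}
    for line in text.splitlines():
        match = CVE_LINE.match(line.strip())
        if match:
            errors[int(match.group(1))] = float(match.group(2))
    return errors


def best_k(errors):
    """Return the K with the lowest cross-validation error."""
    if not errors:
        raise ValueError("no CVE lines found")
    return min(errors, key=errors.get)


# Sample values taken verbatim from the help text's example output.
SAMPLE = """\
CVE (K=1): 1.10120
CVE (K=2): 1.34683
CVE (K=3): 1.80611
CVE (K=4): 1.96339
CVE (K=5): 1.21522
CVE (K=6): 0.51501
"""

if __name__ == "__main__":
    errors = parse_cve(SAMPLE)
    print("lowest CVE at K =", best_k(errors))
```

Picking the single minimum is the simplest reading of "low CVE compared with other settings"; when several values of K give nearly equal errors, it may be worth inspecting the full table rather than relying on the minimum alone.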
