Mercurial > repos > miller-lab > genome_diversity
view prepare_population_structure.xml @ 18:f04f40a36cc8
Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
author | Richard Burhans <burhans@bx.psu.edu> |
---|---|
date | Tue, 23 Oct 2012 12:41:52 -0400 |
parents | 8ae67e9fb6ff |
children | 248b06e86022 |
line wrap: on
line source
<tool id="gd_prepare_population_structure" name="Prepare Input" version="1.0.0"> <description>: Filter and convert to the format needed for these tools</description> <command interpreter="python"> prepare_population_structure.py "$input" "$min_reads" "$min_qual" "$min_spacing" "$output" "$output.files_path" #if $individuals.choice == '0' "all_individuals" #else if $individuals.choice == '1' #for $population in $individuals.populations #set $pop_arg = 'population:%s:%s' % (str($population.p_input), str($population.p_input.name)) "$pop_arg" #end for #end if #for $individual, $individual_col in zip($input.dataset.metadata.individual_names, $input.dataset.metadata.individual_columns) #set $arg = 'individual:%s:%s' % ($individual_col, $individual) "$arg" #end for </command> <inputs> <param name="input" type="data" format="gd_snp" label="SNP dataset" /> <conditional name="individuals"> <param name="choice" type="select" label="Individuals"> <option value="0" selected="true">All individuals</option> <option value="1">Specified populations</option> </param> <when value="0" /> <when value="1"> <repeat name="populations" title="Population" min="1"> <param name="p_input" type="data" format="gd_indivs" label="Individuals" /> </repeat> </when> </conditional> <param name="min_reads" type="integer" min="0" value="0" label="Minimum SNP coverage" /> <param name="min_qual" type="integer" min="0" value="0" label="Minimum SNP quality" /> <param name="min_spacing" type="integer" min="0" value="0" label="Minimum spacing between SNPs" /> </inputs> <outputs> <data name="output" format="gd_ped"> <actions> <action type="metadata" name="base_name" default="admix" /> </actions> </data> </outputs> <tests> <test> <param name="input" value="test_in/sample.gd_snp" ftype="gd_snp" /> <param name="min_reads" value="3" /> <param name="min_qual" value="30" /> <param name="min_spacing" value="0" /> <param name="choice" value="0" /> <output name="output" file="test_out/prepare_population_structure/prepare_population_structure.html" ftype="html" compare="diff" lines_diff="2"> <extra_files type="file" name="admix.map" value="test_out/prepare_population_structure/admix.map" /> <extra_files type="file" name="admix.ped" value="test_out/prepare_population_structure/admix.ped" /> </output> </test> </tests> <help> **Dataset formats** The input datasets are in gd_snp_ and gd_indivs_ formats. The output dataset is in gd_ped_ format. (`Dataset missing?`_) .. _gd_snp: ./static/formatHelp.html#gd_snp .. _gd_indivs: ./static/formatHelp.html#gd_indivs .. _gd_ped: ./static/formatHelp.html#gd_ped .. _Dataset missing?: ./static/formatHelp.html ----- **What it does** This tool converts a gd_snp dataset into the format needed for estimating the population structure. You can select the individuals to be included, by using "population" datasets created via the Specify Individuals tool. (It is important for these population datasets to have distinguishable names, since they will be stored in the output's metadata so that subsequent tools can use them as labels. If necessary, rename the datasets to give them distinct and meaningful names before running this tool.) You can also filter the SNPs, based on criteria such as minimum coverage (a qualifying SNP must have at least this many reads for every included individual), minimum quality score (for every included individual), and/or minimum spacing (SNPs that are too close together on the same chromosome or scaffold are discarded). In addition to producing the filtered and formatted .map and .ped files for subsequent analysis, the tool reports the number of SNPs meeting these conditions, which can be seen by clicking on the eye icon in the history panel after the program runs. ----- **Example** - input:: Contig161_chr1_4641264_4641879 115 C T 73.5 chr1 4641382 C 6 0 2 45 8 0 2 51 15 0 2 72 5 0 2 42 6 0 2 45 10 0 2 57 Y 54 0.323 0 Contig48_chr1_10150253_10151311 11 A G 94.3 chr1 10150264 A 1 0 2 30 1 0 2 30 1 0 2 30 3 0 2 36 1 0 2 30 1 0 2 30 Y 22 +99. 0 Contig20_chr1_21313469_21313570 66 C T 54.0 chr1 21313534 C 4 0 2 39 4 0 2 39 5 0 2 42 4 0 2 39 4 0 2 39 5 0 2 42 N 1 +99. 0 etc. - output cover page:: Prepare to look for population structure Galaxy Composite Dataset Output completed: 2012-10-01 04:09:36 PM Outputs * admix.ped (link) * admix.map (link) * Using 222 of 400 SNPs Inputs * Minimum reads covering a SNP, per individual: 6 * Minimum quality value, per individual: 0 * Minimum spacing between SNPs on the same scaffold: 0 Populations * Pop. A 1. PB1 2. PB2 * Pop. B 1. PB3 2. PB4 * Pop. C 1. PB6 2. PB8 </help> </tool>