comparison aggregate_gd_indivs.xml @ 26:91e835060ad2

Updates to Admixture, Aggregate Individuals, and Restore Attributes to support gd_genotype
author Richard Burhans <burhans@bx.psu.edu>
date Mon, 03 Jun 2013 12:29:29 -0400
parents 95a05c1ef5d5
children 8997f2ca8c7a
25:cba0d7a63b82 26:91e835060ad2
1 <tool id="gd_sum_gd_snp" name="Aggregate Individuals" version="1.0.0"> 1 <tool id="gd_sum_gd_snp" name="Aggregate Individuals" version="1.1.0">
2 <description>: Append summary columns for a population</description> 2 <description>: Append summary columns for a population</description>
3 3
4 <command interpreter="python"> 4 <command interpreter="python">
5 aggregate_gd_indivs.py "$input" "$p1_input" "$output" 5 aggregate_gd_indivs.py "$input" "$p1_input" "$output"
6 #if $input_type.choice == '0'
7 "gd_snp"
8 #else if $input_type.choice == '1'
9 "gd_genotype"
10 #end if
6 #for $individual, $individual_col in zip($input.dataset.metadata.individual_names, $input.dataset.metadata.individual_columns) 11 #for $individual, $individual_col in zip($input.dataset.metadata.individual_names, $input.dataset.metadata.individual_columns)
7 #set $arg = '%s:%s' % ($individual_col, $individual) 12 #set $arg = '%s:%s' % ($individual_col, $individual)
8 "$arg" 13 "$arg"
9 #end for 14 #end for
10 </command> 15 </command>
11 16
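As a rough illustration of what the revised `<command>` template expands to, the following Python sketch builds the same argument list: the three dataset paths, the format name selected by `input_type.choice`, and one `column:name` argument per individual. The function name `build_command_args` and the sample paths, names, and column numbers are hypothetical; only the argument order and the `'%s:%s'` pairing are taken from the template.

```python
def build_command_args(input_path, indivs_path, output_path, choice,
                       individual_names, individual_columns):
    """Mirror the <command> template: paths, format name, then col:name pairs."""
    # '0' selects gd_snp and '1' selects gd_genotype, as in the template's #if block.
    format_name = {"0": "gd_snp", "1": "gd_genotype"}[choice]
    args = ["aggregate_gd_indivs.py", input_path, indivs_path, output_path,
            format_name]
    # Same pairing as the template's #for loop over zip(names, columns).
    for name, col in zip(individual_names, individual_columns):
        args.append("%s:%s" % (col, name))
    return args


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(build_command_args("in.gd_genotype", "pop.gd_indivs", "out.gd_genotype",
                             "1", ["PB1", "PB2"], [9, 13]))
```

The format name is passed as a positional argument before the per-individual arguments, so a script reading this command line would need to consume it first.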
12 <inputs> 17 <inputs>
13 <param name="input" type="data" format="gd_snp" label="SNP dataset" /> 18
19 <conditional name="input_type">
20 <param name="choice" type="select" format="integer" label="Input format">
21 <option value="0" selected="true">gd_snp</option>
22 <option value="1">gd_genotype</option>
23 </param>
24
25 <when value="0">
26 <param name="input" type="data" format="gd_snp" label="SNP dataset" />
27 </when>
28 <when value="1">
29 <param name="input" type="data" format="gd_genotype" label="Genotype dataset" />
30 </when>
31 </conditional>
32
14 <param name="p1_input" type="data" format="gd_indivs" label="Population individuals" /> 33 <param name="p1_input" type="data" format="gd_indivs" label="Population individuals" />
15 </inputs> 34 </inputs>
16 35
17 <outputs> 36 <outputs>
18 <data name="output" format="gd_snp" metadata_source="input" /> 37 <data name="output" format="input" format_source="input" metadata_source="input" />
19 </outputs> 38 </outputs>
20 39
21 <tests> 40 <tests>
22 <test> 41 <test>
23 <param name="input" value="test_in/sample.gd_snp" ftype="gd_snp" /> 42 <param name="input" value="test_in/sample.gd_snp" ftype="gd_snp" />
28 47
29 <help> 48 <help>
30 49
31 **Dataset formats** 50 **Dataset formats**
32 51
33 The input datasets are in gd_snp_ and gd_indivs_ formats. 52 The input datasets are in gd_snp_, gd_genotype_, and gd_indivs_ formats.
34 The output dataset is in gd_snp_ format. (`Dataset missing?`_) 53 The output dataset is in gd_snp_ or gd_genotype_ format. (`Dataset missing?`_)
35 54
36 .. _gd_snp: ./static/formatHelp.html#gd_snp 55 .. _gd_snp: ./static/formatHelp.html#gd_snp
56 .. _gd_genotype: ./static/formatHelp.html#gd_genotype
37 .. _gd_indivs: ./static/formatHelp.html#gd_indivs 57 .. _gd_indivs: ./static/formatHelp.html#gd_indivs
38 .. _Dataset missing?: ./static/formatHelp.html 58 .. _Dataset missing?: ./static/formatHelp.html
39 59
40 ----- 60 -----
41 61
42 **What it does** 62 **What it does**
43 63
44 The user specifies that some of the individuals in a gd_snp dataset form a 64 The user specifies that some of the individuals in a gd_snp or gd_genotype
45 "population", by supplying a list that has been previously created using the 65 dataset form a "population", by supplying a list that has been previously
46 Specify Individuals tool. The program appends a 66 created using the Specify Individuals tool. The program appends a new
47 new "entity" (set of four columns) to the gd_snp table, analogous to the columns 67 "entity" (set of four columns for a gd_snp table, or one column for a
48 for an individual but containing summary data for the population as a group. 68 gd_genotype table), analogous to the column(s) for an individual but
49 These four columns give the total counts for the two alleles, the "genotype" for 69 containing summary data for the population as a group. For a gd_snp
50 the population, and the maximum quality value, taken over all individuals in the 70 table, these four columns give the total counts for the two alleles,
51 population. If all defined genotypes in the population are 2 (agree with the 71 the "genotype" for the population, and the maximum quality value, taken
52 reference), then the population's genotype is 2, and similarly for 0; otherwise 72 over all individuals in the population. If all defined genotypes in
53 the genotype is 1 (unless all individuals have undefined genotype, in which case 73 the population are 2 (agree with the reference), then the population's
54 it is -1). 74 genotype is 2, and similarly for 0; otherwise the genotype is 1 (unless
75 all individuals have undefined genotype, in which case it is -1).
76 For a gd_genotype file, only the aggregate genotype is appended.
55 77
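The aggregation rule described in the help text can be sketched in Python as follows. This is a minimal illustration, not the code in `aggregate_gd_indivs.py`: the function names are hypothetical, an undefined individual genotype is assumed to be encoded as -1, and the order of the four gd_snp values (two allele counts, genotype, quality) is assumed from the description.

```python
def aggregate_genotype(genotypes):
    """Combine individual genotypes (0, 1, 2, or -1 for undefined) into one."""
    # Assumption: -1 marks an undefined genotype for an individual.
    defined = [g for g in genotypes if g != -1]
    if not defined:
        return -1          # every individual is undefined
    if all(g == 2 for g in defined):
        return 2           # all defined genotypes agree with the reference
    if all(g == 0 for g in defined):
        return 0
    return 1               # any other mixture of defined genotypes


def aggregate_gd_snp(individuals):
    """Summarize (count_a, count_b, genotype, quality) tuples for a population.

    Assumption: counts and qualities are used as given; how the real tool
    treats undefined counts or qualities is not stated in the help text.
    """
    total_a = sum(ind[0] for ind in individuals)
    total_b = sum(ind[1] for ind in individuals)
    genotype = aggregate_genotype([ind[2] for ind in individuals])
    max_quality = max(ind[3] for ind in individuals)
    return (total_a, total_b, genotype, max_quality)
```

For a gd_genotype table only `aggregate_genotype` would apply, since a single genotype column is appended there.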
56 ----- 78 -----
57 79
58 **Example** 80 **Example**
59 81