annotate aggregate_gd_indivs.xml @ 25:cba0d7a63b82
workaround for gd_genotype datatype
admix shift int -> float
author: Richard Burhans <burhans@bx.psu.edu>
date: Wed, 29 May 2013 13:49:19 -0400
parents: 95a05c1ef5d5
children: 91e835060ad2
<tool id="gd_sum_gd_snp" name="Aggregate Individuals" version="1.0.0">
  <description>: Append summary columns for a population</description>

  <command interpreter="python">
    aggregate_gd_indivs.py "$input" "$p1_input" "$output"
    #for $individual, $individual_col in zip($input.dataset.metadata.individual_names, $input.dataset.metadata.individual_columns)
      #set $arg = '%s:%s' % ($individual_col, $individual)
      "$arg"
    #end for
  </command>

  <inputs>
    <param name="input" type="data" format="gd_snp" label="SNP dataset" />
    <param name="p1_input" type="data" format="gd_indivs" label="Population individuals" />
  </inputs>

  <outputs>
    <data name="output" format="gd_snp" metadata_source="input" />
  </outputs>

  <tests>
    <test>
      <param name="input" value="test_in/sample.gd_snp" ftype="gd_snp" />
      <param name="p1_input" value="test_in/a.gd_indivs" ftype="gd_indivs" />
      <output name="output" file="test_out/modify_snp_table/modify.gd_snp" />
    </test>
  </tests>

  <help>

**Dataset formats**

The input datasets are in gd_snp_ and gd_indivs_ formats.
The output dataset is in gd_snp_ format. (`Dataset missing?`_)

.. _gd_snp: ./static/formatHelp.html#gd_snp
.. _gd_indivs: ./static/formatHelp.html#gd_indivs
.. _Dataset missing?: ./static/formatHelp.html

-----

**What it does**

The user specifies that some of the individuals in a gd_snp dataset form a
"population", by supplying a list that has been previously created using the
Specify Individuals tool. The program appends a
new "entity" (set of four columns) to the gd_snp table, analogous to the columns
for an individual but containing summary data for the population as a group.
These four columns give the total counts for the two alleles, the "genotype" for
the population, and the maximum quality value, taken over all individuals in the
population. If all defined genotypes in the population are 2 (agree with the
reference), then the population's genotype is 2, and similarly for 0; otherwise
the genotype is 1 (unless all individuals have undefined genotype, in which case
it is -1).

-----

**Example**

- input gd_snp::

    Contig161_chr1_4641264_4641879 115 C T 73.5 chr1 4641382 C 6 0 2 45 8 0 2 51 15 0 2 72 5 0 2 42 6 0 2 45 10 0 2 57 Y 54 0.323 0
    Contig48_chr1_10150253_10151311 11 A G 94.3 chr1 10150264 A 1 0 2 30 1 0 2 30 1 0 2 30 3 0 2 36 1 0 2 30 1 0 2 30 Y 22 +99. 0
    Contig20_chr1_21313469_21313570 66 C T 54.0 chr1 21313534 C 4 0 2 39 4 0 2 39 5 0 2 42 4 0 2 39 4 0 2 39 5 0 2 42 N 1 +99. 0
    etc.

- input individuals::

    9 PB1
    13 PB2
    17 PB3

- output::

    Contig161_chr1_4641264_4641879 115 C T 73.5 chr1 4641382 C 6 0 2 45 8 0 2 51 15 0 2 72 5 0 2 42 6 0 2 45 10 0 2 57 Y 54 0.323 0 29 0 2 72
    Contig48_chr1_10150253_10151311 11 A G 94.3 chr1 10150264 A 1 0 2 30 1 0 2 30 1 0 2 30 3 0 2 36 1 0 2 30 1 0 2 30 Y 22 +99. 0 3 0 2 30
    Contig20_chr1_21313469_21313570 66 C T 54.0 chr1 21313534 C 4 0 2 39 4 0 2 39 5 0 2 42 4 0 2 39 4 0 2 39 5 0 2 42 N 1 +99. 0 13 0 2 42
    etc.

  </help>
</tool>
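The aggregation rule described in the tool's help text can be illustrated with a short Python sketch. This is not the code of `aggregate_gd_indivs.py`, which is not shown here. The function names `population_genotype` and `aggregate_row` are hypothetical, and the sketch rests on several assumptions: an individual's undefined genotype is encoded as -1, each individual occupies four consecutive fields starting at its 1-based column (allele 1 count, allele 2 count, genotype, quality), counts and qualities are taken over every listed individual, and rows are split on whitespace.

```python
def population_genotype(genotypes, undefined=-1):
    """Combine per-individual genotypes into one population genotype.

    Assumption: an undefined genotype is encoded as -1.
    """
    defined = [g for g in genotypes if g != undefined]
    if not defined:
        return undefined      # every individual is undefined
    if all(g == 2 for g in defined):
        return 2              # all defined genotypes agree with the reference
    if all(g == 0 for g in defined):
        return 0
    return 1                  # mixed genotypes


def aggregate_row(fields, columns):
    """Append the four population summary fields to one gd_snp row.

    fields  -- the row as a list of strings
    columns -- 1-based first column of each individual in the population
    """
    allele1 = allele2 = 0
    genotypes = []
    qualities = []
    for col in columns:
        i = col - 1           # convert to a 0-based index
        allele1 += int(fields[i])
        allele2 += int(fields[i + 1])
        genotypes.append(int(fields[i + 2]))
        qualities.append(int(fields[i + 3]))
    summary = [allele1, allele2, population_genotype(genotypes), max(qualities)]
    return fields + [str(value) for value in summary]


# First row of the help example, with individuals PB1, PB2, PB3
# at columns 9, 13, and 17.
row = ("Contig161_chr1_4641264_4641879 115 C T 73.5 chr1 4641382 C "
       "6 0 2 45 8 0 2 51 15 0 2 72 5 0 2 42 6 0 2 45 10 0 2 57 "
       "Y 54 0.323 0").split()
result = aggregate_row(row, [9, 13, 17])
print(" ".join(result[-4:]))  # prints "29 0 2 72"
```

The printed values match the four columns appended to the first row of the example output: 6 + 8 + 15 = 29 for the first allele, 0 for the second, genotype 2, and a maximum quality of 72.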