comparison dpmix.xml @ 14:8ae67e9fb6ff
Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
author | miller-lab |
---|---|
date | Fri, 28 Sep 2012 11:35:56 -0400 |
parents | |
children | d6b961721037 |
comparison
13:fdb4240fb565 | 14:8ae67e9fb6ff |
---|---|
1 <tool id="gd_dpmix" name="Admixture" version="1.0.0"> | |
2 <description>: Map genomic intervals resembling specified ancestral populations</description> | |
3 | |
4 <command interpreter="python"> | |
5 dpmix.py "$input" "$data_source" "$switch_penalty" "$ap1_input" "$ap2_input" "$p_input" "$output" "$output2" "$output2.files_path" "$input.dataset.metadata.dbkey" "$input.dataset.metadata.ref" "$GALAXY_DATA_INDEX_DIR" "gd.heterochromatic.loc" | |
6 #for $individual, $individual_col in zip($input.dataset.metadata.individual_names, $input.dataset.metadata.individual_columns) | |
7 #set $arg = '%s:%s' % ($individual_col, $individual) | |
8 "$arg" | |
9 #end for | |
10 </command> | |
11 | |
12 <inputs> | |
13 <param name="input" type="data" format="gd_snp" label="Dataset"> | |
14 <validator type="unspecified_build" message="This dataset does not have a reference species and cannot be used with this tool" /> | |
15 </param> | |
16 <param name="ap1_input" type="data" format="gd_indivs" label="Ancestral population 1 individuals" /> | |
17 <param name="ap2_input" type="data" format="gd_indivs" label="Ancestral population 2 individuals" /> | |
18 <param name="p_input" type="data" format="gd_indivs" label="Potentially admixed individuals" /> | |
19 | |
20 <param name="data_source" type="select" format="integer" label="Data source"> | |
21 <option value="0" selected="true">sequence coverage</option> | |
22 <option value="1">estimated genotype</option> | |
23 </param> | |
24 | |
25 <param name="switch_penalty" type="integer" min="0" value="10" label="Switch penalty" /> | |
26 </inputs> | |
27 | |
28 <outputs> | |
29 <data name="output" format="tabular" /> | |
30 <data name="output2" format="html" /> | |
31 </outputs> | |
32 | |
33 <tests> | |
34 <test> | |
35 <param name="input" value="test_in/sample.gd_snp" ftype="gd_snp" /> | |
36 <param name="ap1_input" value="test_in/a.gd_indivs" ftype="gd_indivs" /> | |
37 <param name="ap2_input" value="test_in/b.gd_indivs" ftype="gd_indivs" /> | |
38 <param name="p_input" value="test_in/c.gd_indivs" ftype="gd_indivs" /> | |
39 <param name="data_source" value="0" /> | |
40 <param name="switch_penalty" value="10" /> | |
41 | |
42 <output name="output" file="test_out/dpmix/dpmix.tabular" /> | |
43 | |
44 <output name="output2" file="test_out/dpmix/dpmix.html" ftype="html" compare="diff" lines_diff="2"> | |
45 <extra_files type="file" name="dpmix.pdf" value="test_out/dpmix/dpmix.pdf" compare="sim_size" delta="10000" /> | |
46 <extra_files type="file" name="misc.txt" value="test_out/dpmix/misc.txt" /> | |
47 </output> | |
48 </test> | |
49 </tests> | |
50 | |
51 <help> | |
52 | |
53 **Dataset formats** | |
54 | |
55 The input datasets are in gd_snp_ and gd_indivs_ formats. The | |
56 Individuals datasets must have unique names, and the two ancestral | |
57 populations must not share any individuals. Rename these datasets if | |
58 needed to make their names unique. | |
59 There are two output datasets, one tabular_ and one composite. (`Dataset missing?`_) | |
60 | |
61 .. _gd_snp: ./static/formatHelp.html#gd_snp | |
62 .. _gd_indivs: ./static/formatHelp.html#gd_indivs | |
63 .. _tabular: ./static/formatHelp.html#tab | |
64 .. _Dataset missing?: ./static/formatHelp.html | |
65 | |
66 ----- | |
67 | |
68 **What it does** | |
69 | |
70 The user specifies two "ancestral" populations (i.e., sources for | |
71 chromosomes) and a set of potentially admixed individuals, and chooses | |
72 either the sequence coverage or the estimated genotypes to measure | |
73 the similarity of genomic intervals in admixed individuals to the two | |
74 classes of ancestral chromosomes. The user also picks a "switch penalty", | |
75 typically between 10 and 100. For each potentially admixed individual, | |
76 the program divides the genome into intervals, each assigned one of three "genotypes": (0) homozygous | |
77 for the first ancestral population (i.e., both chromosomes from that | |
78 population), (1) heterozygous, or (2) homozygous for the second ancestral | |
79 population. Parts of a chromosome that are labeled as "heterochromatic" | |
80 are given the non-genotype, 3. Smaller values of the switch penalty | |
81 (corresponding to more ancient admixture events) generally lead to the | |
82 reconstruction of more frequent changes between genotypes. | |
83 | |
84 Two output datasets are generated. The first is a tabular dataset with chromosome, | |
85 start, stop, and, for each admixed individual, a pair of columns containing | |
86 the "genotype" from above and the individual's label. The second is a composite | |
87 dataset with general information from the run and a link to a PDF that | |
88 graphically shows the ancestral population along each of the chromosomes. | |
89 A second link leads to a text file with summary information on the | |
90 "genotypes" over the whole genome. | |
91 | |
92 </help> | |
93 </tool> |
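The help text says that a smaller switch penalty generally yields more frequent changes between genotypes. The sketch below illustrates that trade-off with a generic dynamic program. It is not the code in dpmix.py: the function name `label_positions`, the per-position score tuples, and the rule "total score minus one penalty per change of genotype" are all assumptions made for illustration.

```python
def label_positions(scores, switch_penalty):
    """Pick one genotype (0, 1 or 2) per position.

    scores: list of (s0, s1, s2) tuples, one per position; a higher value
            means the position looks more like that genotype (hypothetical
            scoring, not the measure used by dpmix.py).
    Returns the labeling that maximizes the summed scores minus
    switch_penalty for every change of genotype between neighbors.
    """
    if not scores:
        return []
    n_states = len(scores[0])
    best = list(scores[0])   # best total ending in each state so far
    back = []                # back-pointers, one list per later position
    for row in scores[1:]:
        new_best, pointers = [], []
        for state in range(n_states):
            # Staying in the same state is free; switching costs the penalty.
            candidates = [
                best[prev] - (0 if prev == state else switch_penalty)
                for prev in range(n_states)
            ]
            prev_state = max(range(n_states), key=lambda p: candidates[p])
            new_best.append(candidates[prev_state] + row[state])
            pointers.append(prev_state)
        best = new_best
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy input: one position in the middle favors genotype 2.
example = [(5, 0, 0), (5, 0, 0), (0, 0, 6), (5, 0, 0), (5, 0, 0)]
print(label_positions(example, 10))  # [0, 0, 0, 0, 0]
print(label_positions(example, 1))   # [0, 0, 2, 0, 0]
```

With a penalty of 10, the single deviating position is absorbed into the surrounding interval. With a penalty of 1, the labeling switches to genotype 2 and back, which matches the behavior the help text describes for small penalties.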