annotate restore_attributes.xml @ 24:248b06e86022

Added gd_genotype datatype. Modified tools to support new datatype.
author Richard Burhans <burhans@bx.psu.edu>
date Tue, 28 May 2013 16:24:19 -0400
parents 95a05c1ef5d5
children 91e835060ad2
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
22
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
1 <tool id="gd_restore_attributes" name="Restore Attributes" version="1.0.0">
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
2 <description>: Fill in missing properties for a gd_snp dataset</description>
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
3
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
4 <command interpreter="python">
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
5 cp.py "$dst" "$output"
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
6 </command>
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
7
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
8 <inputs>
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
9 <param name="src" type="data" format="gd_snp" label="SNP dataset to copy attributes from" />
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
10 <param name="dst" type="data" format="gd_snp" label="SNP dataset to receive attributes" />
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
11 </inputs>
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
12
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
13 <outputs>
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
14 <data name="output" format="gd_snp" metadata_source="src" />
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
15 </outputs>
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
16
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
17 <help>
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
18
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
19 **Dataset formats**
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
20
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
21 All of the input and output datasets are in gd_snp_ format. (`Dataset missing?`_)
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
22
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
23 .. _gd_snp: ./static/formatHelp.html#gd_snp
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
24 .. _Dataset missing?: ./static/formatHelp.html
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
25
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
26 -----
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
27
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
28 **What it does**
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
29
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
30 This tool copies metadata information from one SNP dataset to another, leaving
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
31 the actual SNP data itself unchanged. Datasets in gd_snp format have a number
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
32 of "extra" properties associated with them, such as the focus species (which
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
33 may be different from the reference assembly), names of individuals, column
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
34 numbers containing certain data fields, etc. These values are stored in the
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
35 dataset's metadata, in addition to the more usual attributes like dataset name,
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
36 assembly build, and so forth. You can see some of these by clicking on the
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
37 pencil icon for the dataset.
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
38
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
39 The Genome Diversity tools need this information to perform their tasks.
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
40 However, these additional attributes may be lost if the datatype is changed.
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
41 For example, suppose you want to see which SNPs overlap some other dataset in
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
42 your history, like coding regions or TAL1 binding sites. The Intersect tool
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
43 only works on datasets that are in interval format, so you might use the Compute
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
44 tool to append a new column with the End position of the SNP (= Start + 1),
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
45 then use the pencil icon to change the datatype to "interval". This works
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
46 great for doing the intersection, but if you then want to run one of the Genome
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
47 Diversity tools on the resulting SNPs, there's a problem: you can change the
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
48 datatype back to gd_snp easily enough, but the extra attributes have been lost
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
49 in the conversion to interval.
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
50
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
51 As long as the proper values of the lost attributes have not changed, then this
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
52 tool can restore them by copying from the old gd_snp dataset in your history.
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
53 In the above example, appending a column does not change the numbering of the
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
54 earlier columns, and deleting rows via Intersect does not affect the extra
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
55 attributes either. Note that all of the metadata is copied, not just the extra
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
56 attributes specific to gd_snp (though standard items like the assembly build,
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
57 the number of lines, and the name for the output dataset are updated
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
58 automatically by the Galaxy framework).
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
59
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
60 </help>
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
61 </tool>