Mercurial > repos > miller-lab > genome_diversity
diff restore_attributes.xml @ 22:95a05c1ef5d5
update to devshed revision aaece207bd01
author | Richard Burhans <burhans@bx.psu.edu> |
---|---|
date | Mon, 11 Mar 2013 11:28:06 -0400 |
parents | |
children | 91e835060ad2 |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/restore_attributes.xml Mon Mar 11 11:28:06 2013 -0400
@@ -0,0 +1,61 @@
+<tool id="gd_restore_attributes" name="Restore Attributes" version="1.0.0">
+  <description>: Fill in missing properties for a gd_snp dataset</description>
+
+  <command interpreter="python">
+    cp.py "$dst" "$output"
+  </command>
+
+  <inputs>
+    <param name="src" type="data" format="gd_snp" label="SNP dataset to copy attributes from" />
+    <param name="dst" type="data" format="gd_snp" label="SNP dataset to receive attributes" />
+  </inputs>
+
+  <outputs>
+    <data name="output" format="gd_snp" metadata_source="src" />
+  </outputs>
+
+  <help>
+
+**Dataset formats**
+
+All of the input and output datasets are in gd_snp_ format. (`Dataset missing?`_)
+
+.. _gd_snp: ./static/formatHelp.html#gd_snp
+.. _Dataset missing?: ./static/formatHelp.html
+
+-----
+
+**What it does**
+
+This tool copies metadata information from one SNP dataset to another, leaving
+the actual SNP data itself unchanged. Datasets in gd_snp format have a number
+of "extra" properties associated with them, such as the focus species (which
+may be different from the reference assembly), names of individuals, column
+numbers containing certain data fields, etc. These values are stored in the
+dataset's metadata, in addition to the more usual attributes like dataset name,
+assembly build, and so forth. You can see some of these by clicking on the
+pencil icon for the dataset.
+
+The Genome Diversity tools need this information to perform their tasks.
+However, these additional attributes may be lost if the datatype is changed.
+For example, suppose you want to see which SNPs overlap some other dataset in
+your history, like coding regions or TAL1 binding sites. The Intersect tool
+only works on datasets that are in interval format, so you might use the Compute
+tool to append a new column with the End position of the SNP (= Start + 1),
+then use the pencil icon to change the datatype to "interval". This works
+great for doing the intersection, but if you then want to run one of the Genome
+Diversity tools on the resulting SNPs, there's a problem: you can change the
+datatype back to gd_snp easily enough, but the extra attributes have been lost
+in the conversion to interval.
+
+As long as the proper values of the lost attributes have not changed, then this
+tool can restore them by copying from the old gd_snp dataset in your history.
+In the above example, appending a column does not change the numbering of the
+earlier columns, and deleting rows via Intersect does not affect the extra
+attributes either. Note that all of the metadata is copied, not just the extra
+attributes specific to gd_snp (though standard items like the assembly build,
+the number of lines, and the name for the output dataset are updated
+automatically by the Galaxy framework).
+
+  </help>
+</tool>
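
The tool's command invokes cp.py, which is not part of this changeset, so its contents are not shown here. As a rough illustration only, the sketch below assumes the script does nothing more than copy the `dst` dataset's file to the output path. The metadata would then come from `src` through the `metadata_source="src"` attribute on the output, not from the script. The names `copy_dataset` and `main` are hypothetical.

```python
"""Hypothetical sketch of a cp.py-style helper; not the script from the repository."""
import shutil
import sys


def copy_dataset(input_path, output_path):
    # Copy the file contents only. In this sketch the script never touches
    # metadata; the framework is assumed to attach it to the output separately.
    shutil.copyfile(input_path, output_path)


def main(argv):
    # Expect exactly two arguments after the script name, mirroring
    # the tool's command line: cp.py "$dst" "$output"
    if len(argv) != 3:
        sys.stderr.write("usage: cp.py <input> <output>\n")
        return 1
    copy_dataset(argv[1], argv[2])
    return 0
```

A script entry point would call `sys.exit(main(sys.argv))`. Keeping the copy step this simple matches the help text, which says the SNP data itself is left unchanged.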