Mercurial repository: mmaiensc/ember
diff GALAXY_FILES/tools/EMBER/PreProcess_Expression_Data.xml @ 0:003f802d4c7d
Uploaded
author | mmaiensc |
---|---|
date | Wed, 29 Feb 2012 15:03:33 -0500 |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/GALAXY_FILES/tools/EMBER/PreProcess_Expression_Data.xml Wed Feb 29 15:03:33 2012 -0500
@@ -0,0 +1,76 @@
+<tool id="prep_data" name="PreProcess Expression Data" version="1.3">
+  <description>Combines gene expression data</description>
+  <command interpreter="perl">PreProcess_Expression_Data.pl -i $data -c $compslist -a $annot -o $output -p $thresh -l $log -v n</command>
+  <inputs>
+    <param format="txt" name="data" type="data" label="Expression data"/>
+    <param format="txt" name="compslist" type="data" label="Comparison list"/>
+    <param format="txt" name="annot" type="data" label="Annotation file"/>
+    <param name="thresh" type="float" min="0" max="1" label="Percentile threshold" value="0.63" optional="true"/>
+    <param name="log" type="select" label="Log transform data?">
+      <option value="n" selected="true">No</option>
+      <option value="y">Yes</option>
+    </param>
+  </inputs>
+  <outputs>
+    <data format="txt" name="output"/>
+  </outputs>
+
+  <tests>
+    <test>
+      <param name="data" value="EMBER/expression.txt"/>
+      <param name="compslist" value="EMBER/comparisons_list.txt"/>
+      <param name="annot" value="EMBER/annotation.txt"/>
+      <param name="thresh" value="0.63"/>
+      <param name="log" value="n"/>
+      <output name="output" file="EMBER/expression_profiles.txt"/>
+    </test>
+  </tests>
+
+  <help>
+
+This tool discretizes the gene expression data and adds genomic annotations.
+
+More options for the EMBER tools (especially for the main program, EMBER, including searching for multiple expression patterns) are provided by the command-line version, which can be downloaded from http://dinner-group.uchicago.edu/downloads.html. That package also includes test data and sample outputs.
+
+When using any of the EMBER tools, please cite: M Maienschein-Cline, J Zhou, KP White, R Sciammas, and AR Dinner. Discovering transcription factor regulatory targets using gene expression and binding data. *Bioinformatics*, 28:206-213 (2012).
+
+-----
+
+Description of inputs:
+
+*Expression Data*:
+
+  Microarray data from N experiments, with at least 2 replicates per condition.
+
+  *Format (N+1 columns)*: [ID] [expt 1 value] [expt 2 value] ... [expt N value]
+
+  IMPORTANT: the first line must be a title line whose first field is "#ID" and whose subsequent fields give the condition and replicate of each column, i.e.,
+
+    #ID [condition]#[replicate] ...
+
+  where [condition] matches the names used in the Comparison List and [replicate] is the replicate number of that column. [condition] and [replicate] are separated by a "#", so do not use that character in condition names.
+
+*Comparison List*:
+
+  List of behavior dimension definitions. Each [condition] must match a condition name in the Expression Data title line.
+
+  *Format (2 columns)*: [condition1] [condition2]
+
+*Annotation File*:
+
+  Gives the genomic coordinates of each probe set.
+
+  *Format (6 columns)*: [probe id] [gene name] [chromosome] [start] [end] [strand]
+
+*Percentile Threshold* (p):
+
+  Used to eliminate genes that are consistently expressed at a very low level. All data are concatenated into one list, and the p-th percentile of that list is taken as the threshold. A probe set is then removed if its value is below the threshold in ALL conditions.
+
+  p = 1.0 means all probes are retained, and p = 0.0 means none are. Note, however, that p = 0.63 does NOT necessarily mean that 63% of probe sets are retained.
+
+*Log Transform*: whether to log-transform the data before discretization.
+
+  </help>
+
+</tool>
+
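The title-line check and the low-expression filter described in the help text can be sketched in Python. This is a minimal, hypothetical sketch, not the code of PreProcess_Expression_Data.pl. The function names (`parse_header`, `percentile`, `filter_low_expression`) are illustrative. Two further choices are assumptions: the rank-based percentile rule, and averaging replicates within a condition before comparing against the threshold. The sketch follows the wording "the p-th percentile of that list is taken as the threshold" literally, so the Perl script's exact rules may differ.

```python
import math
from collections import defaultdict


def parse_header(header_line):
    """Split a '#ID cond#rep ...' title line into (condition, replicate) pairs."""
    fields = header_line.split()
    if not fields or fields[0] != "#ID":
        raise ValueError('the first field of the title line must be "#ID"')
    columns = []
    for field in fields[1:]:
        # Split at the first '#'; condition names must not contain '#'.
        condition, sep, replicate = field.partition("#")
        if not sep or not condition or not replicate or "#" in replicate:
            raise ValueError(f"bad column label {field!r}; expected condition#replicate")
        columns.append((condition, replicate))
    return columns


def percentile(values, p):
    """Assumed rule: element floor(p * (n - 1)) of the sorted values."""
    ordered = sorted(values)
    return ordered[math.floor(p * (len(ordered) - 1))]


def filter_low_expression(lines, p):
    """Return (threshold, IDs of probe sets kept) for whitespace-delimited lines."""
    columns = parse_header(lines[0])
    data = {}
    for line in lines[1:]:
        if not line.strip():
            continue
        probe_id, *raw = line.split()
        if len(raw) != len(columns):
            raise ValueError(f"{probe_id}: expected {len(columns)} values, got {len(raw)}")
        data[probe_id] = [float(x) for x in raw]

    # Threshold: the p-th percentile of all values pooled into one list.
    threshold = percentile([v for vals in data.values() for v in vals], p)

    kept = []
    for probe_id, vals in data.items():
        # Assumption: a condition's value is the mean over its replicates.
        by_condition = defaultdict(list)
        for (condition, _replicate), v in zip(columns, vals):
            by_condition[condition].append(v)
        means = [sum(vs) / len(vs) for vs in by_condition.values()]
        # Drop the probe set only if it is below the threshold in ALL conditions.
        if any(m >= threshold for m in means):
            kept.append(probe_id)
    return threshold, kept


# Hypothetical example data: two conditions with two replicates each.
example = [
    "#ID\tctrl#1\tctrl#2\ttreat#1\ttreat#2",
    "probeA\t1.0\t2.0\t1.5\t1.0",
    "probeB\t8.0\t9.0\t7.0\t10.0",
    "probeC\t1.0\t1.0\t9.0\t11.0",
]
threshold, kept = filter_low_expression(example, 0.63)
print(threshold, kept)  # 7.0 ['probeB', 'probeC']
```

In this example probeC survives because it is high in one condition ("treat"), even though it is low in the other. That is why the fraction retained is not simply tied to p.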