annotate GALAXY_FILES/tools/EMBER/PreProcess_Expression_Data.xml @ 3:037c3edda16e

Uploaded
author mmaiensc
date Thu, 22 Mar 2012 13:49:52 -0400
parents
children e960969a92ae
<tool id="prep_data" name="PreProcess Expression Data" version="1.3.1">
  <description>Step 1 of analysis: discretizes expression data</description>
  <command interpreter="perl">PreProcess_Expression_Data.pl -i $data -c $compslist -a $annot -o $output -p $thresh -l $log -v n</command>
  <inputs>
    <param format="txt" name="data" type="data" label="Expression data"/>
    <param format="txt" name="compslist" type="data" label="Comparison list"/>
    <param format="txt" name="annot" type="data" label="Annotation file"/>
    <param name="thresh" type="float" min="0" max="1" label="Percentile threshold" value="0.63" optional="true"/>
    <param name="log" type="select" label="Log transform data?">
      <option value="n" selected="true">No</option>
      <option value="y">Yes</option>
    </param>
  </inputs>
  <outputs>
    <data format="txt" name="output"/>
  </outputs>

  <tests>
    <test>
      <param name="data" value="EMBER/expression.txt"/>
      <param name="compslist" value="EMBER/comparisons_list.txt"/>
      <param name="annot" value="EMBER/annotation.txt"/>
      <param name="thresh" value="0.63"/>
      <param name="log" value="n"/>
      <output name="output" file="EMBER/expression_profiles.txt"/>
    </test>
  </tests>

  <help>

This tool discretizes the gene expression data and adds genomic annotations.

More options for the EMBER tools (especially for the main program, EMBER, including searching for multiple expression patterns) are available in the command-line version, which can be downloaded from http://dinner-group.uchicago.edu/downloads.html. That package also includes test data and sample outputs.

When using any of the EMBER tools, please cite: M Maienschein-Cline, J Zhou, KP White, R Sciammas, and AR Dinner. Discovering transcription factor regulatory targets using gene expression and binding data. *Bioinformatics*, 28:206-213 (2012).

-----

Description of inputs:

*Expression Data*:

Microarray data from N experiments, with at least 2 replicates per condition.

*Format (N+1 columns)*: [ID] [expt 1 value] [expt 2 value] ... [expt N value]

IMPORTANT: the first line should be a title line whose first field is "#ID" and whose remaining fields give the condition and replicate for each column, i.e.,

#ID [condition]#[replicate]...

where [condition] matches a value in the Comparison List and [replicate] is the replicate number of that column. [condition] and [replicate] are delimited by a "#", so do not use that character in the condition name.

*Comparison List*:

List of behavior dimension definitions. Each [condition] should match a condition name used in the title line of the Expression Data.

*Format (2 columns)*: [condition1] [condition2]

*Annotation File*:

Gives the genomic coordinates of each probe set.

*Format (6 columns)*: [probe id] [gene name] [chromosome] [start] [end] [strand]

*Percentile Threshold* (p):

Used to eliminate genes that are consistently expressed at a very low level. All data are concatenated into one list, and the pth percentile of that list is taken as the threshold. A probe set is then removed if its value is less than the threshold in ALL conditions.

p = 1.0 means all probes are retained, p = 0.0 means none are. Note, however, that p = 0.63 does NOT necessarily mean that 63% of probe sets are retained: the threshold is computed over all individual values, whereas a probe set is removed only if it falls below the threshold in every condition.

*Log Transform*: whether or not to take the log of the data before discretization.

A note on preparing the expression data: it may be convenient to prepare the data in an Excel worksheet, copying and pasting the expression levels from different experiments into the same file and adding the titles in the first row. However, I have had some issues with then saving the file as tab-delimited text, because the line break character used by Excel is not always recognized by the text-processing routines in the scripts. The safest choice may be to select and copy the data from the open Excel worksheet and paste it into a text editor, which has worked for me.

  </help>
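<!--
Illustration only, not part of the tool: the Percentile Threshold described in the
help can be sketched in Python as below. The help does not say how the percentile
is counted, so this sketch ASSUMES it is counted from the largest value downward,
which is the reading consistent with "p = 1.0 means all probes are retained". It
also treats every column as one condition. The function names are hypothetical and
PreProcess_Expression_Data.pl itself may behave differently.

```python
import math


def percentile_threshold(values, p):
    """Return a cutoff such that a fraction p of all values lies at or above it.

    ASSUMPTION: the percentile is counted from the top of the ranked list.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    ranked = sorted(values, reverse=True)
    keep = math.ceil(p * len(ranked))  # number of values at or above the cutoff
    if keep == 0:
        return math.inf  # p = 0.0: no value can reach the cutoff
    return ranked[keep - 1]


def filter_probes(expression, p):
    """Drop probe sets whose values are below the cutoff in ALL columns.

    expression maps a probe id to its list of values, one per column.
    """
    pooled = [value for row in expression.values() for value in row]
    cutoff = percentile_threshold(pooled, p)
    # A probe set survives if at least one of its values reaches the cutoff.
    return {pid: row for pid, row in expression.items() if max(row) >= cutoff}
```

With four probe sets of two values each, a setting of p = 0.25 can still retain two
of the four probe sets, which is why p is not the fraction of probe sets retained.
-->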
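<!--
Illustration only, not part of the tool: the Expression Data title line and the
line-break caveat from the help can be checked with a small Python reader before
uploading. This is a minimal sketch that ASSUMES tab-delimited fields; the function
names are hypothetical and are not taken from the EMBER scripts. It relies on
str.splitlines, which accepts "\n", "\r\n" and a bare "\r" as line breaks.

```python
from collections import defaultdict


def parse_title_line(line):
    """Split '#ID  cond#rep ...' into a list of (condition, replicate) pairs."""
    fields = line.split("\t")  # ASSUMPTION: tab-delimited
    if fields[0] != "#ID":
        raise ValueError('first field of the title line must be "#ID"')
    columns = []
    for field in fields[1:]:
        parts = field.split("#")
        if len(parts) != 2 or not parts[0] or not parts[1]:
            raise ValueError("expected [condition]#[replicate], got %r" % field)
        columns.append((parts[0], parts[1]))
    return columns


def read_expression_text(text):
    """Parse expression data, whatever line-break style the text uses."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    columns = parse_title_line(lines[0])
    replicates = defaultdict(list)
    for condition, replicate in columns:
        replicates[condition].append(replicate)
    # The help asks for at least 2 replicates per condition.
    short = [c for c, reps in replicates.items() if 2 > len(reps)]
    if short:
        raise ValueError("conditions with fewer than 2 replicates: %s" % short)
    rows = {}
    for ln in lines[1:]:
        probe_id, *values = ln.split("\t")
        if len(values) != len(columns):
            raise ValueError("wrong number of columns for %s" % probe_id)
        rows[probe_id] = [float(v) for v in values]
    return columns, rows
```

Reading the file contents into a string and passing it to read_expression_text gives
the same result for Unix, Windows and bare carriage-return line breaks, so a file
that parses here has at least a consistent title line and column count.
-->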
</tool>