<tool id="prep_data" name="PreProcess Expression Data" version="1.3">
  <description>Combines gene expression data</description>
  <command interpreter="perl">PreProcess_Expression_Data.pl -i $data -c $compslist -a $annot -o $output -p $thresh -l $log -v n</command>
  <inputs>
    <param format="txt" name="data" type="data" label="Expression data"/>
    <param format="txt" name="compslist" type="data" label="Comparison list"/>
    <param format="txt" name="annot" type="data" label="Annotation file"/>
    <param name="thresh" type="float" min="0" max="1" label="Percentile threshold" value="0.63" optional="true"/>
    <param name="log" type="select" label="Log transform data?">
      <option value="n" selected="true">No</option>
      <option value="y">Yes</option>
    </param>
  </inputs>
  <outputs>
    <data format="txt" name="output"/>
  </outputs>

  <tests>
    <test>
      <param name="data" value="EMBER/expression.txt"/>
      <param name="compslist" value="EMBER/comparisons_list.txt"/>
      <param name="annot" value="EMBER/annotation.txt"/>
      <param name="thresh" value="0.63"/>
      <param name="log" value="n"/>
      <output name="output" file="EMBER/expression_profiles.txt"/>
    </test>
  </tests>

  <help>

This tool discretizes the gene expression data and adds genomic annotations.

More options for the EMBER tools (especially for the main program, EMBER, including searching for multiple expression patterns) are provided by the command-line version, available at http://dinner-group.uchicago.edu/downloads.html. That package also includes test data and sample outputs.

When using any of the EMBER tools, please cite: M Maienschein-Cline, J Zhou, KP White, R Sciammas, and AR Dinner. Discovering transcription factor regulatory targets using gene expression and binding data. *Bioinformatics*, 28:206-213 (2012).

-----

Description of inputs:

*Expression Data*:

Microarray data from N experiments, with at least 2 replicates per condition.

*Format (N+1 columns)*: [ID] [expt 1 value] [expt 2 value] ... [expt N value]

IMPORTANT: the first line must be a title line whose first field is "#ID" and whose remaining fields give the condition and replicate for each column, i.e.,

#ID [condition]#[replicate]...

where [condition] matches the values in the Comparison List, and [replicate] gives the replicate number of that column. [condition] and [replicate] are delimited by a "#" (so don't use that character in the condition name).

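As a rough illustration of the title-line convention described above, the following Python sketch splits a title line into (condition, replicate) pairs and reports conditions with fewer than 2 replicates. It is not part of the EMBER scripts: the function names are hypothetical, and it assumes the fields are tab-separated, which this help text does not state.

```python
def parse_title_line(line):
    """Split a title line into (condition, replicate) pairs, one per data column."""
    # Assumption: fields are tab-separated.
    fields = line.rstrip("\n").split("\t")
    if fields[0] != "#ID":
        raise ValueError('title line must start with "#ID"')
    columns = []
    for field in fields[1:]:
        # Split on the last "#"; condition names should not contain "#" anyway.
        condition, sep, replicate = field.rpartition("#")
        if not sep or not condition or not replicate:
            raise ValueError("expected [condition]#[replicate], got %r" % field)
        columns.append((condition, replicate))
    return columns


def conditions_with_too_few_replicates(columns, minimum=2):
    """Return the sorted names of conditions that have fewer than `minimum` columns."""
    counts = {}
    for condition, _ in columns:
        counts[condition] = counts.get(condition, 0) + 1
    return sorted(name for name, count in counts.items() if minimum > count)
```

For a made-up title line such as "#ID", "WT#1", "WT#2", "KO#1" (joined by tabs), parse_title_line yields the pairs (WT, 1), (WT, 2), (KO, 1), and the second function flags KO as having too few replicates.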
*Comparison List*:

List of behavior dimension definitions, one comparison per line. Each [condition] must match a condition name used in the title line of the Expression Data.

*Format (2 columns)*: [condition1] [condition2]

*Annotation File*:

Gives the genomic coordinates of each probe set.

*Format (6 columns)*: [probe id] [gene name] [chromosome] [start] [end] [strand]

*Percentile Threshold* (p):

Used to eliminate genes that are consistently expressed at a very low level. All data are concatenated into one list, and the pth percentile of that list is taken as the threshold. Then a probe set is removed if its value is less than the threshold in ALL conditions.

p = 1.0 means all probes are retained, p = 0.0 means none are. Note, however, that p = 0.63 does NOT necessarily mean that 63% of probe sets are retained, because a probe set is kept whenever it reaches the threshold in at least one condition.

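The removal rule for the percentile threshold can be sketched in Python as follows. This is only an illustration, not the code of PreProcess_Expression_Data.pl: the function name, probe IDs, and values are hypothetical, each probe set is assumed to have one value per condition, and the threshold is passed in directly because the exact percentile convention of the script is not spelled out here.

```python
def filter_probe_sets(expression, threshold):
    """Keep probe sets that reach the threshold in at least one condition."""
    kept = {}
    for probe_id, values in expression.items():
        # A probe set is removed only when every value is below the threshold.
        if any(value >= threshold for value in values):
            kept[probe_id] = values
    return kept


# Hypothetical probe IDs and values, one value per condition.
expression = {
    "probe_a": [2.0, 3.0, 9.5],
    "probe_b": [1.0, 1.5, 2.0],
    "probe_c": [8.0, 8.5, 9.0],
}
kept = filter_probe_sets(expression, threshold=5.0)
print(sorted(kept))  # prints ['probe_a', 'probe_c']
```

In this toy data, 4 of the 9 pooled values are at or above the threshold, yet 2 of the 3 probe sets are kept, which is why the retained fraction generally differs from p.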
*Log Transform*: whether to take the log of the data before discretization.

  </help>

</tool>