annotate normalize.xml @ 2:f3fe0f64fe91 draft

Uploaded
author ynewton
date Mon, 17 Dec 2012 15:25:38 -0500
parents a9d8d4b531f7
children 6e77048d4d88
rev   line source
0
a9d8d4b531f7 Uploaded
ynewton
parents:
<tool id="matrix_normalize" name="Matrix Normalize" version="2.0.0">
  <description>Matrix Normalize</description>
  <command interpreter="Rscript">normalize.r $genomicMatrix $normType $normBy
    #if str($controlColumnLabelsList) != "None":
      $controlColumnLabelsList
    #end if
    > $outfile
  </command>
  <inputs>
    <param name="genomicMatrix" type="data" label="Genomic Matrix"/>
    <param name="normBy" type="select" label="normalize by (row or column)">
      <option value="row">ROW</option>
      <option value="column">COLUMN</option>
    </param>
    <param name="normType" type="select" label="type of normalization">
      <option value="median_shift">Median Shift</option>
      <option value="mean_shift">Mean Shift</option>
      <option value="t_statistic">Student t-statistic (z-scores)</option>
      <option value="exponential_fit">Exponential Distribution Normalization</option>
      <option value="normal_fit">Normal Distribution Normalization</option>
      <option value="weibull_0.5_fit">Weibull Distribution Normalization (scale=1,shape=0.5)</option>
      <option value="weibull_1_fit">Weibull Distribution Normalization (scale=1,shape=1)</option>
      <option value="weibull_1.5_fit">Weibull Distribution Normalization (scale=1,shape=1.5)</option>
      <option value="weibull_5_fit">Weibull Distribution Normalization (scale=1,shape=5)</option>
    </param>
    <param name="controlColumnLabelsList" optional="true" type="data" label="Controls"/>
  </inputs>
  <outputs>
    <data name="outfile" format="tabular"/>
  </outputs>
  <help>
**What it does**

This tool normalizes a data matrix using the selected options. The matrix is assumed to be annotated by row and column: the first line of the file holds the column headers, and the first field of every following line holds the row header.

Data can be normalized by row or by column. The exponential, normal, and Weibull normalizations are always applied by column, regardless of this selection.

The following normalizations are provided:

1. Median shift: if no normals list is provided, the median of the whole row is computed and subtracted from every entry in the row. If normals are provided, the median of the normal samples is computed and subtracted from each non-normal value, and only the non-normal samples are returned. If COLUMN is selected under "normalize by", normals are ignored.

2. Mean shift: if no normals list is provided, the mean of the whole row is computed and subtracted from every entry in the row. If normals are provided, the mean of the normal samples is computed and subtracted from each non-normal value, and only the non-normal samples are returned. If COLUMN is selected under "normalize by", normals are ignored.

3. T-statistic (z-score): also known as standardization. A z-score is computed for each value in the row or column. If normals are specified, z-scores are computed separately within each class (normals and non-normals).

4. Exponential normalization: performed by column (sample). All genes/probes in the column are ranked, then the inverse CDF (quantile function) of the exponential distribution is applied to the ranks, mapping each rank to the corresponding quantile of that distribution.

5. Normal normalization: same as exponential normalization, but the quantile function (inverse CDF) of the normal distribution is applied.

6. Weibull normalizations: same as exponential normalization, but the quantile function (inverse CDF) of the Weibull distribution is applied, with scale 1 and the selected shape (0.5, 1, 1.5, or 5).
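
The normalizations listed above can be illustrated with a short Python sketch. It is not the code in normalize.r, and several details are assumptions made only for illustration: the function names are hypothetical, the z-score uses the sample standard deviation, ranks are mapped to probabilities as rank / (n + 1), ties are broken by position, and the exponential rate is 1.

```python
import math
from statistics import NormalDist, mean, median, stdev


def shift_row(values, center=median, controls=None):
    """Subtract the center (median or mean) of the row, or of its controls."""
    reference = center(controls) if controls else center(values)
    return [v - reference for v in values]


def zscore(values):
    """Standardize: subtract the mean, divide by the sample std dev (assumed)."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


def rank_to_quantiles(column, quantile):
    """Rank a column, map rank r to p = r / (n + 1), then apply `quantile`."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):  # ties broken by position
        out[i] = quantile(rank / (n + 1))
    return out


def exponential_quantile(p, rate=1.0):
    """Inverse CDF of the exponential distribution (rate 1 is an assumption)."""
    return -math.log(1.0 - p) / rate


def weibull_quantile(p, shape, scale=1.0):
    """Inverse CDF of the Weibull distribution."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


# Inverse CDF of the standard normal distribution.
normal_quantile = NormalDist().inv_cdf

row = [2.0, 4.0, 9.0]
print(shift_row(row))                         # [-2.0, 0.0, 5.0]
print(shift_row(row, center=mean))            # [-3.0, -1.0, 4.0]
print(shift_row([9.0], controls=[2.0, 4.0]))  # [6.0]
print(zscore([1.0, 2.0, 3.0]))                # [-1.0, 0.0, 1.0]

column = [10.0, 30.0, 20.0]
print(rank_to_quantiles(column, normal_quantile))
print(rank_to_quantiles(column, exponential_quantile))
print(rank_to_quantiles(column, lambda p: weibull_quantile(p, shape=1.5)))
```

Dividing the rank by n + 1 keeps every probability strictly between 0 and 1, so the quantile functions never return an infinite value. The actual script may scale ranks differently.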

The optional Controls parameter holds either a list of column headers from the input matrix that should be treated as normals/controls, or a separate matrix of normal/control samples. The program detects which of the two was supplied and processes it accordingly. When both the main expression matrix and a normals/controls matrix are given for column-wise normalization, the two matrices are concatenated and the output is a single combined matrix containing both tumor and normal/control samples, with every sample normalized.
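
How the script tells a label list from a control matrix is not stated here. One plausible heuristic, shown purely as an assumption, is to treat the file as a label list when every non-empty line holds a single tab-separated field. The helper name `parse_controls` is hypothetical and does not come from normalize.r.

```python
def parse_controls(text):
    """Classify a controls file (hypothetical helper, not normalize.r's logic).

    Returns ("labels", [names]) when every non-empty line holds a single
    tab-separated field, otherwise ("matrix", [rows of fields]).
    """
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    if all(len(fields) == 1 for fields in rows):
        return "labels", [fields[0].strip() for fields in rows]
    return "matrix", rows


print(parse_controls("N1\nN2\n"))
# ('labels', ['N1', 'N2'])
print(parse_controls("gene\tN1\tN2\nTP53\t0.1\t0.4\n")[0])
# matrix
```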

  </help>
</tool>