annotate tools/stats/gsummary.xml @ 2:c2a356708570

Uploaded
author xuebing
date Fri, 09 Mar 2012 19:45:42 -0500
parents 9071e359b9a3
children
rev   line source
0
9071e359b9a3 Uploaded
xuebing
parents:
diff changeset
1 <tool id="Summary_Statistics1" name="Summary Statistics" version="1.1.0">
2 <description>for any numerical column</description>
3 <command interpreter="python">gsummary.py $input $out_file1 "$cond"</command>
4 <inputs>
5 <param format="tabular" name="input" type="data" label="Summary statistics on" help="Dataset missing? See TIP below"/>
6 <param name="cond" size="30" type="text" value="c5" label="Column or expression" help="See syntax below">
7 <validator type="empty_field" message="Enter a valid column or expression, see syntax below for examples"/>
8 </param>
9 </inputs>
10 <outputs>
11 <data format="tabular" name="out_file1" />
12 </outputs>
13 <requirements>
14 <requirement type="python-module">rpy</requirement>
15 </requirements>
16 <tests>
17 <test>
18 <param name="input" value="1.bed"/>
19 <output name="out_file1" file="gsummary_out1.tabular"/>
20 <param name="cond" value="c2"/>
21 </test>
22 </tests>
23 <help>
24
25 .. class:: warningmark
26
27 This tool expects tab-delimited input datasets. Blank lines and comment lines beginning with a # character are skipped automatically.
28
29 .. class:: infomark
30
31 **TIP:** If your data is not TAB delimited, use *Text Manipulation-&gt;Convert delimiters to TAB*
32
33 .. class:: infomark
34
35 **TIP:** Computing summary statistics may throw exceptions when values in the columns being summarized are not numerical. If a line is missing a value or contains a non-numerical value in a column being summarized, that line is skipped and its value is excluded from the statistical computation. The number of skipped lines is reported in the resulting history item.
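
The skipping behaviour described in this tip can be sketched roughly as follows. This is an illustration only, not the code in gsummary.py; the helper name ``read_numeric_column`` and its return shape are assumptions made for the example.

```python
def read_numeric_column(lines, col_index):
    """Collect float values from one tab-delimited column (0-based index).

    Hypothetical helper: returns (values, skipped), where skipped counts
    data lines whose value is missing or not numerical.
    """
    values = []
    skipped = 0
    for line in lines:
        line = line.rstrip("\r\n")
        # Blank lines and comment lines are ignored and not counted as skipped.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        try:
            values.append(float(fields[col_index]))
        except (IndexError, ValueError):
            # IndexError: the column is missing; ValueError: not a number.
            skipped += 1
    return values, skipped


sample = [
    "#id\tscore",
    "a\t1.5",
    "b\tNA",
    "",
    "c",
    "d\t4",
]
values, skipped = read_numeric_column(sample, 1)
print(values, skipped)
```

In this sketch the line with ``NA`` and the line with no second column are both counted as skipped, while the comment line and the blank line are ignored silently.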
36
37 .. class:: infomark
38
39 **USING R FUNCTIONS:** Most functions (like *abs*) take a single expression. *log* takes one or two arguments: *log(expression)* for the natural log, or *log(expression,base)* for a given base.
40
41 Currently, these R functions are supported: *abs, sign, sqrt, floor, ceiling, trunc, round, signif, exp, log, cos, sin, tan, acos, asin, atan, cosh, sinh, tanh, acosh, asinh, atanh, lgamma, gamma, gammaCody, digamma, trigamma, cumsum, cumprod, cummax, cummin*
42
43 -----
44
45 **Syntax**
46
47 This tool computes basic summary statistics on a given column, or on a valid expression containing one or more columns.
48
49 - Columns are referenced with **c** and a **number**. For example, **c1** refers to the first column of a tab-delimited file.
50
51 - For example:
52
53 - **log(c5)** calculates the summary statistics for the natural log of column 5
54 - **(c5 + c6 + c7) / 3** calculates the summary statistics on the average of columns 5-7
55 - **log(c5,10)** calculates the summary statistics for the base 10 log of column 5
56 - **sqrt(c5+c9)** calculates the summary statistics for the square root of the sum of columns 5 and 9
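
To make the column syntax concrete, the sketch below evaluates such an expression for a single row in plain Python. It is not how gsummary.py works (the tool relies on R through rpy); ``evaluate_expression`` and the small ``FUNCTIONS`` table are hypothetical names introduced for illustration, and only a handful of the supported functions are mapped.

```python
import math

# A few of the supported function names, mapped to Python equivalents
# (assumption: this table is illustrative, not the tool's own list).
FUNCTIONS = {
    "abs": abs,
    "sqrt": math.sqrt,
    "log": math.log,      # log(x) is the natural log, log(x, base) uses base
    "exp": math.exp,
    "floor": math.floor,
    "ceiling": math.ceil,
}


def evaluate_expression(expression, fields):
    """Evaluate an expression such as "(c5 + c6 + c7) / 3" for one row."""
    namespace = dict(FUNCTIONS)
    for position, field in enumerate(fields, start=1):
        try:
            namespace["c%d" % position] = float(field)
        except ValueError:
            pass  # non-numerical columns are simply not exposed
    # eval() keeps the sketch short; never pass it untrusted input.
    return eval(expression, {"__builtins__": {}}, namespace)


row = "586\tchrX\t161416\t170887\t41108_at\t16990".split("\t")
print(evaluate_expression("c4 - c3", row))
print(evaluate_expression("log(c6, 10)", row))
print(evaluate_expression("sqrt(c3 + c4)", row))
```

Referencing a non-numerical column (for example ``c2`` in the row above) raises a ``NameError`` in this sketch, which loosely mirrors the requirement that summarized columns be numerical.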
57
58 -----
59
60 **Examples**
61
62 - Input Dataset::
63
64 c1 c2 c3 c4 c5 c6
65 586 chrX 161416 170887 41108_at 16990
66 73 chrX 505078 532318 35073_at 1700
67 595 chrX 1361578 1388460 33665_s_at 1960
68 74 chrX 1420620 1461919 1185_at 8600
69
70 - Summary Statistics on column c6 of the above input dataset::
71
72 #sum mean stdev 0% 25% 50% 75% 100%
73 29250.000 7312.500 7198.636 1700.000 1895.000 5280.000 10697.500 16990.000
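
The figures in this example are consistent with a sample standard deviation (n - 1 denominator) and linearly interpolated quantiles, which is the default in R's ``quantile``. The following sketch recomputes the row with the Python standard library under that assumption; ``quantile`` and ``summarize`` are illustrative names, not functions from gsummary.py.

```python
import math
import statistics


def quantile(sorted_values, fraction):
    """Linearly interpolated quantile (assumed to match R's default, type 7)."""
    position = (len(sorted_values) - 1) * fraction
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def summarize(values):
    """Return sum, mean, sample stdev and the 0/25/50/75/100% quantiles."""
    ordered = sorted(values)
    stats = [sum(ordered), statistics.mean(ordered), statistics.stdev(ordered)]
    stats += [quantile(ordered, q) for q in (0, 0.25, 0.5, 0.75, 1)]
    return stats


# Column c6 of the input dataset shown above.
c6 = [16990, 1700, 1960, 8600]
print("#sum\tmean\tstdev\t0%\t25%\t50%\t75%\t100%")
print("\t".join("%.3f" % value for value in summarize(c6)))
```

Formatted to three decimals, the printed row matches the summary line shown above for column c6.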
74
75 </help>
76 </tool>