annotate tools/stats/grouping.xml @ 2:c2a356708570

Uploaded
author xuebing
date Fri, 09 Mar 2012 19:45:42 -0500
parents 9071e359b9a3
children
rev   line source
0
9071e359b9a3 Uploaded
xuebing
parents:
diff changeset
1 <tool id="Grouping1" name="Group" version="2.0.0">
2 <description>data by a column and perform aggregate operations on other columns.</description>
3 <command interpreter="python">
4 grouping.py
5 $out_file1
6 $input1
7 $groupcol
8 $ignorecase
9 #for $op in $operations
10 '${op.optype}
11 ${op.opcol}
12 ${op.opround}'
13 #end for
14 </command>
15 <inputs>
16 <param format="tabular" name="input1" type="data" label="Select data" help="Dataset missing? See TIP below."/>
17 <param name="groupcol" label="Group by column" type="data_column" data_ref="input1" />
18 <param name="ignorecase" type="boolean" truevalue="1" falsevalue="0">
19 <label>Ignore case while grouping?</label>
20 </param>
21 <repeat name="operations" title="Operation">
22 <param name="optype" type="select" label="Type">
23 <option value="mean">Mean</option>
24 <option value="median">Median</option>
25 <option value="mode">Mode</option>
26 <option value="max">Maximum</option>
27 <option value="min">Minimum</option>
28 <option value="sum">Sum</option>
29 <option value="length">Count</option>
30 <option value="unique">Count Distinct</option>
31 <option value="cat">Concatenate</option>
32 <option value="cat_uniq">Concatenate Distinct</option>
33 <option value="random">Randomly pick</option>
34 <option value="std">Standard deviation</option>
35 </param>
36 <param name="opcol" label="On column" type="data_column" data_ref="input1" />
37 <param name="opround" type="select" label="Round result to nearest integer?">
38 <option value="no">NO</option>
39 <option value="yes">YES</option>
40 </param>
41 </repeat>
42 </inputs>
43 <outputs>
44 <data format="tabular" name="out_file1" />
45 </outputs>
46 <requirements>
47 <requirement type="python-module">numpy</requirement>
48 </requirements>
49 <tests>
50 <!-- Test valid data -->
51 <test>
52 <param name="input1" value="1.bed"/>
53 <param name="groupcol" value="1"/>
54 <param name="ignorecase" value="true"/>
55 <param name="optype" value="mean"/>
56 <param name="opcol" value="2"/>
57 <param name="opround" value="no"/>
58 <output name="out_file1" file="groupby_out1.dat"/>
59 </test>
60 <!-- Long case, but the test framework doesn't allow it yet
61 <test>
62 <param name="input1" value="1.bed"/>
63 <param name="groupcol" value="1"/>
64 <param name="ignorecase" value="false"/>
65 <param name="operations" value='[{"opcol": "2", "__index__": 0, "optype": "mean", "opround": "no"}, {"opcol": "2", "__index__": 1, "optype": "median", "opround": "no"}, {"opcol": "6", "__index__": 2, "optype": "mode", "opround": "no"}, {"opcol": "2", "__index__": 3, "optype": "max", "opround": "no"}, {"opcol": "2", "__index__": 4, "optype": "min", "opround": "no"}, {"opcol": "2", "__index__": 5, "optype": "sum", "opround": "no"}, {"opcol": "1", "__index__": 6, "optype": "length", "opround": "no"}, {"opcol": "1", "__index__": 7, "optype": "unique", "opround": "no"}, {"opcol": "1", "__index__": 8, "optype": "cat", "opround": "no"}, {"opcol": "6", "__index__": 9, "optype": "cat_uniq", "opround": "no"}, {"opcol": "2", "__index__": 10, "optype": "random", "opround": "no"}, {"opcol": "2", "__index__": 11, "optype": "std", "opround": "no"}]'/>
66 <output name="out_file1" file="groupby_out3.tabular"/>
67 </test>
68 -->
69 <!-- Test data with an invalid value in a column. Can't do it because test framework doesn't allow testing of errors
70 <test>
71 <param name="input1" value="1.tabular"/>
72 <param name="groupcol" value="1"/>
73 <param name="ignorecase" value="true"/>
74 <param name="optype" value="mean"/>
75 <param name="opcol" value="2"/>
76 <param name="opround" value="no"/>
77 <output name="out_file1" file="groupby_out2.dat"/>
78 </test>
79 -->
80 </tests>
81 <help>
82
83 .. class:: infomark
84
85 **TIP:** If your data is not TAB delimited, use *Text Manipulation-&gt;Convert*
86
87 -----
88
89 **Syntax**
90
91 This tool groups the input dataset by a particular column and applies aggregate functions to any other column(s): Mean, Median, Mode, Maximum, Minimum, Sum, Count, Count Distinct, Concatenate, Concatenate Distinct, Randomly pick, and Standard deviation.
92
93 For each group, the Concatenate function takes every item in the specified column and builds a comma-delimited list. Concatenate Distinct does the same but lists each distinct item only once.
94
95 Count and Count Distinct work like Concatenate and Concatenate Distinct, but instead of building the list they return the number of items as an integer.
96
97 - If multiple modes are present, all are reported.
98
99 -----
100
101 **Example**
102
103 - For the following input::
104
105 chr22 1000 1003 TTT
106 chr22 2000 2003 aaa
107 chr10 2200 2203 TTT
108 chr10 1200 1203 ttt
109 chr22 1600 1603 AAA
110
111 - **Grouping on column 4** while ignoring case, and performing operation **Count on column 1** will return::
112
113 AAA 2
114 TTT 3
115
116 - **Grouping on column 4** while not ignoring case, and performing operation **Count on column 1** will return::
117
118 aaa 1
119 AAA 1
120 ttt 1
121 TTT 2
122 </help>
123 </tool>
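
The grouping behaviour described in the help text can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the contents of grouping.py, which is not shown here: the function names (`group_rows`, `aggregate`, `group_and_aggregate`) are hypothetical, only four of the operations are covered, and the choice to lowercase keys and sort the groups is an assumption of the sketch, so the case and order of keys in the real tool's output may differ. Column numbers are 1-based, as in the tool form.

```python
# Sample rows taken from the Example section of the help text.
ROWS = [
    ["chr22", "1000", "1003", "TTT"],
    ["chr22", "2000", "2003", "aaa"],
    ["chr10", "2200", "2203", "TTT"],
    ["chr10", "1200", "1203", "ttt"],
    ["chr22", "1600", "1603", "AAA"],
]


def group_rows(rows, group_col, ignore_case=False):
    """Collect rows into a dict keyed by the value in a 1-based column."""
    groups = {}
    for row in rows:
        key = row[group_col - 1]
        if ignore_case:
            # Assumption of this sketch: fold keys to lowercase.
            key = key.lower()
        groups.setdefault(key, []).append(row)
    return groups


def aggregate(values, optype):
    """Apply one operation, named by its option value, to a group's values."""
    if optype == "length":      # Count
        return str(len(values))
    if optype == "unique":      # Count Distinct
        return str(len(set(values)))
    if optype == "cat":         # Concatenate
        return ",".join(values)
    if optype == "cat_uniq":    # Concatenate Distinct
        return ",".join(sorted(set(values)))
    raise ValueError("operation not covered by this sketch: %s" % optype)


def group_and_aggregate(rows, group_col, ignore_case, operations):
    """Return one output row per group: the key, then one field per operation.

    operations is a list of (optype, 1-based column) pairs.
    """
    groups = group_rows(rows, group_col, ignore_case)
    out = []
    for key in sorted(groups):
        fields = [key]
        for optype, op_col in operations:
            values = [row[op_col - 1] for row in groups[key]]
            fields.append(aggregate(values, optype))
        out.append(fields)
    return out


if __name__ == "__main__":
    # Group on column 4, ignoring case, and count column 1.
    for fields in group_and_aggregate(ROWS, 4, True, [("length", 1)]):
        print("\t".join(fields))
```

With the sample rows, grouping on column 4 while ignoring case gives counts of 2 and 3, matching the first example in the help text; without ignoring case the four distinct spellings are counted separately.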