annotate tools/stats/cor.xml @ 0:9071e359b9a3

Uploaded
author xuebing
date Fri, 09 Mar 2012 19:37:19 -0500
parents
children
<tool id="cor2" name="Correlation">
  <description>for numeric columns</description>
  <command interpreter="python">cor.py $input1 $out_file1 $numeric_columns $method</command>
  <inputs>
    <param format="tabular" name="input1" type="data" label="Dataset" help="Dataset missing? See TIP below"/>
    <param name="numeric_columns" label="Numerical columns" type="data_column" numerical="True" multiple="True" data_ref="input1" help="Multi-select list - hold the appropriate key while clicking to select multiple columns" />
    <param name="method" type="select" label="Method">
      <option value="pearson">Pearson</option>
      <option value="kendall">Kendall rank</option>
      <option value="spearman">Spearman rank</option>
    </param>
  </inputs>
  <outputs>
    <data format="txt" name="out_file1" />
  </outputs>
  <requirements>
    <requirement type="python-module">rpy</requirement>
  </requirements>
  <tests>
    <!--
    Test a tabular input with the first line being a comment without a # character to start
    -->
    <test>
      <param name="input1" value="cor.tabular" />
      <param name="numeric_columns" value="2,3" />
      <param name="method" value="pearson" />
      <output name="out_file1" file="cor_out.txt" />
    </test>
  </tests>
  <help>

.. class:: infomark

**TIP:** If your data is not TAB delimited, use *Text Manipulation-&gt;Convert*

.. class:: warningmark

Missing data ("nan") is removed from each pairwise comparison.

-----

**Syntax**

This tool computes the matrix of correlation coefficients between numeric columns.

- All invalid, blank and comment lines are skipped when performing computations. The number of skipped lines is displayed in the resulting history item.

- **Pearson's Correlation** reflects the degree of linear relationship between two variables. It ranges from +1 to -1. A correlation of +1 means that there is a perfect positive linear relationship between the variables, and a correlation of -1 means that there is a perfect negative linear relationship. The formula for Pearson's correlation is:

    .. image:: ./static/images/pearson.png

    where n is the number of items.

- **Kendall's rank correlation** measures the degree of correspondence between two rankings and assesses the significance of this correspondence. The formula for Kendall's rank correlation is:

    .. image:: ./static/images/kendall.png

    where n is the number of items, and P is the sum, over all items, of the number of items ranked after the given item by both rankings.

- **Spearman's rank correlation** assesses how well an arbitrary monotonic function could describe the relationship between two variables, without making any assumptions about the frequency distribution of the variables. The formula for Spearman's rank correlation is:

    .. image:: ./static/images/spearman.png

    where D is the difference between the ranks of corresponding values of X and Y, and N is the number of pairs of values.

-----

**Example**

- Input file::

    #Person    Height    Self Esteem
    1          68        4.1
    2          71        4.6
    3          62        3.8
    4          75        4.4
    5          58        3.2
    6          60        3.1
    7          67        3.8
    8          68        4.1
    9          71        4.3
    10         69        3.7
    11         68        3.5
    12         67        3.2
    13         63        3.7
    14         62        3.3
    15         60        3.4
    16         63        4.0
    17         65        4.1
    18         67        3.8
    19         63        3.4
    20         61        3.6

- Computing the correlation coefficients between columns 2 and 3 of the above file (using Pearson's Correlation) gives the following output::

    1.0               0.730635686279
    0.730635686279    1.0

  So the correlation for our twenty cases is 0.73, which is a fairly strong positive relationship.
  </help>
</tool>
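<!--
The three coefficients described in the help text above can be sketched in plain Python.
This is only an illustration, not the code in cor.py, which according to the
requirements section relies on rpy. It assumes the formula images show the standard
textbook forms that match the symbols named in the help (n, P, D, N). The function
names pearson, ranks, spearman and kendall are hypothetical, and the rank-based
versions assume no tied values and no missing data.

```python
import math

# Columns 2 and 3 of the example input in the help text.
HEIGHT = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
          68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
SELF_ESTEEM = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
               3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]


def pearson(x, y):
    """Pearson's r computed from raw sums; n is the number of items."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """1 - 6 * sum(D^2) / (N * (N^2 - 1)); assumed form, valid without ties."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def kendall(x, y):
    """4 * P / (n * (n - 1)) - 1; assumed form, valid without ties."""
    n = len(x)
    # P: for every item, count the items ranked after it by both rankings.
    p = sum(1 for i in range(n) for j in range(n)
            if x[j] > x[i] and y[j] > y[i])
    return 4 * p / (n * (n - 1)) - 1


if __name__ == "__main__":
    print(pearson(HEIGHT, SELF_ESTEEM))
```

Run as a script, this prints Pearson's r for the Height and Self Esteem columns, which
should agree with the 0.730635686279 reported in the example output. The example data
contains tied heights, so spearman and kendall as written here are better tried on
tie-free input such as spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]).
-->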