<tool id="GSAR" name="GSAR" version="0.1.0">
    <description>A set of multivariate statistical tests for self-contained gene set analysis</description>

    <requirements>
        <requirement type="package" version="1.24.0">bioconductor-GSAR</requirement>
        <requirement type="package" version="1.52.1">bioconductor-GSEABase</requirement>
        <requirement type="package" version="1.20.3">r-getopt</requirement>
    </requirements>

    <command detect_errors="exit_code"><![CDATA[
        Rscript '$__tool_directory__/GSAR.R'
            --expr_file '$expression_data_file'
            --geneSet_file '$geneSet'
            --design_file '$desigin'
            --min_size '$adv.min_size'
            --max_size '$adv.max_size'
            --test_method '$method'
            --nperm_number '$adv.perm_num'
            --threshold_value '$MST.threshold'
            --cor_method '$MST.cor_method'
            --GSAR_output_p_value '$GSAR_p_value_for_the_geneSet'
            --GSAR_output_plot '$GSAR_Significant_pathway_plot'
    ]]></command>

    <inputs>
        <param name="expression_data_file" type="data" format="csv" label="Expression data file" help="A csv file containing a matrix of expression values where rows correspond to genes (gene symbols) and columns correspond to samples."/>
        <param name="desigin" type="data" format="csv" label="Design" help="A csv file describing the samples in two columns: 'group' (1 for group1, 2 for group2) and 'label' (the name of each group)."/>
        <param name="geneSet" type="data" format="rdata" label="Gene Set" help="An rdata file containing a GeneSetCollection object named 'geneSet'."/>
        <param name="method" type="select" label="Method" help="Statistical method for testing the gene sets.">
            <option value="GSNCAtest" selected="true">Gene sets net correlations analysis</option>
            <option value="WWtest">Wald-Wolfowitz test</option>
            <option value="KStest">Kolmogorov-Smirnov test</option>
            <option value="MDtest">Mean Deviation test</option>
            <option value="RKStest">Radial Kolmogorov-Smirnov test</option>
            <option value="RMDtest">Radial Mean Deviation test</option>
        </param>

        <section name="adv" title="Advanced options">
            <param name="min_size" type="integer" value="10" min="5" label="Min Size for the GeneSet" help="The minimum allowed gene set size. Default value is 10." />
            <param name="max_size" type="integer" value="500" label="Max Size for the GeneSet" help="The maximum allowed gene set size. Default value is 500." />
            <param name="perm_num" type="integer" value="1000" min="100" label="Permutations number" help="Number of permutations used to estimate the null distribution of the test statistic. Default value is 1000. The minimum value is 100." />
        </section>

        <section name="MST" title="Options for plotting minimum spanning trees">
            <param name="threshold" type="float" value="0.05" min="0.0001" max="1" label="Threshold value" help="P-value threshold used to define the significant gene sets for which minimum spanning trees are plotted. Default is 0.05." />
            <param name="cor_method" type="select" label="Correlation coefficient statistic" help="Correlation coefficient used when plotting the minimum spanning trees of a pathway in the two conditions. Possible values are 'pearson' (default), 'spearman' and 'kendall'.">
                <option value="pearson" selected="true">pearson</option>
                <option value="spearman">spearman</option>
                <option value="kendall">kendall</option>
            </param>
        </section>

    </inputs>

    <outputs>
        <data name="GSAR_p_value_for_the_geneSet" format="csv" label="GSAR_p_value_for_the_geneSet" />
        <data name="GSAR_Significant_pathway_plot" format="pdf" label="GSAR_Significant_pathway_plot" />
    </outputs>

    <tests>
        <test>
            <param name="expression_data_file" value="GSAR_input_p53DataSet.csv" ftype="csv" />
            <param name="desigin" value="GSAR_design.csv" ftype="csv" />
            <param name="method" value="GSNCAtest" />
            <section name="adv">
                <param name="min_size" value="10" />
                <param name="max_size" value="500" />
                <param name="perm_num" value="1000"/>
            </section>
            <section name="MST">
                <param name="threshold" value="0.05" />
                <param name="cor_method" value="pearson" />
            </section>
            <output name="GSAR_p_value_for_the_geneSet" file="GSAR_p_value_for_the_geneSet.csv" ftype="csv" />
            <output name="GSAR_Significant_pathway_plot" file="GSAR_Significant_pathway_plot.pdf" ftype="pdf" />
        </test>
    </tests>

    <help><![CDATA[

.. class:: infomark

**What it does**

**GSAR (Gene Set Analysis in R)** is an R package that provides a set of multivariate statistical tests for self-contained gene set analysis (GSA). GSAR consists of two-sample multivariate nonparametric statistical methods that test a null hypothesis against specific alternative hypotheses, such as differences in mean (shift), variance (scale) or correlation structure. It also offers a graphical visualization tool for the correlation networks obtained from expression data, which uses minimum spanning trees to examine how the net correlation structure of a gene set changes between two conditions.

---------

=========
**Input**
=========

**Gene expression data**

A csv file containing a matrix of expression values where rows correspond to genes and columns correspond to samples.
The recommended gene identifier is the gene symbol (`Symbol ID`).

**Design**

A csv file describing the samples in two columns: `'group'` (1 for group1, 2 for group2) and `'label'` (the name of each group).

Example:

======= ======= =========
sample  group   label
======= ======= =========
WT1     1       control
WT2     1       control
WT3     1       control
...     ...     ...
MUT31   2       test
MUT32   2       test
MUT33   2       test
======= ======= =========
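
A design file of this shape can also be generated programmatically. The following is a minimal Python sketch, not part of the tool: the helper name `write_design` is hypothetical, and the header names `sample`, `group` and `label` are assumed from the example table above, so check them against what the tool's R script actually expects.

```python
import csv
import io


def write_design(samples, stream):
    """Write (sample, group, label) rows as CSV; group must be 1 or 2."""
    writer = csv.writer(stream, lineterminator="\n")
    # Header names are an assumption taken from the example table.
    writer.writerow(["sample", "group", "label"])
    for sample, group, label in samples:
        if group not in (1, 2):
            raise ValueError(f"group must be 1 or 2, got {group!r} for {sample}")
        writer.writerow([sample, group, label])


samples = [
    ("WT1", 1, "control"),
    ("WT2", 1, "control"),
    ("MUT31", 2, "test"),
    ("MUT32", 2, "test"),
]
buffer = io.StringIO()
write_design(samples, buffer)
design_csv = buffer.getvalue()
print(design_csv)
```

The resulting text can be saved as a csv file and uploaded as the **Design** input.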

**Gene Sets**

An `rdata` file containing a variable named `geneSet` that holds a `GeneSetCollection` object built with the `GSEABase` Bioconductor package. You can use the **GeneSet from Msigdb/KEGG** tool to create this file. Make sure the gene sets use the same gene identifier type as the gene expression data.

**Method**

The statistical method used to test the gene sets. It must be one of: GSNCA (Gene Sets Net Correlations Analysis), Wald-Wolfowitz test, Kolmogorov-Smirnov test, Mean Deviation test, Radial Kolmogorov-Smirnov test or Radial Mean Deviation test.

**Min Size for the Gene Set**

The minimum allowed gene set size. Default value is 10.

**Max Size for the Gene Set**

The maximum allowed gene set size. Default value is 500.

**Permutations number**

Number of permutations used to estimate the null distribution of the test statistic. Default value is 1000. The minimum value is 100.
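
To illustrate why the number of permutations matters, here is a minimal, generic permutation-test sketch in Python. It is not GSAR's implementation: the statistic (absolute difference in means of a single variable) and the function name `permutation_p_value` are assumptions chosen for brevity, whereas GSAR uses multivariate statistics. With `nperm` permutations, the smallest p-value such an estimate can report is 1 / (nperm + 1).

```python
import random


def permutation_p_value(group1, group2, nperm=1000, seed=0):
    """Estimate a p-value by shuffling group membership nperm times."""
    rng = random.Random(seed)
    pooled = list(group1) + list(group2)
    n1 = len(group1)

    def statistic(values):
        # Illustrative statistic: absolute difference of the group means.
        first, second = values[:n1], values[n1:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = statistic(pooled)
    exceed = 0
    for _ in range(nperm):
        shuffled = pooled[:]
        rng.shuffle(shuffled)
        if statistic(shuffled) >= observed:
            exceed += 1
    # Adding 1 to both counts keeps the estimate strictly above zero.
    return (exceed + 1) / (nperm + 1)


p_separated = permutation_p_value([0.0] * 10, [10.0] * 10, nperm=100)
p_identical = permutation_p_value([1.0] * 3, [1.0] * 3, nperm=100)
print(p_separated, p_identical)
```

More permutations give a finer-grained p-value at the cost of a longer run time.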

**Threshold value**

P-value threshold used to define the significant gene sets for which minimum spanning trees are plotted. Default is 0.05.

**Correlation coefficient statistic**

Correlation coefficient used when plotting the minimum spanning trees of a pathway in the two conditions. Possible values are 'pearson' (default), 'spearman' and 'kendall'.

---------

==========
**Output**
==========

**1. A csv file containing the P-values of all gene sets**

Example:

========= ==========
geneSet   p_value
========= ==========
pathway_1 0.007
pathway_2 0.008
pathway_3 0.009
pathway_4 0.010
...       ...
pathway_n 0.999
========= ==========
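
The P-value file can be post-processed outside Galaxy. The sketch below, in Python, lists the gene sets that fall under a chosen threshold. The column names `geneSet` and `p_value` are assumed from the example table above, the function name `significant_gene_sets` is hypothetical, and the strict `<` comparison is an assumption that may differ from how the tool itself applies the threshold.

```python
import csv
import io


def significant_gene_sets(csv_text, threshold=0.05):
    """Return the names of gene sets whose p-value is below the threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Strict comparison is an assumption; the tool may use <= instead.
    return [row["geneSet"] for row in reader if float(row["p_value"]) < threshold]


example = "geneSet,p_value\npathway_1,0.007\npathway_2,0.008\npathway_n,0.999\n"
selected = significant_gene_sets(example)
print(selected)
```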

**2. Plot of minimum spanning trees for significant gene sets in two conditions**

    ]]></help>

    <citations>
        <citation type="doi">10.1186/s12859-017-1482-6</citation>
    </citations>

</tool>