view GSAR.xml @ 1:8ff053661ae2 draft default tip

author mora-lab
date Thu, 20 May 2021 08:22:56 +0000
parents f0cad4d3a301

<tool id="GSAR" name="GSAR" version="0.1.0">
    <description>A set of multivariate statistical tests for self-contained gene set analysis</description>

        <requirement type="package" version="1.24.0">bioconductor-GSAR</requirement>
        <requirement type="package" version="1.52.1">bioconductor-GSEABase</requirement>
        <requirement type="package" version="1.20.3">r-getopt</requirement>

    <command detect_errors="exit_code"><![CDATA[
        Rscript '$__tool_directory__/GSAR.R' 
        --expr_file '$expression_data_file' 
        --geneSet_file '$geneSet'
        --design_file '$desigin' 
        --min_size '$adv.min_size'
        --max_size '$adv.max_size'
        --test_method '$method'
        --nperm_number '$adv.perm_num'
        --threshold_value '$MST.threshold'
        --cor_method '$MST.cor_method'
        --GSAR_output_p_value '$GSAR_p_value_for_the_geneSet'
        --GSAR_output_plot '$GSAR_Significant_pathway_plot'
    ]]></command>
        <param name="expression_data_file" type="data" format="csv" label="Expression data file" help="A csv file containing a matrix of expression values, where rows correspond to genes (symbol ID) and columns correspond to samples."/>
        <param name="desigin" type="data" format="csv" label="Design" help="A csv file describing the samples with two columns: 'group' (1 for group1, 2 for group2) and 'label' (the name/label of each group)."/>
        <param name="geneSet" type="data" format="rdata" label="Gene Set" help="An `rdata` file containing a GeneSetCollection object named 'geneSet'."/>
        <param name="method" type="select" label="Method" help="Statistical method for testing the gene sets.">
            <option value="GSNCAtest" selected="true">Gene sets net correlations analysis</option>
            <option value="WWtest">Wald-Wolfowitz test</option>
            <option value="KStest">Kolmogorov-Smirnov test</option>
            <option value="MDtest">Mean Deviation test</option>
            <option value="RKStest">Radial Kolmogorov-Smirnov test</option>
            <option value="RMDtest">Radial Mean Deviation test</option>
        </param>
        <section name="adv" title="Advanced options">
            <param name="min_size" type="integer" value="10" min="5" label="Min Size for the GeneSet" help="The minimum allowed gene set size. Default value is 10." />
            <param name="max_size" type="integer" value="500" label="Max Size for the GeneSet" help="The maximum allowed gene set size. Default value is 500." />
            <param name="perm_num" type="integer" value="1000" min="100" label="Permutations number" help="Number of permutations used to estimate the null distribution of the test statistic. Default value is 1000. The minimum value is 100." />
        </section>
        <section name="MST" title="Option for plotting minimum spanning trees" >
            <param name="threshold" type="float" value="0.05" min="0.0001" max="1" label="Threshold value" help="P-value threshold used to select the significant gene sets for which minimum spanning trees are plotted. Default is 0.05." />
            <param name="cor_method" type="select" label="Correlation coefficient statistic" help="Correlation coefficient used when plotting the minimum spanning trees of a pathway in the two conditions. Possible values are 'pearson', 'spearman' and 'kendall'. Default value is 'pearson'." >
                <option value="pearson" selected="true">pearson</option>
                <option value="spearman">spearman</option>
                <option value="kendall">kendall</option>
            </param>
        </section>
        <data name="GSAR_p_value_for_the_geneSet" format="csv" label="GSAR_p_value_for_the_geneSet" />
        <data name="GSAR_Significant_pathway_plot" format="pdf" label="GSAR_Significant_pathway_plot" />

            <param name="expression_data_file" value="GSAR_input_p53DataSet.csv" ftype="csv" />    
            <param name="desigin" value="GSAR_design.csv" ftype="csv" />
            <param name="method" value="GSNCAtest" />
            <section name="adv">
                <param name="min_size" value="10" />
                <param name="max_size" value="500" />
                <param name="perm_num" value="1000"/>
            </section>
            <section name="MST">
                <param name="threshold" value="0.05" />
                <param name="cor_method" value="pearson" />
            </section>
            <output name="GSAR_p_value_for_the_geneSet" file="GSAR_p_value_for_the_geneSet.csv" ftype="csv" />
            <output name="GSAR_Significant_pathway_plot" file="GSAR_Significant_pathway_plot.pdf" ftype="pdf" />

.. class:: infomark 

**What it does**
    **GSAR (Gene Set Analysis in R)** is an R package which provides a set of multivariate statistical tests for self-contained gene set analysis (GSA). GSAR consists of two-sample multivariate nonparametric statistical methods testing a null hypothesis against specific alternative hypotheses, such as differences in mean (shift), variance (scale) or correlation structure. It also offers a graphical visualization tool for the correlation networks obtained from expression data to examine the change in the net correlation structure of a gene set between two conditions based on the minimum spanning trees.



**Gene expression data** 

The input is a csv file containing a matrix of expression values, where rows correspond to genes and columns correspond to samples.
Gene symbols (`Symbol ID`) are the recommended gene identifiers.


**Design**

A csv file describing the samples with two columns: `'group'` (1 for group1, 2 for group2) and `'label'` (the name/label of each group).


    ======= ======= =========
    sample  group   label
    ======= ======= =========
    WT1         1   control
    WT2         1   control
    WT3         1   control
    ...       ...   ...
    MUT31       2   test
    MUT32       2   test
    MUT33       2   test
    ======= ======= =========
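
A design file of this shape can be written by hand or generated by a script. The following is a minimal Python sketch of one way to do it. The sample names and the helper `write_design` are hypothetical, and the `sample` header for the first column is taken from the example table above rather than from a documented requirement.

```python
import csv
import io

def write_design(handle, groups):
    """Write a design table with columns sample, group, label.

    `groups` maps a label (e.g. "control") to its list of sample names;
    the first label becomes group 1 and the second becomes group 2.
    """
    if len(groups) != 2:
        raise ValueError("a two-sample design needs exactly two groups")
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["sample", "group", "label"])
    for group_id, (label, samples) in enumerate(groups.items(), start=1):
        for sample in samples:
            writer.writerow([sample, group_id, label])

# Hypothetical sample names, for illustration only.
example = {
    "control": ["WT1", "WT2", "WT3"],
    "test": ["MUT31", "MUT32", "MUT33"],
}
buffer = io.StringIO()
write_design(buffer, example)
design_text = buffer.getvalue()
print(design_text)
```

To produce a file for upload, pass an open file handle (for example `open("design.csv", "w", newline="")`) instead of the in-memory buffer.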

**Gene Sets**

An `rdata` file containing a variable named `geneSet`, which is a `GeneSetCollection` object built with the `GSEABase` Bioconductor package. You can use the **GeneSet from Msigdb/KEGG** tool to create this file. Make sure the gene sets use the same gene ID type as the gene expression dataset.


**Method**

Statistical method used to test the gene sets. Must be one of: GSNCA (Gene Sets Net Correlations Analysis), Wald-Wolfowitz test, Kolmogorov-Smirnov test, Mean Deviation test, Radial Kolmogorov-Smirnov test or Radial Mean Deviation test.

**Min Size for the Gene Set**

The minimum allowed gene set size. Default value is 10.

**Max Size for the Gene Set**

The maximum allowed gene set size. Default value is 500.
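
The effect of the two size limits can be illustrated with a short Python sketch. It assumes that the size of a gene set is counted after restricting the set to genes present in the expression matrix. That is an assumption made for illustration, since the wrapper script is not shown here. The function `filter_gene_sets` and all gene and pathway names are hypothetical.

```python
def filter_gene_sets(gene_sets, measured_genes, min_size=10, max_size=500):
    """Keep gene sets whose measured-gene count lies in [min_size, max_size]."""
    measured = set(measured_genes)
    kept = {}
    for name, genes in gene_sets.items():
        # Only genes that appear in the expression data count toward the size.
        present = sorted(set(genes) & measured)
        if min_size <= len(present) <= max_size:
            kept[name] = present
    return kept

# Toy data with hypothetical gene and pathway names.
measured_genes = [f"G{i}" for i in range(20)]
gene_sets = {
    "small_set": ["G0", "G1", "G2"],
    "ok_set": [f"G{i}" for i in range(12)],
    "mostly_unmeasured": [f"G{i}" for i in range(15, 40)],
}
kept = filter_gene_sets(gene_sets, measured_genes)
print(sorted(kept))
```

With the default limits only `ok_set` survives: `small_set` has 3 genes, and `mostly_unmeasured` has only 5 genes that are actually measured.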

**Permutations number**

Number of permutations used to estimate the null distribution of the test statistic. Default value is 1000. The minimum value is 100.
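
To show how the number of permutations enters a p-value, here is a hedged Python sketch of a generic two-sample permutation test. It is not GSAR's code: the statistic used (absolute difference of group means on a single variable) is a simple stand-in for the multivariate statistics listed above, and the `(hits + 1) / (n_perm + 1)` estimator is a common convention assumed here. With this estimator the smallest attainable p-value is `1 / (n_perm + 1)`, which is why very small permutation counts give coarse results.

```python
import random

def permutation_p_value(values, groups, statistic, n_perm=1000, seed=0):
    """Estimate a p-value by shuffling group labels n_perm times."""
    rng = random.Random(seed)
    observed = statistic(values, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Count permutations at least as extreme as the observed statistic.
        if statistic(values, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def abs_mean_difference(values, groups):
    """Stand-in statistic: absolute difference between the two group means."""
    g1 = [v for v, g in zip(values, groups) if g == 1]
    g2 = [v for v, g in zip(values, groups) if g == 2]
    return abs(sum(g1) / len(g1) - sum(g2) / len(g2))

# Toy data: group 2 is clearly shifted relative to group 1.
values = [0.1, 0.3, 0.2, 0.4, 2.1, 2.3, 2.2, 2.4]
groups = [1, 1, 1, 1, 2, 2, 2, 2]
p = permutation_p_value(values, groups, abs_mean_difference, n_perm=1000)
print(p)
```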

**Threshold value**

P-value threshold used to select the significant gene sets for which minimum spanning trees are plotted. Default is 0.05.
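
The same selection can be reproduced on the p-value output with a few lines of Python. This sketch assumes the csv has the columns `geneSet` and `p_value` shown in the output example of this help, and that "significant" means a p-value strictly below the threshold. Both the strictness of the comparison and the helper name `significant_gene_sets` are assumptions.

```python
import csv
import io

def significant_gene_sets(handle, threshold=0.05):
    """Return names of gene sets whose p-value is below the threshold."""
    reader = csv.DictReader(handle)
    return [row["geneSet"] for row in reader if float(row["p_value"]) < threshold]

# Illustrative rows only, mirroring the placeholder names in the output example.
example_csv = io.StringIO(
    "geneSet,p_value\n"
    "pathway_1,0.007\n"
    "pathway_2,0.008\n"
    "pathway_n,0.999\n"
)
selected = significant_gene_sets(example_csv)
print(selected)
```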

**Correlation coefficient statistic**

Correlation coefficient used when plotting the minimum spanning trees of a pathway in the two conditions. Possible values are 'pearson' (default), 'spearman' and 'kendall'.
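
As a rough illustration of how a correlation coefficient leads to a minimum spanning tree, the Python sketch below builds a tree over the genes of one condition with Prim's algorithm. It assumes the distance between two genes is `1 - |r|`, where `r` is their Pearson correlation. That distance, the single-tree construction and the toy data are assumptions for illustration, not a description of GSAR's plotting code.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def correlation_mst(expr):
    """Return MST edges (Prim's algorithm) using distance 1 - |r|."""
    genes = list(expr)
    dist = {
        (a, b): 1 - abs(pearson(expr[a], expr[b]))
        for a in genes for b in genes if a != b
    }
    in_tree = [genes[0]]
    edges = []
    while len(in_tree) < len(genes):
        # Cheapest edge from the current tree to a gene not yet in it.
        candidates = [(a, b) for a in in_tree for b in genes if b not in in_tree]
        a, b = min(candidates, key=dist.get)
        edges.append((a, b))
        in_tree.append(b)
    return edges

# Hypothetical expression values for four genes in one condition.
condition_1 = {
    "A": [1, 2, 3, 4, 5],
    "B": [1.1, 2.0, 3.2, 3.9, 5.1],
    "C": [5, 3, 4, 1, 2],
    "D": [2, 1, 2, 1, 2],
}
mst_edges = correlation_mst(condition_1)
print(mst_edges)
```

Running the same function on the second condition and comparing the two edge lists gives a simple view of how the correlation structure of a pathway changes between conditions.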



**1. A csv file containing the P-values of all gene sets**


    ========= ========== 
    geneSet     p_value 
    ========= ========== 
    pathway_1   0.007
    pathway_2   0.008
    pathway_3   0.009
    pathway_4   0.010
    ...         ...
    pathway_n   0.999
    ========= ==========

**2. Plot of minimum spanning trees for significant gene sets in two conditions**

        <citation type="doi">10.1186/s12859-017-1482-6</citation>
</tool>