view pathview.xml @ 0:bd99e8fbde97 draft

"planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/pathview commit 438c20a62c01fa4adea6fbb4cb40ae05b4166043"
author iuc
date Mon, 26 Aug 2019 14:42:52 -0400
parents
children 03c13085aecc
line wrap: on
line source

<tool id="pathview" name="Pathview" version="@VERSION@+@GALAXY_VERSION@">
    <description>for pathway based data integration and visualization</description>
    <macros>
        <token name="@VERSION@">1.24.0</token>
        <token name="@GALAXY_VERSION@">galaxy0</token>
    </macros>
    <requirements>
        <requirement type="package" version="@VERSION@">bioconductor-pathview</requirement>
        <requirement type="package" version="3.8.2">bioconductor-org.ag.eg.db</requirement>
        <requirement type="package" version="3.8.2">bioconductor-org.at.tair.db</requirement>
        <requirement type="package" version="3.8.2">bioconductor-org.bt.eg.db</requirement>
        <requirement type="package" version="3.8.2">bioconductor-org.ce.eg.db</requirement>
        <requirement type="package" version="3.8.2">bioconductor-org.cf.eg.db</requirement>
        <requirement type="package" version="3.8.2">bioconductor-org.dm.eg.db</requirement>
        <requirement type="package" version="3.8.2">bioconductor-org.dr.eg.db</requirement>
        <requirement type="package" version="3.8.2">bioconductor-org.eck12.eg.db</requirement>
        <requirement type="package" version="3.8.2">bioconductor-org.ecsakai.eg.db</requirement>
        <requirement type="package" version="3.8.2">bioconductor-org.gg.eg.db</requirement>
        <requirement type="package" version="3.8.2">bioconductor-org.hs.eg.db</requirement>
        <requirement type="package" version="3.8.2">bioconductor-org.mm.eg.db</requirement>
        <requirement type="package" version="3.8.2">bioconductor-org.mmu.eg.db</requirement>
        <requirement type="package" version="3.8.2">bioconductor-org.pf.plasmo.db</requirement>
        <requirement type="package" version="3.8.2">bioconductor-org.pt.eg.db</requirement>
        <requirement type="package" version="3.8.2">bioconductor-org.rn.eg.db</requirement>
        <requirement type="package" version="3.8.2">bioconductor-org.sc.sgd.db</requirement>
        <requirement type="package" version="3.8.2">bioconductor-org.ss.eg.db</requirement>
        <requirement type="package" version="3.8.2">bioconductor-org.xl.eg.db</requirement>
        <requirement type="package" version="1.6.2">r-optparse</requirement>
    </requirements>
    <version_command><![CDATA[
echo $(R --version | grep version | grep -v GNU)", pathview version" $(R --vanilla --slave -e "library(pathview); cat(sessionInfo()\$otherPkgs\$pathview\$Version)" 2> /dev/null | grep -v -i "WARNING: ")", optparse version" $(R --vanilla --slave -e "library(optparse); cat(sessionInfo()\$otherPkgs\$optparse\$Version)" 2> /dev/null | grep -v -i "WARNING: ")", dplyr version" $(R --vanilla --slave -e "library(dplyr); cat(sessionInfo()\$otherPkgs\$dplyr\$Version)" 2> /dev/null | grep -v -i "WARNING: ")", ggplot2 version" $(R --vanilla --slave -e "library(ggplot2); cat(sessionInfo()\$otherPkgs\$ggplot2\$Version)" 2> /dev/null | grep -v -i "WARNING: ")
    ]]></version_command>
    <command detect_errors="exit_code"><![CDATA[
Rscript '$__tool_directory__/pathview.r'

#if str($pathway.nb) == 'one'
    --pathway_id '$pathway.one_id'
#else
    --pathway_id_fp '$pathway.tabular'
    --pathway_id_header $pathway.header
#end if

    --species '$species'

#if str($gene_data.gd) == 'true'
    --gene_data '$gene_data.tabular'
    --gd_header $gene_data.header
    --gene_idtype '$gene_data.idtype'
#end if

#if str($cpd_data.cpd) == 'true'
    --cpd_data '$cpd_data.tabular'
    --cpd_header $cpd_data.header
    --cpd_idtype '$cpd_data.idtype'
#end if

    --multi_state $multi_state
    --match_data $match_data

    --kegg_native $out.kegg_native.out
    --same_layer $out.kegg_native.same_layer
#if str($out.kegg_native.out) == 'TRUE'
    --map_null $out.kegg_native.map_null
#else
    --split_group $out.kegg_native.split_group
    --expand_node $out.kegg_native.expand_node
    --sign_pos $out.kegg_native.sign_pos
#end if

&& 
ls .
    ]]></command>
    <inputs>
        <conditional name="pathway">
            <param name="nb" type="select" label="Number of pathways to plot">
                <option value="one" selected="true">One</option>
                <option value="multiple">Multiple</option>
            </param>
            <when value="one">
                <param name="one_id" type="text" label="KEGG pathway" help="KEGG pathway ID with 5 digits and without the 3 letter KEGG species code.">
                    <validator type="regex" message="It should be the 5 digits of the KEGG pathway ID.">^(?:\d\d\d\d\d)?$</validator>
                </param>
            </when>
            <when value="multiple">
                <param name="tabular" type="data" format="tabular" optional="true" label="KEGG pathways" help="A table with one-column with KEGG pathways to plot"/>
                <param name="header" type="boolean" checked="true" label="Does the file have header (a first line with column names)?"/>
            </when>
        </conditional>
        <param name="species" type="select" label="Species to use">
            <option value="aga">Anopheles</option>
            <option value="ath">Arabidopsis</option>
            <option value="bta">Bovine</option>
            <option value="cel">Worm</option>
            <option value="cfa">Canine</option>
            <option value="dme">Fly</option>
            <option value="dre">Zebrafish</option>
            <option value="eco">E coli strain K12</option>
            <option value="ecs">E coli strain Sakai</option>
            <option value="gga">Chicken</option>
            <option value="hsa" selected="true">Human</option>
            <option value="mmu">Mouse</option>
            <option value="mcc">Rhesus</option>
            <option value="pfa">Malaria</option>
            <option value="ptr">Chimp</option>
            <option value="rno">Rat</option>
            <option value="sce">Yeast</option>
            <option value="ssc">Pig</option>
            <option value="xla">Xenopus</option>
        </param>
        <conditional name="gene_data">
            <param name="gd" type="select" label="Provide a gene data file?" help="Gene data is a broad concept including genes, transcripts, protein, enzymes and their expression, modifications and any measurable attributes.">
                <option value="true">Yes</option>
                <option value="false">No</option>
            </param>
            <when value="true">
                <param name="tabular" type="data" format="tabular" label="Gene data" help="It should be a table with first column being the gene ids and other being information (p-value, fold change, levels, etc) from one or several samples."/>
                <param name="header" type="boolean" checked="true" label="Does the file have header (a first line with sample names)?"/>
                <param name="idtype" type="select" label="Format for gene data">
                    <option value="entrez" selected="true">Entrez Gene ID</option>
                    <option value="symbol">Gene Symbol</option>
                    <option value="genename">Gene Name</option>
                    <option value="ensembl">Ensembl Gene ID</option>
                    <option value="ensemblprot">Ensembl Protein ID</option>
                    <option value="ensembltrans">Ensembl Trans ID</option>
                    <option value="unigene">UniGene</option>
                    <option value="uniprot">UniProt</option>
                    <option value="refseq">RefSeq</option>
                    <option value="enzyme">Enzyme</option>
                    <option value="tair">TAIR</option>
                    <option value="prosite">Prosite</option>
                    <option value="orf">ORF</option>
                </param>
            </when>
            <when value="false"/>
        </conditional>
        <conditional name="cpd_data">
            <param name="cpd" type="select" label="Provide a compound data file?" help="compound data is a broad concept including metabolites, drugs, their  measurements and attributes.">
                <option value="true">Yes</option>
                <option value="false" selected="true">No</option>
            </param>
            <when value="true">
                <param name="tabular" type="data" format="tabular" label="Compound data" help="It should be a table with first column being the gene ids and other being information (p-value, fold change, levels, etc) from one or several samples."/>
                <param name="header" type="boolean" checked="true" label="Does the file have header (a first line with sample names)?"/>
                <param name="idtype" type="select" label="Format for compound data">
                    <option value="kegg" selected="true">KEGG (include compound, glycan and drug accessions)</option>
                </param>
            </when>
            <when value="false"/>
        </conditional>
        <param name="multi_state" type="boolean" checked="true" label="Should multiple states of gene or compond data be intergrated and plotted in the same graph?" help="Gene or compound nodes will be sliced into multiple pieces corresponding to the number of states in the data" />
        <param name="match_data" type="boolean" checked="true" label="Are the gene and compound data paired?" help="When let sample sizes of gene and compound be m and n, when m>n, extra columns of NA's (mapped to no color) will be added to make the sample size the same" />
        <section name="out" title="Output Options" expanded="true">
            <conditional name="kegg_native">
                <param name="out" type="select" label="Output for pathway">
                    <option value="TRUE" selected="true">KEGG native</option>
                    <option value="FALSE">Graphviz</option>
                </param>
                <when value="TRUE">
                    <param name="same_layer" type="boolean" checked="true" label="Plot on same layer?" help="If yes, node colors will be plotted in the same layer as the pathway graph." />
                    <param name="map_null" type="boolean" checked="true" label="Map the NULL gene or compound data to pathway?" help="When NULL data are mapped, the gene or compound nodes in the pathway will berendered as actually mapped nodes, except with NA-valued color. When NULL data are not mapped, the nodes are rendered as unmapped nodes." />
                </when>
                <when value="FALSE">
                    <param name="same_layer" type="boolean" checked="true" label="Plot on same layer?" help="If yes, edge/node type legend will be plotted in the same page." />
                    <param name="split_group" type="boolean" checked="false" label="Split node groups into individual nodes?" help="Each split member nodes inherits all edges from the node group. This option also effects most metabolic pathways even without group nodes defined orginally. For these pathways, genes involved in the same reaction are grouped automatically when converting reactions to edges unless split.group is true." />
                    <param name="expand_node" type="boolean" checked="false" label="Expand multiple-gene nodes into single-gene nodes?" help="Each expanded single-gene nodes inherits all edges from the original multiple-gene node. This option is not effective for most metabolic pathways where it conflits with converting reactions to edges." />
                    <param name="sign_pos" type="select" label="Position of pathview signature">
                        <option value="bottomright" selected="true">Bottom right</option>
                        <option value="bottomleft">Bottom left</option>
                        <option value="topleft">Top left</option>
                        <option value="topright">Top right</option>
                    </param>
                </when>
            </conditional>
        </section>
    </inputs>
    <outputs>
        <data name="one_kegg_native" format="png" from_work_dir="*.pathview.png" label="${tool.name} on ${on_string}: KEGG Pathway (${species}${pathway.one_id})">
            <filter>pathway['nb'] == 'one'</filter>
            <filter>out['kegg_native']['out'] == 'TRUE'</filter>
        </data>
        <data name="one_graphviz" format="pdf" from_work_dir="*.pathview.pdf" label="${tool.name} on ${on_string}: KEGG Pathway (${species}${pathway.one_id}), Graphviz format">
            <filter>pathway['nb'] == 'one'</filter>
            <filter>out['kegg_native']['out'] == 'FALSE'</filter>
        </data>
        <collection name="multiple_kegg_native" type="list" label="${tool.name} on ${on_string}: KEGG Pathways">
            <filter>pathway['nb'] == 'multiple'</filter>
            <filter>out['kegg_native']['out'] == 'TRUE'</filter>
            <discover_datasets pattern="(?P&lt;designation&gt;.+)\.pathview\.png" directory="." ext="png"/>
        </collection>
        <collection name="multiple_graphviz" type="list" label="${tool.name} on ${on_string}: KEGG Pathways, Graphviz format">
            <filter>pathway['nb'] == 'multiple'</filter>
            <filter>out['kegg_native']['out'] == 'FALSE'</filter>
            <discover_datasets pattern="(?P&lt;designation&gt;.+)\.pathview\.pdf" directory="." ext="pdf"/>
        </collection>
    </outputs>
    <tests>
        <test>
            <conditional name="pathway">
                <param name="nb" value="one" />
                <param name="one_id" value="00010"/>
            </conditional>
            <param name="species" value="dme"/>
            <conditional name="gene_data">
                <param name="gd" value="true"/>
                <param name="tabular" value="gene_data.tabular"/>
                <param name="header" value="false"/>
                <param name="idtype" value="ensembl"/>
            </conditional>
            <conditional name="cpd_data">
                <param name="cpd" value="false"/>
            </conditional>
            <param name="multi_state" value="true" />
            <param name="match_data" value="true"/>
            <section name="out">
                <conditional name="kegg_native">
                    <param name="out" value="TRUE" />
                    <param name="same_layer" value="true"/>
                    <param name="map_null" value="true"/>
                </conditional>
            </section>
            <output name="one_kegg_native" ftype="png" file="dme00010.png" compare="sim_size" delta="30000"/>
        </test>
        <test>
            <conditional name="pathway">
                <param name="nb" value="one" />
                <param name="one_id" value="00010"/>
            </conditional>
            <param name="species" value="dme"/>
            <conditional name="gene_data">
                <param name="gd" value="true"/>
                <param name="tabular" value="gene_data.tabular"/>
                <param name="header" value="false"/>
                <param name="idtype" value="ensembl"/>
            </conditional>
            <conditional name="cpd_data">
                <param name="cpd" value="false"/>
            </conditional>
            <param name="multi_state" value="true" />
            <param name="match_data" value="true"/>
            <section name="out">
                <conditional name="kegg_native">
                    <param name="out" value="FALSE" />
                    <param name="same_layer" value="true"/>
                    <param name="split_group" value="false" />
                    <param name="expand_node" value="false"/>
                    <param name="sign_pos" value="bottomright"/>
                </conditional>
            </section>
            <output name="one_graphviz" ftype="pdf" file="dme00010.pdf" compare="sim_size"/>
        </test>
        <test>
            <conditional name="pathway">
                <param name="nb" value="multiple" />
                <param name="tabular" value="pathways.tabular"/>
                <param name="header" value="false"/>
            </conditional>
            <param name="species" value="dme"/>
            <conditional name="gene_data">
                <param name="gd" value="true"/>
                <param name="tabular" value="gene_data.tabular"/>
                <param name="header" value="false"/>
                <param name="idtype" value="ensembl"/>
            </conditional>
            <conditional name="cpd_data">
                <param name="cpd" value="false"/>
            </conditional>
            <param name="multi_state" value="true" />
            <param name="match_data" value="true"/>
            <section name="out">
                <conditional name="kegg_native">
                    <param name="out" value="TRUE" />
                    <param name="same_layer" value="true"/>
                    <param name="map_null" value="true"/>
                </conditional>
            </section>
            <output_collection name="multiple_kegg_native" count="2">
                <element name="dme00010" ftype="png" file="dme00010.png" compare="sim_size" delta="30000"/>
                <element name="dme00480" ftype="png" file="dme00480.png" compare="sim_size" delta="40000"/>
            </output_collection>
        </test>
        <test>
            <conditional name="pathway">
                <param name="nb" value="multiple" />
                <param name="tabular" value="pathways.tabular"/>
                <param name="header" value="false"/>
            </conditional>
            <param name="species" value="dme"/>
            <conditional name="gene_data">
                <param name="gd" value="true"/>
                <param name="tabular" value="gene_data.tabular"/>
                <param name="header" value="false"/>
                <param name="idtype" value="ensembl"/>
            </conditional>
            <conditional name="cpd_data">
                <param name="cpd" value="false"/>
                <param name="tabular" value="cpd_data.tabular"/>
                <param name="header" value="false"/>
                <param name="idtype" value="kegg"/>
            </conditional>
            <param name="multi_state" value="true" />
            <param name="match_data" value="true"/>
            <section name="out">
                <conditional name="kegg_native">
                    <param name="out" value="FALSE" />
                    <param name="same_layer" value="false"/>
                    <param name="split_group" value="false" />
                    <param name="expand_node" value="false"/>
                    <param name="sign_pos" value="bottomright"/>
                </conditional>
            </section>
            <output_collection name="multiple_graphviz" count="2">
                <element name="dme00010" ftype="pdf" file="dme00010_cpd.pdf" compare="sim_size"/>
                <element name="dme00480" ftype="pdf" file="dme00480_cpd.pdf" compare="sim_size"/>
            </output_collection>
        </test>
    </tests>
    <help><![CDATA[

.. class:: infomark

**What it does**

Pathview is a stand-alone software package for pathway based data integration and visualization. 

This package can be divided into four functional modules: the Downloader, Parser, Mapper andViewer.  
Mostly importantly, pathview maps and renders user data on relevant pathway graphs. 

Notice that KEGG requires subscription for FTP access since May 2011. 
However,Pathview downloads individual pathway graphs and data files through API or HTTP access, which is freely available (for academic and non-commerical uses).
Pathview uses KEGGgraph (Zhang and Wiemann, 2009) when parsing KEGG xml data files. 


Options map closely to the excellent pathview manual_.

-----

**Inputs**

Pathview provides strong support for data integration. It works with:

- essentially all types of biological data mappable to pathways (gene expression, protein expression, genetic association, metabolite, genomic data, literature, etc)
- over 10 types of gene or protein IDs, and 20 types of compound or metabolite IDs
- pathways for about 4800 species as well as KEGG orthology
- various data attributes and formats, i.e.  continuous/discrete data, matrices/vectors, single/multiple samples etc.

Pathview can be directly used for metagenomic, microbiome or unknown species data when the data are mapped to KEGG ortholog pathways.

*Pathway ids* 

Either just the name or a table with one column with the different pathway ids to represent.


*Gene data*

The first input of Pathview is a gene data table. Here gene data is a broad concept including genes, transcripts, protein, enzymes and their expression, modifications and any measurable attributes.  

It should be a table with first column being the gene ids and other being information (p-value, fold change, levels, etc) from one or several samples.
The first line can include the sample names.

Here gene id is a generic concepts, including multiple types of gene, transcript and protein uniquely mappable to KEGG gene IDs. KEGG ortholog IDs are also treated as gene IDs as to handle metagenomic data.

Example:

    =========== =================
    FBgn0039155	-4.14844993705661
    FBgn0003360	-2.99977727873544
    FBgn0026562	-2.38016404989418
    FBgn0025111	2.69993883050214
    FBgn0029167	-2.10506155636758
    =========== =================

*Compound data*

We also frequently want to look at metabolic pathways too. Besides gene nodes, these pathways also have compound nodes. 
Therefore, we may integrate or visualize both gene data and compound data with metabolic pathways.

Here compound data is a broad concept including metabolites, drugs, their  measurements and attributes.  

The format is similar to the gene data table, except named with IDs mappable to KEGG compound IDs.
Over 20 types of IDs included in CHEMBL database can be used here.

-----

**Outputs**

Pathview generates both native KEGG view and Graphviz view for pathways. 

KEGG view keeps all the meta-data on pathways, spacial and temporal information, tissue/cell types, inputs, outputs and connections. This is important for human reading and interpretation of pathway biology.

.. image:: $PATH_TO_IMAGES/dme00010_native.png
   :width: 60 %

Graphviz view provides better control of node and edge attributes, better view of pathway topology, better understanding of the pathway analysis statistics.

.. image:: $PATH_TO_IMAGES/dme00010_graphviz.png
   :width: 60 %

.. _manual: https://bioconductor.org/packages/release/bioc/vignettes/pathview/inst/doc/pathview.pdf
.. _KEGG: http://www.genome.jp/kegg

    ]]></help>
    <citations>
        <citation type="doi">10.1093/bioinformatics/btt285</citation>
    </citations>
</tool>