diff mqppep_preproc.xml @ 1:b76c75521d91 draft

planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/mqppep commit 43e7a43b545c24b2dc33d039198551c032aa79be
author galaxyp
date Fri, 28 Oct 2022 18:26:42 +0000
parents 8dfd5d2b5903
children b889e05ce77d
--- a/mqppep_preproc.xml	Mon Jul 11 19:22:54 2022 +0000
+++ b/mqppep_preproc.xml	Fri Oct 28 18:26:42 2022 +0000
@@ -288,31 +288,36 @@
         </test>
     </tests>
     <help><![CDATA[
-=========================================================
-Phopsphoproteomic Enrichment Pipeline Preprocessing Steps
-=========================================================
+=============================================================
+**Phosphoproteomic Enrichment Pipeline Preprocessing Steps**
+=============================================================
 
-**Overview**
+*Overview*
+==========
 
 Prior to statistical analysis, it is necessary to perform
 three steps to transform the MaxQuant output
 for phosphoproteome-enriched samples.
 
-**Workflow position**
+*Workflow position*
+===================
 
-``upstream tool``
-      The input data file for this tool is the ``Phospho (STY)Sites.txt`` file that is produced:
+Upstream tool
+=============
+
+The input dataset for this tool is the ``Phospho (STY)Sites.txt`` file that is produced:
 
-      - by the Galaxy "MaxQuant" (``maxquant``) tool
-      - or by the Galaxy "Maxquant (using mqpar.xml)" (``maxquant_mqpar``) tool
-      - or by the desktop version of MaxQuant.
+   - by the Galaxy "MaxQuant" (``maxquant``) tool
+   - or by the Galaxy "MaxQuant (using mqpar.xml)" (``maxquant_mqpar``) tool
+   - or by the desktop version of MaxQuant.
 
-``downstream tool``
-  The "MaxQuant Phosphopeptide ANOVA" tool (``mqppep_anova``) consumes the ``merged/filtered`` output file ``preproc_tab`` that this tool produces.
+Downstream tool
+===============
 
-======================================================================
-Phopsphoproteomic Enrichment Pipeline Localization-Probability Cut-Off
-======================================================================
+The "MaxQuant Phosphopeptide ANOVA" tool (``mqppep_anova``) consumes the "preprocessed" output file ``preproc_tab`` that this tool produces.
+
+*Phosphoproteomic Enrichment Pipeline Localization-Probability Cut-Off*
+========================================================================
 
 This step applies a "localization-probability cut-off" to each phosphopeptide.
 Higher values may reduce the number of peptides in the output.
@@ -336,30 +341,48 @@
 so it is omitted here even though it was included in Larry Cheng's original script.
 
 
-**Input dataset**
+Input dataset
+=============
+
+Phospho (STY)Sites.txt
+   This is the ``Phospho (STY)Sites.txt`` file produced by MaxQuant.
+   If you use the desktop version of MaxQuant, you will find this file in the ``txt`` folder.
 
-``phosphoSites``
-    This is the ``MaxQuant Phospho (STY)Sites.txt`` file produced by MaxQuant.
-    If you use the desktop version of MaxQuant, you will find this file in the ``txt`` folder.
+Input parameters
+================
+
+Localization probability cutoff
+  Minimum localization probability; see above.
 
-**Output datasets**
+Intensity merge-function
+  Specifies how intensities for identical phosphosites should be merged; see above.
+
+Output datasets
+===============
 
 ``ppep_intensities``
   Data table (in tabular format) presenting, for each sample, the mass-spectral intensity of each phosphopeptide having localization probability greater than the cutoff.
+
 ``enrichment.pdf``
   Graph (in PDF format) presenting non-zero proportions of pS, pT, and pY among the phosphosites; note that a phosphopeptide may have multiple phosphosites.
+
 ``locProbCutoff.pdf``
   Graph (in PDF format) contrasting proportion of phosphopeptides above the localization probability cutoff with the proportion below.
+
 ``enrichment.svg``
   Enrichment graph (in downloadable "scalable vector graphics" format) for incorporation into documents.
+
 ``locProbCutoff.svg``
   Localization probability cutoff graph (in downloadable "scalable vector graphics" format) for incorporation into documents.
+
 ``filteredData``
   Data table (in tabular format) comprising rows of the ``phosphoSites`` input file that are not flagged as contaminants or reversed sequences.
+
 ``quantData``
   Data table (in tabular format) comprising rows of the ``filteredData`` file whose localization probability exceeds the **Localization Probability Cutoff** parameter.
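
The relationship between ``filteredData`` and ``quantData`` described above can be sketched roughly as follows. This is an illustrative Python sketch, not the tool's own script: the column names ``Localization prob``, ``Reverse``, and ``Potential contaminant``, the ``+`` flag convention, the function name ``split_by_cutoff``, and the sample rows are all assumptions made for the example.

```python
import csv
import io


def split_by_cutoff(rows, cutoff):
    """Return (filtered, quant) lists of row dicts.

    filtered: rows not flagged as reversed sequences or contaminants.
    quant:    filtered rows whose localization probability exceeds cutoff.
    Column names and the "+" flag are assumptions for this sketch.
    """
    filtered = [
        r for r in rows
        if r.get("Reverse", "") != "+"
        and r.get("Potential contaminant", "") != "+"
    ]
    quant = [r for r in filtered if float(r["Localization prob"]) > cutoff]
    return filtered, quant


# Made-up rows standing in for a Phospho (STY)Sites.txt table.
sample = (
    "Localization prob\tReverse\tPotential contaminant\n"
    "0.99\t\t\n"
    "0.60\t\t\n"
    "0.95\t+\t\n"
    "0.97\t\t+\n"
)
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
filtered, quant = split_by_cutoff(rows, cutoff=0.75)
print(len(filtered), len(quant))
```

Raising ``cutoff`` can only shrink ``quant``, which is consistent with the note that higher values may reduce the number of peptides in the output.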
 
-**Authors**
+Authors
+=======
 
 ``Nicholas A. Graham``
   (`ORCiD 0000-0002-6811-1941 <https://orcid.org/0000-0002-6811-1941>`_) initiated the original script.
@@ -374,74 +397,72 @@
   (University of Minnesota Supercomputing Institute) adapted the script to run in Galaxy.
 
 
-=============================================================
-Phopsphoproteomic Enrichment Pipeline Upstream Kinase Mapping
-=============================================================
+*Phosphoproteomic Enrichment Pipeline Upstream Kinase Mapping*
+===============================================================
 
 This step searches phosphopeptides against several databases for known or predicted sites.
 
-**Input databases**
+Input databases
+===============
 
 ``networkin``
     This table is the result of filtering the NetworKIN database [Linding 2007; Horn 2014] for cutoff score > 2.0.  The ENSEMBL data used to generate the file were from Ensembl, `ensembl.org <https://web.archive.org/web/20220308011159/http://useast.ensembl.org/index.html>`_ [Howe 2021].
 
-       *To generate this file:*
-
-       **(1)** Download the "precomputed data for all available kinase predictors against ENSEMBL"
-       (Available at the NetworkKIN predictions link on the downloads page at https://web.archive.org/web/20200208000403/http://networkin.info/download/networkin_human_predictions_3.1.tsv.xz;  N.B.: "Commercial users are requested to contact the authors before using the data on the networkin.info website");
+To generate this file:
 
-       **(2)** Decompress the .tsv.xz with file with "unxz" (from XZ Utils `https://tukaani.org/xz/ <https://tukaani.org/xz/>`_);
+   (1) Download the "precomputed data for all available kinase predictors against ENSEMBL" (available at the NetworKIN predictions link on the downloads page at https://web.archive.org/web/20200208000403/http://networkin.info/download/networkin_human_predictions_3.1.tsv.xz;  N.B.: "Commercial users are requested to contact the authors before using the data on the networkin.info website");
+   (2) Decompress the .tsv.xz file with "unxz" (from XZ Utils `https://tukaani.org/xz/ <https://tukaani.org/xz/>`_);
+   (3) Filter out the rows having "networkin_score" less than 2.0.
 
-       **(3)** Filter out the rows having "network_kin" less than 2.0.
-
-       The result should be a tab-separated file with the following columns:
+The result should be a tab-separated file with the following columns:
 
-           1. ``#substrate``
-           2. ``position``
-           3. ``id``
-           4. ``networkin_score``
-           5. ``tree``
-           6. ``netphorest_group``
-           7. ``netphorest_score``
-           8. ``string_identifier``
-           9. ``string_score``
-           10. ``substrate_name``
-           11. ``sequence``
-           12. ``string_path``
+   - ``#substrate``
+   - ``position``
+   - ``id``
+   - ``networkin_score``
+   - ``tree``
+   - ``netphorest_group``
+   - ``netphorest_score``
+   - ``string_identifier``
+   - ``string_score``
+   - ``substrate_name``
+   - ``sequence``
+   - ``string_path``
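
Step (3) above, applied to the decompressed file, might look like the following minimal Python sketch. It assumes a tab-separated file with a header row containing ``networkin_score``; the function name ``filter_networkin`` and the sample rows (with an abbreviated set of columns and made-up identifiers) are hypothetical and only serve to illustrate the filtering.

```python
import csv
import io


def filter_networkin(src, dst, min_score=2.0):
    """Copy rows whose networkin_score is at least min_score.

    src and dst are text file objects; the header row is preserved.
    Returns the number of data rows kept.
    """
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(
        dst, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    kept = 0
    for row in reader:
        # Drop rows scoring less than the threshold.
        if float(row["networkin_score"]) >= min_score:
            writer.writerow(row)
            kept += 1
    return kept


# Made-up rows using only the first four columns, for illustration.
src = io.StringIO(
    "#substrate\tposition\tid\tnetworkin_score\n"
    "SUBSTRATE_A\t10\tKINASE_1\t2.5\n"
    "SUBSTRATE_B\t22\tKINASE_2\t1.2\n"
    "SUBSTRATE_C\t31\tKINASE_3\t2.0\n"
)
dst = io.StringIO()
kept = filter_networkin(src, dst)
print(kept)
print(dst.getvalue())
```

With real data, ``src`` and ``dst`` would be files opened in text mode; the full twelve-column header is carried through unchanged because the writer reuses the reader's field names.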
 
 
 ``p_sty_motifs``
-  This database merges motif patterns from [Amanchy 2007] and Phosida [Gnad 2011].
+   This database merges motif patterns from [Amanchy 2007] and Phosida [Gnad 2011].
 
-    The Amanchy data are adapted from `http://hprd.org/serine_motifs <http://hprd.org/serine_motifs>`_ and `http://hprd.org/tyrosine_motifs <http://hprd.org/tyrosine_motifs>`_ (both links cite the reference where each motif was published), and the patterns are translated into Perl regular expression format (`https://perldoc.perl.org/perlre <https://perldoc.perl.org/perlre>`_).
+     The Amanchy data are adapted from `https://web.archive.org/web/*/http://hprd.org/serine_motifs <https://web.archive.org/web/*/http://hprd.org/serine_motifs>`_ and `https://web.archive.org/web/*/http://hprd.org/tyrosine_motifs <https://web.archive.org/web/*/http://hprd.org/tyrosine_motifs>`_ (both links cite the reference where each motif was published), and the patterns are translated into Perl regular expression format (`https://perldoc.perl.org/perlre <https://perldoc.perl.org/perlre>`_).
 
-    The Phosida data are adapted (translated to Perl-formatted regular expressions) from `http://pegasus.biochem.mpg.de/phosida/help/motifs.aspx <http://pegasus.biochem.mpg.de/phosida/help/motifs.aspx>`_ (this link cites the reference where each motif was published).
+   The Phosida data are adapted (translated to Perl-formatted regular expressions) from `http://pegasus.biochem.mpg.de/phosida/help/motifs.aspx <http://pegasus.biochem.mpg.de/phosida/help/motifs.aspx>`_ (this link cites the reference where each motif was published).
 
-      This file has three tab-separated columns (and no header):
+   This file has three tab-separated columns (and no header):
 
-         1. column 1 is an (ignored) identifier
-         2. column 2 is a Perl regular expression
-         3. column 3 is a descriptor.
+      - column 1 is an (ignored) identifier
+      - column 2 is a Perl regular expression
+      - column 3 is a descriptor.
 
-      For two examples:
+   Two examples:
 
-      ``2<TAB>R.R..(pS|pT)<TAB>Akt kinase substrate motif (HPRD)``
+       ``2<TAB>R.R..(pS|pT)<TAB>Akt kinase substrate motif (HPRD)``
 
-      ``10<TAB>R..(pS|pT)V<TAB>CAMK2_Phosida``
+       ``10<TAB>R..(pS|pT)V<TAB>CAMK2_Phosida``
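
To illustrate how a three-column motif table like this could be used, here is a hedged Python sketch that parses the two example rows and searches a peptide string for each pattern. It assumes that phosphorylated residues are written as ``pS``/``pT`` in the peptide and that the patterns are simple enough to behave the same in Python's ``re`` module as in Perl; the helper names and the peptide strings are invented for the example and are not taken from the tool.

```python
import re

# The two example rows quoted above, with real tab characters.
MOTIF_LINES = [
    "2\tR.R..(pS|pT)\tAkt kinase substrate motif (HPRD)",
    "10\tR..(pS|pT)V\tCAMK2_Phosida",
]


def parse_motifs(lines):
    """Return (compiled_pattern, descriptor) pairs; column 1 is ignored."""
    motifs = []
    for line in lines:
        _ident, pattern, descriptor = line.rstrip("\n").split("\t")
        motifs.append((re.compile(pattern), descriptor))
    return motifs


def matching_descriptors(peptide, motifs):
    """List descriptors of all motifs found anywhere in the peptide."""
    return [desc for rx, desc in motifs if rx.search(peptide)]


motifs = parse_motifs(MOTIF_LINES)
# Hypothetical peptides, written with pS marking a phosphoserine.
print(matching_descriptors("GRPRTApSFAE", motifs))
print(matching_descriptors("AARQLpSVKE", motifs))
```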
 
 ``psp_kinase_substrate``
-  'Kinase-substrate dataset: experimentally determined substrates, sequences, cognate kinases, and metadata curated from the literature' [Hornbeck 2011].  This tabular-formatted file may be downloaded for non-commercial purposes as 'Kinase_Substrate_Dataset.gz' from `https://www.phosphosite.org/staticDownloads.action <https://www.phosphosite.org/staticDownloads.action>`_.
+   'Kinase-substrate dataset: experimentally determined substrates, sequences, cognate kinases, and metadata curated from the literature' [Hornbeck 2011].  This tabular-formatted file may be downloaded for non-commercial purposes as 'Kinase_Substrate_Dataset.gz' from `https://www.phosphosite.org/staticDownloads.action <https://www.phosphosite.org/staticDownloads.action>`_.
 
-      Data extracted from PhosphoSitePlus(R), created by Cell Signaling Technology Inc. PhosphoSitePlus is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License (`https://creativecommons.org/licenses/by-nc-sa/3.0/ <https://creativecommons.org/licenses/by-nc-sa/3.0/>`_). Attribution must be given in written, oral and digital presentations to PhosphoSitePlus, www.phosphosite.org. Written documents should additionally cite:
+       Data extracted from PhosphoSitePlus(R), created by Cell Signaling Technology Inc. PhosphoSitePlus is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License (`https://creativecommons.org/licenses/by-nc-sa/3.0/ <https://creativecommons.org/licenses/by-nc-sa/3.0/>`_). Attribution must be given in written, oral and digital presentations to PhosphoSitePlus, www.phosphosite.org. Written documents should additionally cite:
 
-          Hornbeck PV, Kornhauser JM, Tkachev S, Zhang B, Skrzypek E, Murray B, Latham V, Sullivan M (2012) PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res. 40, D261-D270.; www.phosphosite.org.
+       Hornbeck PV, Kornhauser JM, Tkachev S, Zhang B, Skrzypek E, Murray B, Latham V, Sullivan M (2012) PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res. 40, D261-D270.; www.phosphosite.org.
 
 ``psp_regulatory_sites``
-  'Regulatory sites: information curated from the literature about modification sites shown to regulate molecular functions, biological processes, and molecular interactions including protein-protein interactions' [Hornbeck 2011].  This tabular-formatted file may be downloaded for non-commercial purposes as 'Regulatory_sites.gz' from `https://www.phosphosite.org/staticDownloads.action <https://www.phosphosite.org/staticDownloads.action>`_.
+   'Regulatory sites: information curated from the literature about modification sites shown to regulate molecular functions, biological processes, and molecular interactions including protein-protein interactions' [Hornbeck 2011].  This tabular-formatted file may be downloaded for non-commercial purposes as 'Regulatory_sites.gz' from `https://www.phosphosite.org/staticDownloads.action <https://www.phosphosite.org/staticDownloads.action>`_.
 
       Terms of use and citation are as for the ``psp_kinase_substrate`` file.
 
-**Output datasets**
+Output datasets
+===============
 
 ``ppep_map``
   Data table (in tabular format, consumed by the merge/filter step) presenting, for each phosphopeptide, the kinase mappings,  the mass-spectral intensities for each sample, and the metadata from UniProtKB/SwissProt, phospho-sites, phospho-motifs, and regulatory sites.  Data in the columns marked "``Domain``", "``ON_...``", or "``..._PhosphoSite``" are available subject to the following terms:
@@ -455,7 +476,8 @@
 ``ppep_mapping_sqlite``
   SQLite database (consumed by the merge/filter step).
 
-**Authors**
+Authors
+=======
 
 ``Nicholas A. Graham``
   (`ORCiD 0000-0002-6811-1941 <https://orcid.org/0000-0002-6811-1941>`_) wrote the original script.
@@ -464,18 +486,19 @@
   (`ORCiD 0000-0002-2882-0508 <https://orcid.org/0000-0002-2882-0508>`_) adapted the script to run in Galaxy.
 
 
-======================================================
-Phopsphoproteomic Enrichment Pipeline Merge and Filter
-======================================================
+*Phosphoproteomic Enrichment Pipeline Merge and Filter*
+========================================================
 
 This step merges mapped metadata into metadata for phosphopeptides, filtering by species.
 
-**Input parameters**
+Input parameters
+================
 
 ``species``
   Limit PhosphoSitePlus to the indicated species. Default: **human**
 
-**Output datasets**
+Output datasets
+===============
 
 ``preproc_tab``
   Phosphopeptides annotated with SwissProt and phosphosite metadata, in tabular format.  This file is designed to be consumed by the downstream ANOVA tool.  Some data in the columns marked "PSP" are available subject to the following terms:
@@ -488,7 +511,8 @@
 ``preproc_sqlite``
   ``ppep_mapping_sqlite`` updated with annotations, in SQLite format.
 
-**Authors**
+Authors
+=======
 
 ``Nicholas A. Graham``
   (`ORCiD 0000-0002-6811-1941 <https://orcid.org/0000-0002-6811-1941>`_) initiated the original script.