view purityA.xml @ 5:0d73912c7cdc draft

"planemo upload for repository https://github.com/computational-metabolomics/mspurity-galaxy commit b1b879e29d5d6c97fdc3636aa6e900ad03695f9e"
author computational-metabolomics
date Fri, 13 Nov 2020 10:00:12 +0000

<tool id="mspurity_puritya" name="msPurity.purityA" version="@TOOL_VERSION@+galaxy@GALAXY_TOOL_VERSION@">
    <description>
        Assess acquired precursor ion purity of MS/MS spectra
    </description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements" />
    <command detect_errors="exit_code"><![CDATA[
        Rscript '$__tool_directory__/purityA.R'
            --out_dir='.'
            
            --mzML_files='
                #for $i in $source
                    $i,
                #end for
                '
            --galaxy_names='
                #for $i in $source
                    $i.name,
                #end for
                '
            
            #if $offsets.offsets == 'user'
                --minOffset=$offsets.minoffset
                --maxOffset=$offsets.maxoffset
            #end if

            --iwNorm=$iw_norm
            --ilim=$ilim

            --cores=\${GALAXY_SLOTS:-4}

            #if $nearest
                --nearest
            #end if

            #if $mostIntense
                --mostIntense
            #end if

            #if $isotopes.isotopes == "exclude_default"
                --exclude_isotopes
            #elif $isotopes.isotopes == "user"
                --exclude_isotopes
                --isotope_matrix='$isotopes.im'
            #end if

            --ppmInterp $ppmInterp

    ]]></command>
    <inputs>
        <param name="source" type="data" multiple="true" format="mzml" label="*.mzML file" >
                    <validator type="empty_field" />
        </param>
        <param argument="--mostIntense" type="boolean" checked="true"
               label="Use most intense peak within isolation window for precursor?"
               help="If yes, this will ignore the recorded precursor within the mzML file and
               use the most intense peak within the isolation window to calculate the precursor ion purity"/>

        <param argument="--nearest" type="boolean" checked="true" 
               label="Use nearest full scan to determine precursor?"
               help="If TRUE, this will use the nearest full scan to the fragmentation scan to determine
                     the m/z value of the precursor"/>

        <param argument="--ppmInterp" type="float" label="Interpolation PPM" min="0" value="7"
               help="Set the ppm tolerance for the precursor ion purity interpolation,
               i.e. the ppm tolerance between the precursor ions found in the neighbouring scans. The closest match
               within the window will be used for the interpolation"/>

        <conditional name="offsets">
            <param name="offsets" type="select" label="offsets" >
                <option value="auto" selected="true">Uses offsets determined in the mzML file</option>
                <option value="user">User supplied offset values</option>
            </param>
            <when value="user">
                <expand macro="offsets" />
            </when>
            <when value="auto"/>

        </conditional>

        <expand macro="general_params" />

    </inputs>
    <outputs>
        <data name="purityA_output_tsv" format="tsv" label="${tool.name} on ${on_string}: tsv"
              from_work_dir="purityA_output.tsv" />
        <data name="purityA_output_rdata" format="rdata" label="${tool.name} on ${on_string}: RData"
              from_work_dir="purityA_output.RData" />
    </outputs>
    <tests>
       <test>
            <param name="source" value="LCMSMS_2.mzML,LCMSMS_1.mzML,LCMS_2.mzML,LCMS_1.mzML" ftype="mzml" />
            <output name="purityA_output_tsv" value="purityA_output.tsv" />
            <output name="purityA_output_rdata" value="purityA_output.RData" ftype="rdata" compare="sim_size"/>
        </test>
    </tests>

    <help><![CDATA[
=============================================================
Assess precursor ion purity of MS/MS files
=============================================================
-----------
Description
-----------

**General**

Tool based on the msPurity::purityA R class used to calculate the precursor ion purity of each MS/MS scan for mzML files.

Data input of mzML files either from:

* A data collection of the mzML files containing MS/MS scans
* A path to a folder that has mzML files containing MS/MS scans

The precursor ion purity measures the contribution of a selected precursor
peak to an isolation window used for fragmentation and can be used as a way of assessing the
spectral quality and level of "contamination" of fragmentation spectra.

The calculation involves dividing the intensity of the selected precursor peak by the total
intensity of the isolation window. It is performed on the MS1 scans before and after the MS/MS scan of
interest and then interpolated at the recorded time of the MS/MS acquisition.
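
The calculation described above can be sketched in a few lines of Python. This is only an illustration and not the msPurity implementation: the function names, the parameter names and the example peaks are made up, the precursor is simply taken as the peak closest to the target m/z, and linear interpolation is assumed.

```python
def window_purity(peaks, precursor_mz, min_offset, max_offset):
    """Purity of the precursor within one MS1 scan.

    peaks is a list of (mz, intensity) tuples. The result is the precursor
    intensity divided by the total intensity inside the isolation window.
    """
    low, high = precursor_mz - min_offset, precursor_mz + max_offset
    in_window = [(mz, inten) for mz, inten in peaks if low <= mz <= high]
    total = sum(inten for _, inten in in_window)
    if total == 0:
        return 0.0
    # Assumption: the peak closest in m/z to the target is the precursor.
    _, precursor_intensity = min(in_window, key=lambda p: abs(p[0] - precursor_mz))
    return precursor_intensity / total


def interpolate_purity(rt_before, purity_before, rt_after, purity_after, rt_msms):
    """Linearly interpolate the purity at the MS/MS acquisition time."""
    if rt_after == rt_before:
        return purity_before
    frac = (rt_msms - rt_before) / (rt_after - rt_before)
    return purity_before + frac * (purity_after - purity_before)


# Made-up MS1 scans acquired before and after a hypothetical MS/MS scan.
scan_before = [(100.00, 800.0), (100.30, 200.0), (101.50, 500.0)]
scan_after = [(100.00, 600.0), (100.30, 400.0), (101.50, 500.0)]

p_before = window_purity(scan_before, 100.0, 0.5, 0.5)  # 800 / 1000
p_after = window_purity(scan_after, 100.0, 0.5, 0.5)    # 600 / 1000
in_purity = interpolate_purity(10.0, p_before, 12.0, p_after, 11.0)
print(round(p_before, 3), round(p_after, 3), round(in_purity, 3))
```

The peak at m/z 101.50 lies outside the +/- 0.5 window, so it does not contribute to either purity score.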

Additionally, isotopic peaks are annotated and omitted from the calculation,
low abundance peaks thought to contribute little to the resulting MS/MS spectra
are removed, and the isolation efficiency of the mass spectrometer can be
used to normalise the intensities used for the calculation.

The output is an RData file with the purityA S4 class object (referred to as pa for convenience throughout
the manual). The object contains a slot (pa@puritydf) where the details of the purity
assessments for each MS/MS scan are stored. The purityA object can then be used for further processing
including linking the fragmentation spectra to XCMS features, averaging fragmentation,
database creation and spectral matching (from the created database).

There is also the additional output of a tsv file of the pa@puritydf data frame.

**Example LC-MS/MS processing workflow**

The purityA object can be used for further processing including linking the fragmentation spectra to XCMS features,
averaging fragmentation, database creation and spectral matching (from the created database). See below for an example workflow:

* Purity assessments
    +  (mzML files) -> **purityA** -> (pa)
* XCMS processing
    +  (mzML files) -> xcms.xcmsSet -> xcms.merge -> xcms.group -> xcms.retcor -> xcms.group -> (xset)
* Fragmentation processing
    + (xset, pa) -> frag4feature -> filterFragSpectra -> averageAllFragSpectra -> createDatabase -> spectralMatching -> (sqlite spectral database)

**Isolation efficiency**

When the isolation efficiency of an MS instrument is known, the peak intensities within an isolation window can be normalised for the precursor purity calculation. The isolation efficiency can be estimated by measuring a single precursor across a sliding window. See figure 3 from the original msPurity paper (Lawson et al. 2017). This has been experimentally measured for a Thermo Fisher Q-Exactive mass spectrometer using 0.5 Da windows and can be set within msPurity by using msPurity::iwNormQE.5() as the input to the iwNormFunc argument.

Other options to model the isolation efficiency are a Gaussian isolation window, msPurity::iwNormGauss(minOff=-0.5, maxOff=0.5), or an R-Cosine window, msPurity::iwNormRCosine(minOff=-0.5, maxOff=0.5), where minOff and maxOff can be altered depending on the isolation window size.

A user can also define their own normalisation function. The only requirement of the function is that given a value between the minOff and maxOff a normalisation value between 0-1 is returned.
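
The contract for a normalisation function (an offset within the window goes in, a weight between 0 and 1 comes out) can be illustrated with a short Python sketch. This is not the msPurity code, where such functions are written in R. The Gaussian shape, the sd_fraction parameter and the example peaks are assumptions made purely for illustration.

```python
import math


def make_gauss_norm(min_off=-0.5, max_off=0.5, sd_fraction=0.25):
    """Build a function mapping an m/z offset to a weight between 0 and 1.

    sd_fraction is a hypothetical parameter: the standard deviation of the
    Gaussian expressed as a fraction of the isolation window width.
    """
    centre = (min_off + max_off) / 2.0
    sd = (max_off - min_off) * sd_fraction

    def norm(offset):
        # Peaks outside the isolation window receive no weight.
        if offset < min_off or offset > max_off:
            return 0.0
        return math.exp(-0.5 * ((offset - centre) / sd) ** 2)

    return norm


norm = make_gauss_norm()

# Made-up peaks as (offset from the window centre, intensity) pairs.
peaks = [(-0.4, 200.0), (0.0, 800.0), (0.3, 100.0)]
raw_purity = 800.0 / sum(inten for _, inten in peaks)

# Weight every intensity by the modelled isolation efficiency first.
weighted = [inten * norm(off) for off, inten in peaks]
normalised_purity = weighted[1] / sum(weighted)
print(raw_purity < normalised_purity)
```

Because contaminating peaks near the window edges are down-weighted, the normalised purity of a centred precursor is higher than the raw ratio in this example.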

**Notes regarding instrument specific isolation window offsets used:**

* The isolation width offsets will be automatically determined by extracting metadata from the mzML file. For some vendors, however, this is not recorded; in these cases the offsets should be given by the user as an argument (offsets).
* In the case of Agilent only the "narrow" isolation is supported. This roughly equates to +/- 0.65 Da (depending on the instrument). If the file is detected as originating from an Agilent instrument the isolation widths will automatically be set as +/- 0.65 Da.


See Bioconductor documentation for more details about the function used, msPurity::purityA().

-----------
Outputs
-----------

* purityA_output_tsv: A tsv file of the precursor ion purity scores (and other metrics) for each fragmentation spectrum
* purityA_output_rdata: The purityA object saved as an RData file

The purityA_output_tsv file consists of the following columns:

* pid: unique id for MS/MS scan
* fileid: unique id for mzML file
* seqNum: scan number
* precursorIntensity: precursor intensity value as defined in the mzML file
* precursorMZ: precursor m/z value as defined in the mzML file
* precursorRT: precursor RT value as defined in the mzML file
* precursorScanNum: precursor scan number value as defined in mzML file
* id: unique id (redundant)
* filename: mzML filename
* precursorNearest: MS1 scan nearest to the MS/MS scan
* aMz: The m/z value in the "precursorNearest" MS1 scan which most closely matches the precursorMZ value provided from the mzML file
* aPurity: The purity score for aMz
* apkNm: The number of peaks in the isolation window for aMz
* iMz: The m/z value in the precursorNearest MS1 scan that is the most intense within the isolation window.
* iPurity: The purity score for iMz
* ipkNm: The number of peaks in the isolation window for iMz
* inPurity: The interpolated purity score (the purity score is calculated at neighbouring MS1 scans and interpolated at the point of the MS/MS acquisition)
* inpkNm: The interpolated number of peaks in the isolation window
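
As an illustration of how the columns above might be used downstream, the Python sketch below keeps only the scans whose interpolated purity (inPurity) reaches a threshold. The sample rows, the 0.5 threshold and the function name are made up, and treating non-numeric entries such as NA as missing is an assumption about how the table is written.

```python
import csv
import io

# Made-up rows using a subset of the columns listed above. A real run would
# open the tsv output of the tool instead of this in-memory sample.
sample_tsv = (
    "pid\tfileid\tseqNum\tprecursorMZ\taPurity\tiPurity\tinPurity\n"
    "1\t1\t10\t150.058\t0.95\t0.95\t0.93\n"
    "2\t1\t14\t200.112\t0.40\t0.61\t0.45\n"
    "3\t2\t9\t310.201\tNA\tNA\tNA\n"
)


def filter_by_purity(handle, threshold):
    """Return the rows whose interpolated purity meets the threshold."""
    kept = []
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            purity = float(row["inPurity"])
        except ValueError:
            # Assumption: non-numeric entries (e.g. NA) mean no purity score.
            continue
        if purity >= threshold:
            kept.append(row)
    return kept


pure_scans = filter_by_purity(io.StringIO(sample_tsv), threshold=0.5)
print([row["pid"] for row in pure_scans])
```

The same pattern applies to aPurity or iPurity if the non-interpolated scores are preferred.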


    ]]></help>
<expand macro="citations" />
</tool>