Mercurial > repos > pieterlukasse > prims_proteomics
changeset 0:d50f079096ee
Push to main toolshed
author | pieter.lukasse@wur.nl |
---|---|
date | Wed, 08 Jan 2014 11:39:16 +0100 (2014-01-08) |
parents | |
children | bcb001f200e8 |
files | Csv2Apml.jar IsoFix.jar LICENSE MsFilt.jar NOTICE NapQ.jar PRIMS.jar ProgenesisConv.jar Quantifere.jar Quantiline.jar README.rst SedMat_cli.jar csv2apml.xml datatypes_conf.xml isofix.xml msfilt.xml napq.xml prims_proteomics_datatypes.py progenesisconverter.xml quantifere.xml quantiline.xml repository_dependencies.xml sedmat.xml static/images/msfilt_csv_out.png static/images/napq_overview.png static/images/quantifere_cyto_out.png |
diffstat | 26 files changed, 1333 insertions(+), 0 deletions(-) |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/LICENSE Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). 
+ + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. 
Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative 
Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. 
Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. 
+ + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/NOTICE Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,13 @@ +PRIMS proteomics toolset & Galaxy wrappers +========================================== + +Tools and wrappers for the PRIMS proteomics toolset. +Suite of custom tools to enable data processing and +protein inference for labeled and label-free Mass Spectrometry proteomics data. +Can be used in combination with PRIMS MASSCOMB (prims_masscomb package). +Copyright 2010-2013 by Pieter Lukasse, Plant Research International (PRI), +Wageningen, The Netherlands. All rights reserved. See the license text below. + +Galaxy wrappers and installation are available from the Galaxy Tool Shed at: +http://toolshed.g2.bx.psu.edu/view/pieterlukasse/prims_proteomics +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/README.rst Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,67 @@ +PRIMS-proteomics toolset & Galaxy wrappers +========================================== + +Proteomics module of Plant Research International's Mass Spectrometry (PRIMS) toolsuite. +This toolset consists of custom tools to enable data processing and +protein inference for labeled and label-free Mass Spectrometry proteomics data. + +Can be used in combination with PRIMS-MASSCOMB (prims_masscomb package) and +with PRIMV-visualization (primv_visualization package). + +Copyright 2010-2013 by Pieter Lukasse, Plant Research International (PRI), +Wageningen, The Netherlands. All rights reserved. See the license text below. + +Galaxy wrappers and installation are available from the Galaxy Tool Shed at: +http://toolshed.g2.bx.psu.edu/view/pieterlukasse/prims_proteomics + +History +======= + +============== ====================================================================== +Date           Changes +-------------- ---------------------------------------------------------------------- +January 2014   * first release via Tool Shed +November 2013  * multiple tools used internally at PRI +end 2011       * first tool +============== ====================================================================== + +Tool Versioning +=============== + +PRIMS tools will have versions of the form X.Y.Z. Versions +differing only after the second dot (i.e. only in Z) should be completely +compatible with each other. Breaking changes should result in an +increment of the number before and/or after the first dot (X and/or Y). All +tools of version less than 1.0.0 should be considered beta.
+ + +Bug Reports & other questions +============================= + +For the time being, issues can be reported via the contact form at: +http://www.wageningenur.nl/en/Persons/PNJ-Pieter-Lukasse.htm + +Developers, Contributions & Collaborations +========================================== + +If you wish to join forces and collaborate on some of the +tools, do not hesitate to contact Pieter Lukasse via the contact form above. + + +License (Apache, Version 2.0) +============================= + +Copyright 2013 Pieter Lukasse, Plant Research International (PRI). + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this software except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. + \ No newline at end of file
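The README's versioning rule reduces to comparing the X.Y part of two versions. The sketch below illustrates that rule in Python, assuming strictly three numeric components (suffixes such as "-beta" are not handled); the function names are hypothetical and not part of the PRIMS tools.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, int, int]:
    """Split an 'X.Y.Z' version string into three integers."""
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected X.Y.Z, got {version!r}")
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch


def is_compatible(a: str, b: str) -> bool:
    """Versions that differ only in Z (after the second dot) are compatible."""
    return parse_version(a)[:2] == parse_version(b)[:2]


def is_beta(version: str) -> bool:
    """Any version below 1.0.0 is considered beta."""
    return parse_version(version) < (1, 0, 0)
```

Under this reading, Csv2Apml and MsFilt (1.0.2) in this changeset are past beta, while IsoFix and NapQ (0.0.1) are still beta.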
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/csv2apml.xml Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,127 @@ +<tool name="Csv2Apml" id="csv2apml" version="1.0.2"> + <description>Converts MS/MS data in CSV format to APML format</description> + <!-- + For remote debugging, start your listener on port 8000 and use the following as command interpreter: + java -jar -Xdebug -Xrunjdwp:transport=dt_socket,address=D0100564.wurnet.nl:8000 + ////////////////////////// + --> + <command interpreter="java -jar "> + Csv2Apml.jar + -peptideAndProteinMatchListCSV $peptideAndProteinMatchListCSV + -attributesMappingCSV $attributesMappingCSV + -apmlFile $apmlFile + </command> + + <inputs> + + <param name="peptideAndProteinMatchListCSV" type="data" + format="csv" label="MS/MS CSV file" + help="MS/MS CSV file containing peptide identifications and protein matches" /> + + <param name="mz" type="text" optional="false" size="30" + label="Column name for precursor m/z" /> + + <param name="rt" type="text" optional="false" size="30" + label="Column name for precursor rt" /> + + <param name="charge" type="text" optional="false" size="30" + label="Column name for precursor charge (z)" /> + + <param name="pepSequence" type="text" optional="false" size="30" + label="Column name for peptide sequence" /> + + <param name="ppidScore" type="text" optional="false" size="30" + label="Column name for peptide identification score" /> + + <param name="scoringSchemeName" type="text" optional="true" size="30" + label="(Optional) Column name containing scoring scheme name" /> + + <param name="statisticalMeasure" type="text" optional="true" size="30" + label="(Optional) Column name for reported statistical measure values" + help="(e.g.
column containing p-values or e-values)" /> + + <param name="ppidTheoreticalMz" type="text" optional="true" size="30" + label="(Optional) Column name for peptide theoretical m/z" /> + + <param name="modifications" type="text" optional="true" size="30" + label="(Optional) Column name for reported modifications" /> + + <param name="proteinAccession" type="text" optional="false" size="30" + label="Column name for protein accession code" /> + + <param name="protSequenceLength" type="text" optional="true" size="30" + label="(Optional) Column name for protein sequence length" /> + + <param name="pepProtStart" type="text" optional="true" size="30" + label="(Optional) Column name for protein match location start" + help="Where peptide sequence starts in protein"/> + + <param name="pepProtEnd" type="text" optional="true" size="30" + label="(Optional) Column name for protein match location end" + help="Where peptide sequence ends in protein"/> + + <param name="sourceName" type="text" optional="true" size="30" + label="(Optional) Column name for sample names" /> + + </inputs> + <configfiles> + <configfile name="attributesMappingCSV">Generic name,name in S1 table CSV +mz,${mz} +rt,${rt} +charge,${charge} +pepSequence,${pepSequence} +ppidScore,${ppidScore} +proteinAccession,${proteinAccession} +#if $ppidTheoreticalMz != "None" +ppidTheoreticalMz,${ppidTheoreticalMz} +#end if +#if $modifications != "None" +modifications,${modifications} +#end if +#if $scoringSchemeName != "None" +scoringSchemeName,${scoringSchemeName} +#end if +#if $statisticalMeasure != "None" +statisticalMeasure,${statisticalMeasure} +#end if +#if $protSequenceLength != "None" +protSequenceLength,${protSequenceLength} +#end if +#if $pepProtStart != "None" +pepProtStart,${pepProtStart} +#end if +#if $pepProtEnd != "None" +pepProtEnd,${pepProtEnd} +#end if +#if $sourceName != "None" +sourceName,${sourceName} +#end if</configfile> + </configfiles> + + <outputs> + <data name="apmlFile" format="apml" 
+ label="${tool.name} on ${on_string}: APML" > + </data> + </outputs> + <tests> + </tests> + <help> + +.. class:: infomark + +This tool converts a CSV file containing MS/MS peptide identifications and their respective protein matches +to the APML xml format. +The identifications in APML format can be used, for example, to annotate unidentified MS features via SEDMAT(*). +This format is also compatible with other post-processing tools such as Quantifere (for +protein inference). + +(*)SEDMAT can use MS2 identification data +and couple it to MS1 feature data, thereby annotating the MS1 feature list with identifications. + +----- + +**Output** + +This tool returns the input data in APML xml format. + + </help> +</tool>
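The `attributesMappingCSV` configfile in csv2apml.xml is a two-column CSV that maps each generic attribute name to the user's column name, and it leaves out optional attributes that were not filled in. The sketch below reproduces that logic in Python so the resulting file is easier to picture. It mirrors the Cheetah template's `!= "None"` checks; it is not code from Csv2Apml.jar, and `build_attributes_mapping` and the example column names are hypothetical.

```python
from __future__ import annotations

# Generic attribute names in the order the csv2apml.xml configfile writes them.
REQUIRED = ["mz", "rt", "charge", "pepSequence", "ppidScore", "proteinAccession"]
OPTIONAL = ["ppidTheoreticalMz", "modifications", "scoringSchemeName",
            "statisticalMeasure", "protSequenceLength", "pepProtStart",
            "pepProtEnd", "sourceName"]


def build_attributes_mapping(columns: dict[str, str | None]) -> str:
    """Return mapping CSV text: one 'generic name,input CSV column name' per line."""
    lines = ["Generic name,name in S1 table CSV"]  # header row, as in the template
    for name in REQUIRED:
        if not columns.get(name):
            raise ValueError(f"missing column name for required attribute {name!r}")
        lines.append(f"{name},{columns[name]}")
    for name in OPTIONAL:
        value = columns.get(name)
        # The template drops optional attributes whose value renders as "None";
        # this sketch also drops missing keys and empty strings.
        if value not in (None, "None", ""):
            lines.append(f"{name},{value}")
    return "\n".join(lines)


# Hypothetical column names from a user's MS/MS CSV export.
mapping = build_attributes_mapping({
    "mz": "Precursor m/z", "rt": "RT", "charge": "z",
    "pepSequence": "Sequence", "ppidScore": "Score",
    "proteinAccession": "Accession",
    "modifications": "Mods", "sourceName": "None",
})
print(mapping)
```

With these column names, `sourceName` is omitted and `modifications` follows the six required rows. Like the template, the sketch does no CSV quoting, so a column name that contains a comma would corrupt the mapping.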
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/datatypes_conf.xml Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,9 @@ +<?xml version="1.0"?> +<datatypes> + <datatype_files> + <datatype_file name="prims_proteomics_datatypes.py"/> + </datatype_files> + <registration display_path="display_applications"> + <datatype extension="apml" type="galaxy.datatypes.prims_proteomics_datatypes:Apml" display_in_upload="true" /> + </registration> +</datatypes> \ No newline at end of file
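datatypes_conf.xml registers the `apml` extension and links it to a class given in `module:Class` form (`galaxy.datatypes.prims_proteomics_datatypes:Apml`). The Python sketch below reads those entries with the standard library's ElementTree. It only inspects the file and does not reproduce how Galaxy itself loads datatypes. The function names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# The datatypes_conf.xml from this changeset, without the diff '+' prefixes.
DATATYPES_CONF = """<?xml version="1.0"?>
<datatypes>
  <datatype_files>
    <datatype_file name="prims_proteomics_datatypes.py"/>
  </datatype_files>
  <registration display_path="display_applications">
    <datatype extension="apml" type="galaxy.datatypes.prims_proteomics_datatypes:Apml" display_in_upload="true" />
  </registration>
</datatypes>"""


def registered_datatypes(conf_xml: str) -> dict[str, tuple[str, str]]:
    """Map each registered extension to (module path, class name)."""
    root = ET.fromstring(conf_xml)
    registry = {}
    # iter("datatype") matches only <datatype> elements, not <datatype_file>.
    for dt in root.iter("datatype"):
        module_path, _, class_name = dt.get("type", "").partition(":")
        registry[dt.get("extension")] = (module_path, class_name)
    return registry


def datatype_files(conf_xml: str) -> list[str]:
    """List the Python files named in <datatype_files>."""
    root = ET.fromstring(conf_xml)
    return [f.get("name") for f in root.iter("datatype_file")]
```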
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/isofix.xml Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,66 @@ +<tool name="IsoFix" id="isofix1" version="0.0.1"> + <description>Identifies in-source decay peptides and corrects protein assignments</description> + <!-- + For remote debugging, start your listener on port 8000 and use the following as command interpreter: + java -jar -Xdebug -Xrunjdwp:transport=dt_socket,address=D0100564.wurnet.nl:8000 + ////////////////////////// + --> + <command interpreter="java -jar "> + IsoFix.jar + -identificationsFile $identificationsFile + -outputFile $outputFile + -format apml + -rtTol $rtTol + -logFile $logFile + #if $useOriginalProteinSequences.useOriginalProteinSequencesFile == True + -fastaFile $useOriginalProteinSequences.fastaFile + #end if + </command> + + <inputs> + + <param name="identificationsFile" type="data" format="apml" label="MS/MS identifications file" /> + + <param name="rtTol" type="integer" size="10" value="15" label="Retention time tolerance (seconds) " /> + + <param name="createLogFile" type="boolean" checked="true" label="Generate log file" help="Lists the in-source decay peptides found"/> + + <conditional name="useOriginalProteinSequences"> + <param name="useOriginalProteinSequencesFile" type="boolean" + truevalue="Yes" falsevalue="No" checked="true" + label="Use original protein sequences for detecting peptide source relations" + help="This can reduce redundancy in the final set by correctly identifying which peptides derive from bigger peptides that are also identified"/> + <when value="Yes"> + <param name="fastaFile" type="data" format="fasta" label="Protein sequences (fasta file)"/> + </when> + </conditional> + + </inputs> + <outputs> + <data name="outputFile" format="apml" label="${identificationsFile.metadata.base_name} - ${tool.name} on ${on_string}: APML" metadata_source="identificationsFile"></data> + <data name="logFile" format="txt" label="${tool.name} on ${on_string} - LOG file"> + <!-- If the
expression is false, the file is not created --> + <filter>( createLogFile == True )</filter> + </data> + </outputs> + <tests> + </tests> + <help> + +.. class:: infomark + +This tool identifies in-source decay peptides and corrects protein assignments. + +----- + +**Output example** + +This tool returns the input file with corrected protein assignments and +in-source decay peptides marked (by a small modification in their sequence string). +E.g. if peptide TYNSIMK is found to be an in-source decay of HETTYNSIMK, then +its sequence is changed to HET}TYNSIMK (so the decayed part + "}" + own sequence). +E.g. decay from both sides: TYNSI from HETTYNSIMK becomes HET}TYNSI{MK + + + </help> +</tool>
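The sequence notation in the IsoFix help (decayed N-terminal part, then `}`, then the peptide's own sequence, then `{` and the decayed C-terminal part when decay happened on both sides) can be written as a short string operation. The both-sides example HET}TYNSI{MK corresponds to the peptide TYNSI. The sketch below is in Python. The C-terminal-only case (`HETTYNSI{MK`) is an extrapolation of that notation, and the co-elution test using a retention-time tolerance (the tool's `rtTol`, default 15 s) is an assumption about how candidates might be paired; the actual IsoFix.jar logic, including the optional FASTA-based check, is not shown in this changeset. All names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


def annotate_decay(peptide: str, source: str) -> str:
    """Mark `peptide` as an in-source decay product of `source` (IsoFix notation)."""
    start = source.find(peptide)  # uses the first occurrence only
    if start < 0 or peptide == source:
        raise ValueError(f"{peptide} is not a proper subsequence of {source}")
    prefix = source[:start]                   # part lost at the N-terminal side
    suffix = source[start + len(peptide):]    # part lost at the C-terminal side
    marked = peptide
    if prefix:
        marked = prefix + "}" + marked
    if suffix:
        marked = marked + "{" + suffix
    return marked


@dataclass
class Identification:
    sequence: str
    rt: float  # retention time in seconds


def find_decay_pairs(ids: list[Identification], rt_tol: float = 15.0):
    """Hypothetical criterion: a peptide is a decay candidate of a longer
    identified peptide that contains it and elutes within rt_tol seconds."""
    pairs = []
    for frag in ids:
        for src in ids:
            if (len(src.sequence) > len(frag.sequence)
                    and frag.sequence in src.sequence
                    and abs(src.rt - frag.rt) <= rt_tol):
                pairs.append((frag.sequence, src.sequence))
    return pairs
```

Co-elution is a plausible filter because an in-source fragment forms after chromatographic separation and should elute at nearly the same time as its parent. Treat this as an illustration of the idea and not as the tool's documented rule.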
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/msfilt.xml Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,229 @@ +<tool name="MsFilt" id="msfilt" version="1.0.2"> + <description>Filters annotations based on MS/MS peptide identification and annotation quality measures</description> + <!-- + For remote debugging, start your listener on port 8000 and use the following as command interpreter: + java -jar -Xdebug -Xrunjdwp:transport=dt_socket,address=D0100564.wurnet.nl:8000 + ////////////////////////// + --> + <command interpreter="java -jar "> + MsFilt.jar + -apmlFile $apmlFile + -datasetCode $apmlFile.metadata.base_name + -rankingMetadataFile $rankingMetadataFile + -statisticalMeasuresConfigFile $statisticalMeasuresConfigFile + -annotationSourceConfigFile $annotationSourceConfigFile + -outApml $outputApml + -outNewIdsApml $outNewIdsApml + -outFullCSV $outputCSV + -outRankingTable $outRankingTable + -outProteinCoverageCSV $outProteinCoverageCSV + -fpCriteriaExpression "$fpCriteriaExpression" + -filterOutFPAnnotations $filterOutFPAnnotations + -fpCriteriaExpressionForIds "$fpCriteriaExpressionForIds" + -filterOutFPIds $filterOutFPIds + -filterOutUnannotatedAlignments $filterOutUnannotatedAlignments + -addRawRankingInfo $addRawRankingInfo + -addScaledIntensityInfo $addScaledIntensityInfo + -addRawIntensityInfo $addRawIntensityInfo + -outReport $htmlReportFile + -outReportPicturesPath $htmlReportFile.files_path + </command> + + <inputs> + + <param name="apmlFile" type="data" format="apml" optional="true" + label="(Optional) Peptide quantification file (APML)" + help="The APML contents as aligned and annotated feature lists. E.g. produced by + SEDMAT or Quantiline tools."
/> + + <repeat name="annotationSourceFiles" title="(Optional) Peptide identification files" help="Full set of MS/MS peptide identification files, including peptides that could not be quantified."> + <param name="identificationsFile" type="data" format="apml,mzidentml,prims.fileset.zip" label="Identifications file (APML or MZIDENTML or MZIDENTML fileSet)" /> + <param name="spectraFile" type="data" format="mzidentml,prims.fileset.zip" optional="true" label="(Optional) Spectra fileSet (mzml file or fileSet)" + help="Select this in case your Identifications file is MZIDENTML or MZIDENTML fileSet" /> + </repeat> + + <!-- + <param name="maxNrRankings" type="integer" size="10" value="0" label="Maximum nr. of items to leave in the final ranking (set=0 for no limit) " /> + --> + <!-- TODO add info somewhere that deltaRt is 'corrected deltaRt' --> + <param name="rankingWeightConfig" type="text" area="true" size="13x70" label="Quality Measures (qm's) and ranking weights configuration" + help="Here you may specify a weight for each of the Quality Measures (QMs). These are used for the final QM score and possibly for ranking (e.g. in case of label-free data + processed by SEDMAT). The format is: QM alias => QM name,weight. " +value="qmDRT => delta rt (standard score),1 +
qmDMA => delta mass annotation (standard score),1 +
qmDMP => delta mass psm (standard score),1 +
qmBSCR => best peptide score (standard score),1 +
qmALCV => alignment coverage (fraction),1 +
qmSTCV => score type coverage (fraction),1 +
qmPACV => peptide's best proteinAnnotCoverage (standard score),1 +
qmPICV => peptide's best proteinIdentifCoverage (standard score),1 +
qmANS => annotation sources (count),1 +
qmCSEV => charge states evidence (count),0.2 +
qmBCSP => best correlation with source or product peptide (correl),1 +
qmBCCS => best correlation with other charge state (correl),1 +
qmBCOS => best correlation with other sibling peptide (correl),1 +"/> + + <param name="statisticalMeasuresConfig" type="text" area="true" size="6x70" label="Statistical measures configuration" + help="Here you may specify the statistical measures that are found in the ms/ms results (e.g. p or e-values). + The format is: SM alias => SM name,type,mode[min/max]. " +value="smXTD => MS:1001330,XSLASH!Tandem:expect,min +
pvCSVEX => p_value,CSV_EXPORT,min +
smAUTO_LIKELIHOOD => AUTOMOD_LOGLIKELIHOOD,PLGS/Auto-mod,max +
+smLIKELIHOOD => LOGLIKELIHOOD,PLGS/Databank-search,max +"/> + + <param name="filterOutUnannotatedAlignments" type="boolean" checked="true" + label="Filter out unannotated alignments" + help="This helps decrease the output file size (features without annotation are then no longer reported)"/> + + <param name="filterOutFPAnnotations" type="boolean" checked="true" + label="Filter out False Positive (FP) annotations" /> + + <param name="fpCriteriaExpression" type="text" size="120" label="False Positive (FP) criteria for annotations" + help="Criteria (in standard score measures) for classifying an annotation as False Positive (FP). + You can build logical rules using the QM and SM aliases above, the keywords 'and', 'or' and parentheses. + Comparisons can be made with '==, <, >, <=, >='" + value="qmDRT <0 or qmDMA <-0.5 or (qmDMP <-0.5 and qmBSCR<-0.5) or (!isNaN(smXTD) and smXTD >0.01)"/> + + + <param name="filterOutFPIds" type="boolean" checked="true" + label="Filter out False Positive (FP) peptide identifications" /> + + <param name="fpCriteriaExpressionForIds" type="text" size="120" + label="False Positive (FP) criteria for identifications" + help="Criteria (in standard score measures) for classifying a peptide identification as False Positive (FP). + Here you can use a subset of the quality measures (qmDMP, qmBSCR, qmSTCV, qmPICV, qmCSEV) and all statistical measures."
+ value="(qmDMP <-0.5 and qmBSCR<-0.5) or (!isNaN(smXTD) and smXTD >0.01)"/> + + + <param name="addRawRankingInfo" type="boolean" checked="false" + label="Include the raw scores/values of the ranking attributes in the CSV output" + help="This will result in one extra column per ranking attribute, each column holding the original data for this attribute (before normalization)."/> + + <param name="addScaledIntensityInfo" type="boolean" checked="false" + label="Include computed scaled intensity values in the CSV output" + help="The autoscaled and 'z-score' scaled (aka 'standard-score' scaled) intensity values are then added to the full CSV output file"/> + + <param name="addRawIntensityInfo" type="boolean" checked="false" + label="Include the raw intensity values in the CSV output" + help="The original intensity values (as found in the input file) are then added to the full CSV output file"/> + + + </inputs> + <configfiles> + <configfile name="rankingMetadataFile">${rankingWeightConfig}</configfile> + <configfile name="statisticalMeasuresConfigFile">${statisticalMeasuresConfig}</configfile> + <configfile name="annotationSourceConfigFile">## start comment + ## iterate over the selected files and store their names in the config file + #for $i, $s in enumerate( $annotationSourceFiles ) + ${s.identificationsFile}|${s.spectraFile} + ## also print out the datatype in the next line, based on previously configured datatype + #if isinstance( $s.identificationsFile.datatype, $__app__.datatypes_registry.get_datatype_by_extension('apml').__class__): + apml + #else: + mzid + #end if + #end for + ## end comment</configfile> + </configfiles> + <outputs> + <data name="outputApml" format="apml" label="${apmlFile.metadata.base_name} - ${tool.name} on ${on_string}: quantifications (filtered APML)" metadata_source="apmlFile"> + <!-- If the expression is false, the file is not created --> + <filter>( apmlFile != None )</filter> + </data> + <data name="outNewIdsApml" format="apml"
label="${tool.name} on ${on_string}: identifications (filtered APML)" > + <filter>( filterOutFPIds == True )</filter> + </data> + <data name="outputCSV" format="csv" label="${apmlFile.metadata.base_name} - ${tool.name} on ${on_string}: Full CSV" metadata_source="apmlFile"> + <filter>( apmlFile != None )</filter> + </data> + <data name="outRankingTable" format="csv" label="${apmlFile.metadata.base_name} - ${tool.name} on ${on_string}: Ranking table (CSV)" metadata_source="apmlFile"> + <filter>( apmlFile != None )</filter> + </data> + <data name="outProteinCoverageCSV" format="csv" label="${tool.name} on ${on_string}: Protein coverage details (CSV)"> + <!-- If the expression is false, the file is not created --> + <filter>( len(list(enumerate(annotationSourceFiles))) > 0 )</filter> + </data> + <data name="htmlReportFile" format="html" label="${tool.name} on ${on_string} - HTML report"/> + </outputs> + <tests> + </tests> + <help> + +.. class:: infomark + +This tool takes in peptide quantification results (e.g. either by SEDMAT for label-free data or by Quantiline for labeled data) +and calculates a number of quality measures that can help in assessing the correctness of the quantification assignment and of the MS/MS peptide +identification itself. The user can use any combination of quality measures (qm's) and statistical measures (sm's) to filter out +low scoring entries. + +.. class:: infomark + +In the label-free data processed by SEDMAT it is possible that a feature quantification gets assigned to different peptides. This means +we have an ambiguous assignment. In such a case +this tool also does a ranking of the different assignments according to their quality measures so that the best scoring assignment +gets ranked as first. + +----- + +**List of abbreviations** + +QM: Quality Measure + +SM: Statistical Measure (e.g. 
p-value, e-value from MS/MS identification) + +PSM: "Peptide to Spectrum Match" (aka peptide identification) + +FP: False Positive + +----- + +**Filtering options details** + +The FP criteria will be applied to an annotation even if not all of the quality measures involved +in the expression can be determined. QMs that cannot be determined get the value 0 (zero), which is +equivalent to giving them the average value. + +The output report shows some plots that visualize the filtering done. This can help in fine-tuning the right filtering +criteria. + +----- + +**Output details** + +*APML output* + +This tool returns the given APML alignment file further annotated at the alignment level with the best ranking +peptides of each respective alignment. This APML can be used in subsequent Galaxy tools like the proteomics tools +from NBIC. + +The APML output can also be used for the Protein Inference step (see Quantifere tool). + +*CSV output* + +It also returns a CSV output with the full quality measure, scoring and ranking details. The user could use +this to manually determine new weights for some of the quality measures by techniques such as +linear regression. In other words, this CSV can be used to fine-tune the weights for a next run. + +Many of the quality measures (QMs) are normalized to their Standard Score (aka z-score). +`See Standard Score for more details...`__ + +Next to giving insight into how the ranking was established, a more complete version of this CSV file is also +generated for tools that cannot or won't process the APML output format. + +Below is a brief overview of the CSV and an illustration of the ranking done in case of ambiguous peptide-to-feature assignments +(explained above; these can occur when label-free data is processed by SEDMAT). + + +.. image:: $PATH_TO_IMAGES/msfilt_csv_out.png + + + +..
__: javascript:window.open('http://en.wikipedia.org/wiki/Standard_score','popUpWindow','height=700,width=800,left=10,top=10,resizable=yes,scrollbars=yes,toolbar=yes,menubar=no,location=no,directories=no,status=yes') + + + + + </help> +</tool>
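The MsFilt help above says that many QMs are normalized to their Standard Score, that a QM which cannot be determined gets the value 0 (the average on that scale), and that ambiguous peptide-to-feature assignments are ranked by their quality measures. A minimal Python sketch of those two ideas follows. It assumes a population standard deviation and a plain weighted sum of z-scored QMs as the ranking score; the function names, QM names and weights are hypothetical, and MsFilt's actual scoring lives in MsFilt.jar, which is not shown in this changeset.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional, Tuple


def z_scores(values: List[Optional[float]]) -> List[float]:
    """Convert raw QM values to standard scores; None marks an undetermined QM."""
    known = [v for v in values if v is not None]
    if not known:
        return [0.0] * len(values)
    mu = mean(known)
    sigma = pstdev(known)  # population standard deviation (an assumption)
    scores = []
    for v in values:
        if v is None or sigma == 0:
            scores.append(0.0)  # 0 is the mean on the z-score scale
        else:
            scores.append((v - mu) / sigma)
    return scores


def rank_assignments(
    candidates: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Rank the peptides competing for one feature by a weighted sum of their z-scored QMs
    (hypothetical scoring scheme); the best-scoring assignment comes first."""
    scored = [
        (peptide, sum(weights.get(qm, 0.0) * z for qm, z in qms.items()))
        for peptide, qms in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, with equal weights, a candidate with z-scored QMs `{"qm1": 0.2, "qm2": 0.4}` ranks above one with `{"qm1": 1.0, "qm2": -0.5}`. Fitting those weights, for instance by linear regression on the full CSV output, is what the help suggests for a next run.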
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/napq.xml Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,93 @@ +<tool name="NapQ" id="napq" version="0.0.1"> + <description>'no alignment' (alignment-free) peptide quantification</description> + <!-- + For remote debugging start your listener on port 8000 and use the following as command interpreter: + java -jar -Xdebug -Xrunjdwp:transport=dt_socket,address=D0100564.wurnet.nl:8000 + ////////////////////////// + --> + <command interpreter="java -jar "> + NapQ.jar + -identificationsConfigFile $identificationsConfigFile + -namingConventionCodesForSamples $namingConventionCodesForSamples + #if $is2D_LC_MS.fractions == True + -namingConventionCodesForFractions $is2D_LC_MS.namingConventionCodesForFractions + #end if + -outputApml $outputApml + -outputTsv $outputTsv + -outReport $htmlReportFile + -outReportPicturesPath $htmlReportFile.files_path + </command> + + <inputs> + + <repeat name="identificationFileList" title="Peptide identification files" help="Full set of MS/MS peptide identification files, including peptides that could not be quantified."> + <param name="identificationsFile" type="data" format="apml,mzidentml,prims.fileset.zip" label="Identifications file (APML or MZIDENTML or MZIDENTML fileSet)" /> + <param name="spectraFile" type="data" format="mzidentml,prims.fileset.zip" optional="true" label="(Optional) Spectra fileSet (mzml file or fileSet)" + help="Select this if your identifications file is an MZIDENTML file or MZIDENTML fileSet" /> + </repeat> + + <param name="namingConventionCodesForSamples" type="text" size="100" value="" + label="Part of run/file name that identifies the sample" + help="Add the CSV list of codes that occur in the file names + and that stand for a sample code. E.g. '_S1,_S2,_S3,etc.' "/> <!-- could do regular expressions as well but this would be hard for biologists, e.g.
_F\d\b --> + + + <conditional name="is2D_LC_MS"> + <param name="fractions" type="boolean" truevalue="Yes" falsevalue="No" checked="false" + label="Data is from 2D LC-MS" + help="Data acquisition was done in multiple fractions."/> + <when value="Yes"> + <param name="namingConventionCodesForFractions" type="text" size="100" value="" + label="Part of run/file name that identifies the 2D LC-MS fraction" + help="Add the CSV list of codes that occur in the file names + and that stand for a fraction code. E.g. '_F1,_F2,_F3,etc.' This prevents + each (fraction) file from being treated as a separate run."/> <!-- could do regular expressions as well but this would be hard for biologists, e.g. _F\d\b --> + </when> + </conditional> + + </inputs> + <configfiles> + <configfile name="identificationsConfigFile">## start comment + ## iterate over the selected files and store their names in the config file + #for $i, $s in enumerate( $identificationFileList ) + ${s.identificationsFile}|${s.spectraFile} + ## also print out the datatype in the next line, based on previously configured datatype + #if isinstance( $s.identificationsFile.datatype, $__app__.datatypes_registry.get_datatype_by_extension('apml').__class__): + apml + #else: + mzid + #end if + #end for + ## end comment</configfile> + </configfiles> + <outputs> + <data name="outputApml" format="apml" label="${tool.name} on ${on_string}: peptide quantifications (APML)"/> + <data name="outputTsv" format="tabular" label="${tool.name} on ${on_string}: peptide quantifications (TSV)"/> + <!-- in tsv we can have cols like: pep, avg_m/z, avg rt, m/z window, rt window, i_s1, i_s2, ...--> + <data name="htmlReportFile" format="html" label="${tool.name} on ${on_string} - HTML report"/> + <!-- here we show the samples extracted and the files used to 'build up' each sample --> + </outputs> + <tests> + </tests> + <help> + +..
class:: infomark + +This tool takes multiple peptide identification result files in which the peptide identifications are +coupled to some quantification (e.g. precursor intensity information, or data coming +from MS^E acquisition, where peptide identification and quantification are done in the same run and reported together). +Then, based on the given experiment design parameters (i.e. how the result files relate back to +replicate runs and samples), it produces a new file in which the peptides are reported with +their calculated quantifications at the sample level. + +The figure below illustrates this: + +.. image:: $PATH_TO_IMAGES/napq_overview.png + + + + + + + </help> +</tool>
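NapQ relates result files back to samples through the naming-convention codes entered in the form (e.g. '_S1,_S2,_S3') and then reports peptide quantifications at the sample level. The Python sketch below shows one way such a grouping could work. It assumes simple substring matching (longest code first, so that '_S10' is not mistaken for '_S1') and a plain mean over replicate runs. Both choices, the function names and the example file names are hypothetical; NapQ's own aggregation is implemented in NapQ.jar and is not described here.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional


def parse_codes(csv_codes: str) -> List[str]:
    """Split the tool's comma-separated code list, e.g. '_S1,_S2,_S3'."""
    return [code.strip() for code in csv_codes.split(",") if code.strip()]


def sample_for_file(file_name: str, codes: List[str]) -> Optional[str]:
    """Return the sample code found in a file name, preferring the longest match
    so that '_S10' is not mistaken for '_S1'."""
    for code in sorted(codes, key=len, reverse=True):
        if code in file_name:
            return code
    return None


def quantify_per_sample(
    runs: Dict[str, Dict[str, float]], codes: List[str]
) -> Dict[str, Dict[str, float]]:
    """runs maps a file name to {peptide: intensity}; returns
    {sample code: {peptide: mean intensity over that sample's replicate runs}}."""
    collected: Dict[str, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for file_name, peptides in runs.items():
        sample = sample_for_file(file_name, codes)
        if sample is None:
            continue  # file does not follow the naming convention; skipped in this sketch
        for peptide, intensity in peptides.items():
            collected[sample][peptide].append(intensity)
    return {
        sample: {peptide: mean(values) for peptide, values in peptides.items()}
        for sample, peptides in collected.items()
    }
```

The fraction codes for 2D LC-MS data could be handled the same way: strip or ignore the fraction code so that all fraction files of one run end up in the same group.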
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/prims_proteomics_datatypes.py Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,42 @@ +""" +PRIMS proteomics classes for types defined in datatypes_conf.xml +""" +import logging +import re +from galaxy.datatypes.data import * +from galaxy.datatypes.xml import * +from galaxy.datatypes.sniff import * +from galaxy.datatypes.binary import * +from galaxy.datatypes.interval import * + +log = logging.getLogger(__name__) + + +class ProteomicsXml(GenericXml): + """ An enhanced XML datatype used to reuse code across several + proteomic/mass-spec datatypes. (this part of the code is taken from protk proteomics datatypes package) """ + + def sniff(self, filename): + """ Determines whether the file is the correct XML type. """ + with open(filename, 'r') as contents: + while True: + line = contents.readline() + if line == None or not line.startswith('<?'): + break + pattern = '^<(\w*:)?%s' % self.root # pattern match <root or <ns:root for any ns string + return line != None and re.match(pattern, line) != None + + def set_peek( self, dataset, is_multi_byte=False ): + """Set the peek and blurb text""" + if not dataset.dataset.purged: + dataset.peek = data.get_file_peek( dataset.file_name, is_multi_byte=is_multi_byte ) + dataset.blurb = self.blurb + else: + dataset.peek = 'file does not exist' + dataset.blurb = 'file purged from disk' + +class Apml( ProteomicsXml ): + """APML data""" + file_ext = "apml" + blurb = 'PRIMS APML proteomics data' + root = "apml" \ No newline at end of file
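`ProteomicsXml.sniff` above accepts a file when the first line after any `<?...` declarations opens the datatype's `root` element, optionally with a namespace prefix. The Python 3 sketch below performs the same check without Galaxy imports, so it can be tried in isolation. Two differences are deliberate and are assumptions of this sketch, not the author's code. First, the pattern is a raw string with the root escaped and a trailing `\b`, so `<apmlx>` is not accepted as `<apml>`. Second, the `line == None` checks are dropped, because `readline()` returns an empty string, never `None`, at end of file. Like the original, it does not skip XML comments placed before the root element.

```python
import re
from typing import Iterable


def root_matches(lines: Iterable[str], root: str) -> bool:
    """True if the first line that is not a '<?...' declaration opens <root> or <ns:root>."""
    # \b stops e.g. root 'apml' from also matching '<apmlx>'.
    pattern = re.compile(r"^<(\w*:)?%s\b" % re.escape(root))
    for line in lines:
        if line.startswith("<?"):
            continue  # skip the XML declaration and processing instructions
        return pattern.match(line) is not None
    return False  # empty input, or nothing after the declarations


def sniff_file(path: str, root: str) -> bool:
    """Apply root_matches to a file on disk, reading it line by line."""
    with open(path, "r") as handle:
        return root_matches(handle, root)
```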
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/progenesisconverter.xml Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,68 @@ +<tool name="ProgenesisConverter" id="progenesisconv1" version="1.0.2"> + <description>Converts Progenesis aligned feature lists in CSV format to APML</description> + <!-- + For remote debugging start your listener on port 8000 and use the following as command interpreter: + java -jar -Xdebug -Xrunjdwp:transport=dt_socket,address=D0100564.wurnet.nl:8000 + ////////////////////////// + --> + <command interpreter="java -jar "> + ProgenesisConv.jar + -progenesisFile $progenesisFile + -apmlFile $apmlFile + #if $multipleScoringSchemes.containsMultipleScoringSchemes == True + -scoringSchemeNameColumn $multipleScoringSchemes.scoringSchemeNameColumn + #end if + #if $statisticalMeasure.containsStatisticalMeasure == True + -statisticalMeasureColumn $statisticalMeasure.statisticalMeasureColumn + #end if + </command> + + <inputs> + + <param name="progenesisFile" type="data" format="csv" label="Progenesis aligned feature lists CSV file" /> + + <conditional name="multipleScoringSchemes"> + <param name="containsMultipleScoringSchemes" type="boolean" truevalue="Yes" falsevalue="No" checked="false" + label="Progenesis scores contain multiple scoring schemes" + help="Set this if the scores in the 'Score' column come from two or more different schemes (e.g. MSE and DDA)"/> + <when value="Yes"> + <param name="scoringSchemeNameColumn" type="text" optional="true" size="30" + label="Column name" + help="Name of the column containing the scoring scheme name" /> + </when> + </conditional> + + <conditional name="statisticalMeasure"> + <param name="containsStatisticalMeasure" type="boolean" truevalue="Yes" falsevalue="No" checked="false" + label="Input sheet contains a statistical measure column" + help="Set this if the input sheet also contains a column with a statistical measure (e.g.
p-value, e-value, etc.)"/> + <when value="Yes"> + <param name="statisticalMeasureColumn" type="text" optional="true" size="30" + label="Column name" + help="Name of the column containing the statistical measure" /> + </when> + </conditional> + + </inputs> + <outputs> + <data name="apmlFile" format="apml" label="${progenesisFile.metadata.base_name} - ${tool.name} on ${on_string}: APML" metadata_source="progenesisFile"> + </data> + </outputs> + <tests> + </tests> + <help> + +.. class:: infomark + +This tool converts a Progenesis CSV file to the APML XML format. +The APML output can then be annotated by SEDMAT, which couples MS2 identification data +to this MS1 data and thereby annotates the MS1 feature list with identifications. + +----- + +**Output details** + +This tool returns APML output that can be used as input for the SEDMAT tool. + + </help> +</tool>
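The `<command>` block of ProgenesisConverter always passes `-progenesisFile` and `-apmlFile`, and adds `-scoringSchemeNameColumn` or `-statisticalMeasureColumn` only when the corresponding conditional section is switched on. The Python sketch below reproduces that argument list, for example to try the converter outside Galaxy. The flag names are taken from the template. The jar path, the column name used in the usage note, and the assumption that the jar runs standalone with these arguments are illustrative only.

```python
from typing import List, Optional


def progenesis_conv_command(
    progenesis_csv: str,
    apml_out: str,
    scoring_scheme_column: Optional[str] = None,
    statistical_measure_column: Optional[str] = None,
    jar: str = "ProgenesisConv.jar",  # assumed location of the jar
) -> List[str]:
    """Build the argument list that the Galaxy <command> template above produces."""
    cmd = ["java", "-jar", jar, "-progenesisFile", progenesis_csv, "-apmlFile", apml_out]
    # Each optional flag is only passed when its conditional section is enabled.
    if scoring_scheme_column:
        cmd += ["-scoringSchemeNameColumn", scoring_scheme_column]
    if statistical_measure_column:
        cmd += ["-statisticalMeasureColumn", statistical_measure_column]
    return cmd
```

A possible usage, with hypothetical file and column names, is `subprocess.run(progenesis_conv_command("features.csv", "features.apml", statistical_measure_column="p-value"), check=True)`. Passing a list rather than a shell string avoids quoting problems with column names that contain spaces.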
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/quantifere.xml Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,206 @@ +<tool name="Quantifere" id="quantifere1" version="1.0.2"> + <description>Protein Inference by Peptide Quantification patterns</description> + <!-- + For remote debugging start your listener on port 8000 and use the following as command interpreter: + java -jar -Xdebug -Xrunjdwp:transport=dt_socket,address=D0100564.wurnet.nl:8000 + ////////////////////////// + --> + <command interpreter="java -jar "> + Quantifere.jar + -annotatedQuantificationFilesList $annotatedQuantificationFilesList + -identificationFilesList $identificationFilesList + -statisticalMeasuresConfigFile $statisticalMeasuresConfigFile + -quantificationDataToUse $quantificationDataToUse + -minCorrel $minCorrel + -minProtCoverage $minProtCoverage + -minAboveAverageHits $minAboveAverageHits + -minNrIdsForInferencePeptide $minNrIdsForInferencePeptide + -refineModel $refineModel + -functionalAnnotationCSV $functionalAnnotationCSV + -outputCSV $outputCSV + -outputInferenceLogCSV $outputInferenceLogCSV + -outputSummaryAnnotationCSV $outputSummaryAnnotationCSV + -outReport $htmlReportFile + -outReportPicturesPath $htmlReportFile.files_path + #if $is2D_LC_MS.fractions == True + -namingConventionCodesForFractions $is2D_LC_MS.namingConventionCodesForFractions + #end if + </command> + + <inputs> + + <repeat name="annotatedQuantificationFiles" title="Peptide (filtered) quantification files (APML)" + help="Aligned, annotated and scored feature lists in APML format, + as produced by the MsFilt tool. Select one or more files.
For 2D-LC-MS we expect one file per fraction."> + <param name="annotatedQuantificationFile" size="50" type="data" format="apml" label="File (APML format)" /> + </repeat> + + <repeat name="identificationFiles" title="Peptide (filtered) identification files (MS/MS identifications)" + help="Full set of MS/MS peptide identification files, including peptides that could not be quantified. + This set of identifications is ideally filtered on some quality and + statistical measures (e.g. as is done by MsFilt). Tip: to base the inference only on the + selected peptide quantification files, you + can select the same quantification files here as well. Select one or more files."> + <param name="identificationFile" size="50" type="data" format="apml,mzid" label="File (APML or MZIDENTML format)" /> + </repeat> + + <conditional name="is2D_LC_MS"> + <param name="fractions" type="boolean" truevalue="Yes" falsevalue="No" checked="false" + label="Data is from 2D LC-MS" + help="Data acquisition was done in multiple fractions."/> + <when value="Yes"> + <param name="namingConventionCodesForFractions" type="text" size="100" value="" + label="Part of run/file name that identifies the 2D LC-MS fraction" + help="Add the CSV list of codes that occur in the file names + and that stand for a fraction code. E.g. '_F1,_F2,_F3,etc.' In this + way different peptide identifications from the same sample but measured + in different fractions can be merged together. Otherwise each (fraction) file + is seen as a separate sample."/> <!-- could do regular expressions as well but this would be hard for biologists, e.g. _F\d\b --> + </when> + </conditional> + + <param name="statisticalMeasuresConfig" type="text" area="true" size="6x70" label="Statistical measures configuration" + help="Here you may specify the statistical measures that are found in the ms/ms results (e.g. p or e-values). + The format is: SM alias => SM name,type,mode[min/max]. 
If this configuration is left out while such measures are present in the + dataset, they will wrongly be used as a regular scoring scheme, which affects, for example, + filter criteria below such as 'Minimum number of different peptide matches with a score above average'." +value="smXTD => MS:1001330,XSLASH!Tandem:expect,min +
pvCSVEX => p_value,CSV_EXPORT,min +
smAUTO_LIKELIHOOD => AUTOMOD_LOGLIKELIHOOD,PLGS/Auto-mod,max +
smLIKELIHOOD => LOGLIKELIHOOD,PLGS/Databank-search,max +"/> +<!-- keep value attribute above aligned like this to avoid white spaces in the value --> + <param name="quantificationDataToUse" type="select" + label="Quantification data to use" + help="Quantification data to use for the pattern clustering and inference steps. NB: check that the chosen data is actually + present in your file, or choose 'auto' to let Quantifere check which quantification type is present in most peptides."> + <option value="auto" selected="true">auto</option> + <option value="getIntensity">(TODO)raw intensities</option> + <option value="getApexIntensity">(TODO)apex intensities</option> + <option value="getNormalizedIntensity">(TODO)normalized intensities</option> + </param> + <!-- TODO let minCorrel default value vary according to quantification type chosen above --> + <param name="minCorrel" type="float" size="10" value="0.85" label="Minimum correlation in a cluster" help="Features are grouped by their protein annotation and by the + correlation of their sample intensity values. Set the minimum correlation expected between grouped members here.
This is used to guide the clustering algorithm."/> + + <!-- simple extra heuristics to remove some "noise" protein hits --> + <param name="minProtCoverage" type="float" size="10" value="5.0" label="Minimum protein coverage (%)" help="Removes proteins that have too small a + portion of their sequence covered by peptide matches."/> + + <param name="minAboveAverageHits" type="integer" size="10" value="1" label="Minimum number of different peptide matches with a score above average" + help="Removes proteins that do not have enough reasonable peptide hits."/> + + <param name="minNrIdsForInferencePeptide" type="integer" size="10" value="1" label="Minimum number of peptide identifications for inference peptides" + help="Minimum number of identifications a peptide needs in order to be used as an inference peptide for secondary proteins."/> + + + <param name="functionalAnnotationCSV" type="data" format="csv,txt,tsv" optional="true" + label="(Functional) annotation mapping file (csv or tsv format)" + help="Optional file that maps protein accessions to a network, pathway or other higher-level annotations.
The file must start with a header line containing these 2 columns, named exactly as shown and in lower case: accession,annotation"/> + + <param name="refineModel" type="boolean" checked="true" label="Refine matches model" + help="This will let the algorithm search for a reduced set of secondary protein matches that still explains the variation in the peptide quantification patterns"/> + + + <param name="summaryReport" type="boolean" checked="true" label="Generate summary report"/> + + </inputs> + <configfiles> + <configfile name="annotatedQuantificationFilesList">## start comment + ## iterate over the selected files and store their names in the config file + #for $i, $s in enumerate( $annotatedQuantificationFiles ) + ${s.annotatedQuantificationFile} + #end for + ## end comment</configfile> + + <configfile name="identificationFilesList">## start comment + ## iterate over the selected files and store their names in the config file + #for $i, $s in enumerate( $identificationFiles ) + ${s.identificationFile} + ## also print out the datatype in the next line, based on previously configured datatype + #if isinstance( $s.identificationFile.datatype, $__app__.datatypes_registry.get_datatype_by_extension('apml').__class__): + apml + #else: + mzid + #end if + #end for + ## end comment</configfile> + <configfile name="statisticalMeasuresConfigFile">## start comment + ${statisticalMeasuresConfig} + </configfile> + </configfiles> + <outputs> + <data name="outputCSV" format="csv" label="${tool.name} on ${on_string}: Proteins list (CSV)" /> + <data name="outputInferenceLogCSV" format="csv" label="${tool.name} on ${on_string}: Inference log (CSV)"/> + <data name="htmlReportFile" format="html" label="${tool.name} on ${on_string} - HTML report"> + <!-- If the expression is false, the file is not created --> + <filter>( summaryReport == True )</filter> + </data> + <data name="outputSummaryAnnotationCSV" format="csv" label="${tool.name} on ${on_string} - Functional annotation summary (CSV)"> +
<!-- If the expression is false, the file is not created --> + <filter>( functionalAnnotationCSV != None )</filter> + </data> + </outputs> + <tests> + </tests> + <help> + +.. class:: infomark + +This tool takes peptide quantification patterns and uses them to infer both Primary Protein +identifications and Secondary Protein identifications. The latter class of protein identifications +cannot be inferred by traditional protein inference methods, which look only at peptide identifications and +their quality parameters. + + +----- + +**List of definitions** + +Primary Protein identification: protein identification belonging to the minimum set of proteins needed +to account for the observed peptides. + +Secondary Protein identification: extra protein identifications that do not belong to the minimum set +of proteins mentioned above. + +raw intensities: the intensity value obtained by integrating the feature peak area + +apex intensities: the intensity value at the highest point (apex) of the feature peak + +normalized intensities: the intensity value after normalization + +----- + +**Minimum correlation in a cluster** + +Features are grouped by their protein annotation and by the correlation of their sample intensity values. This parameter sets +the minimum correlation expected between members of a group and is used to guide the clustering algorithm. + +----- + +**Output details** + +*Proteins list (CSV)* + +This is the list of primary and secondary proteins and their calculated inference score. Proteins +with exactly the same peptide hits are also grouped together and labeled as primary_group and secondary_group +instead of simply primary and secondary. + + +*Inference log (CSV)* + +This CSV table contains all data, including both inferred and ruled-out proteins. Users can use it to +troubleshoot the inference process and understand why certain proteins were ruled out. +The CSV is provided in a format that makes it easy to explore the data as a Cytoscape network.
+ +The figure below shows an example of the data explored in Cytoscape, using the +`Cytoscape chartplugin`_ to visualize the quantification data of the selected peptide nodes. + +.. image:: $PATH_TO_IMAGES/quantifere_cyto_out.png + + +.. _Cytoscape chartplugin: http://apps.cytoscape.org/apps/chartplugin + + + + </help> +</tool>
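The Quantifere form describes the statistical measures configuration as lines of the form `SM alias => SM name,type,mode[min/max]`, and its default value contains four such lines. The Python sketch below parses that format and shows what `mode` means when two values are compared. The whitespace handling, the validation rules and the `is_better` helper are assumptions of this sketch; how Quantifere.jar itself reads the file is not shown in this changeset.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class StatisticalMeasure:
    alias: str
    name: str
    sm_type: str
    mode: str  # "min": lower is better (e.g. p- or e-values); "max": higher is better


def parse_sm_config(text: str) -> Dict[str, StatisticalMeasure]:
    """Parse lines of the form 'SM alias => SM name,type,mode[min/max]'."""
    measures: Dict[str, StatisticalMeasure] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        alias, sep, rest = line.partition("=>")
        if not sep:
            raise ValueError(f"missing '=>' in {raw!r}")
        fields = [field.strip() for field in rest.split(",")]
        if len(fields) != 3 or fields[2] not in ("min", "max"):
            raise ValueError(f"expected 'name,type,min|max' in {raw!r}")
        name, sm_type, mode = fields
        measures[alias.strip()] = StatisticalMeasure(alias.strip(), name, sm_type, mode)
    return measures


def is_better(a: float, b: float, mode: str) -> bool:
    """True if value a is better than value b under the measure's mode (hypothetical helper)."""
    return a < b if mode == "min" else a > b
```

With the tool's default configuration, `pvCSVEX` (a p-value) uses `min`, so 0.01 beats 0.05, while `smLIKELIHOOD` uses `max`, so a log-likelihood of -10 beats -20. A measure missing from this configuration would instead be treated as an ordinary score, which is what the form's help warns about.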
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/quantiline.xml Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,62 @@ +<tool name="Quantiline" id="quantiline1" version="1.0.2"> + <description>Labeled ms/ms data pre-processing for Protein Quantification (and Inference) pipelines</description> + <!-- + For remote debugging start your listener on port 8000 and use the following as command interpreter: + java -jar -Xdebug -Xrunjdwp:transport=dt_socket,address=D0100564.wurnet.nl:8000 + ////////////////////////// + --> + <command interpreter="java -jar "> + Quantiline.jar + -ppidsFileName $ppidsFileName + -spectraDataFile $spectraDataFile + -ppidsInputFormat MZID + -labelMzValues "$labelMzValues" + -labelmTol $labelmTol + -outputFile $outputFile + -outReport $outReport + </command> + <inputs> + + <param name="ppidsFileName" type="data" format="prims.fileset.zip" label="MS/MS peptide identifications fileSet (N mzidentml files)"/> + <param name="spectraDataFile" type="data" format="prims.fileset.zip" label="MS/MS spectra fileSet (N mzml files)"/> + + <param name="labelMzValues" type="text" size="20" label="Label m/z values" + help="e.g. for 4-plex iTRAQ: 114.0,115.0,116.0,117.0"/> + + <param name="labelmTol" type="float" size="10" value="0.5" label="Label detection tolerance (Da)" + help="Tolerance in daltons for label detection."/> + + </inputs> + <outputs> + <data name="outputFile" format="apml" label="${tool.name} on ${on_string}: Peptides quantification (APML)" /> + <data name="outReport" format="html" label="${tool.name} on ${on_string}: Peptides quantification report (HTML)"/> + </outputs> + <tests> + </tests> + <help> + +.. class:: infomark + +This tool reads spectra files (mzML) and their respective identification files (mzIdentML) and, based +on the configured label masses, produces a file that contains the merged information: +peptides and their quantification, based on the label fragment intensity values read from the spectrum in which they +were identified.
+ +In other words, it produces the peptide (relative) quantification file. This file can subsequently be used +by other tools for protein inference and protein quantification (e.g. Quantifere). + + +----- + +**Output details** + +*Peptide quantification file (APML)* + +This is the list of peptides with their (relative) quantification based on the labels and their +intensities found in the label peaks of the corresponding spectrum. + + + + + </help> +</tool>
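Quantiline quantifies each identified peptide from the label (reporter) peaks of the spectrum in which it was identified, using the configured label m/z values and a detection tolerance in daltons (default 0.5 Da). The Python sketch below illustrates that step for a single spectrum. Two choices are assumptions, not Quantiline's documented behaviour: it takes the most intense peak inside each label window, and it expresses relative quantification as a fraction of the summed label intensities. The example peak list is made up.

```python
from typing import List, Sequence, Tuple


def parse_label_mzs(text: str) -> List[float]:
    """Parse the 'Label m/z values' field, e.g. '114.0,115.0,116.0,117.0'."""
    return [float(value) for value in text.split(",") if value.strip()]


def label_intensities(
    peaks: Sequence[Tuple[float, float]], label_mzs: Sequence[float], tol: float = 0.5
) -> List[float]:
    """For each label m/z, take the most intense (m/z, intensity) peak within +/- tol Da
    (0.0 if no peak falls in the window)."""
    intensities = []
    for label in label_mzs:
        in_window = [intensity for mz, intensity in peaks if abs(mz - label) <= tol]
        intensities.append(max(in_window, default=0.0))
    return intensities


def relative_quantification(intensities: Sequence[float]) -> List[float]:
    """Express each label intensity as a fraction of the summed label intensities."""
    total = sum(intensities)
    return [value / total if total > 0 else 0.0 for value in intensities]
```

With 4-plex iTRAQ labels 1 Da apart and the default 0.5 Da tolerance, neighbouring windows only touch at their edges, so a reporter peak is normally assigned to a single label. A larger tolerance would let one peak count for two channels.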
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/repository_dependencies.xml Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,5 @@ +<?xml version="1.0"?> +<repositories description="Required proteomics dependencies."> + <repository toolshed="http://toolshed.g2.bx.psu.edu" name="proteomics_datatypes" owner="iracooke" changeset_revision="09b89b345de2" /> + <repository toolshed="http://testtoolshed.g2.bx.psu.edu" name="proteomics_datatypes" owner="iracooke" changeset_revision="7101f7e4b00b" /> +</repositories>
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/sedmat.xml Wed Jan 08 11:39:16 2014 +0100 @@ -0,0 +1,144 @@ +<tool name="SedMat" id="sedmat1" version="1.0.2"> + <description>Matches MS and MS/MS results</description> + <!-- + For remote debugging start your listener on port 8000 and use the following as command interpreter: + java -jar -Xdebug -Xrunjdwp:transport=dt_socket,address=D0100564.wurnet.nl:8000 + --> + <command interpreter="java -jar "> + SedMat_cli.jar + -pl $inputMS + -plInputFormat apml + -ppids $fileType.inputFormatType.ppidsFile + -ppidsFileGrouping $fileType.type + -ppidsInputFormat $fileType.inputFormatType.ppidsInputFormat + -ppidsFileDescription $fileType.inputFormatType.ppidsFile.name + #if $fileType.inputFormatType.ppidsInputFormat == "mzid" + -spectraDataFile $fileType.inputFormatType.spectraDataFile + #end if + -out $outputData + -outUnmatchedMS2 $outUnmatchedMS2 + -mtol $mtol + -rttol $rttol + -rtShiftDetectionWindow $rtShiftDetectionWindow + -matchOnSameSourceOnly $matchOnSameSourceOnly + -chargeStatesToGenerate $chargeStatesToGenerate + -outReport $htmlReportFile + -outReportPicturesPath $htmlReportFile.files_path + #if $troubleshoot1.troubleshootPeakLocations == True + -troubleshootPeakLocations YES + -mStart $troubleshoot1.mStart + -mEnd $troubleshoot1.mEnd + -rtStart $troubleshoot1.rtStart + -rtEnd $troubleshoot1.rtEnd + -filterSourceName $troubleshoot1.filterSourceName + #end if + #if $matchOnNamingConvention.match == True + -matchOnNamingConvention YES + -namingConventionCodesForMatching $matchOnNamingConvention.namingConventionCodesForMatching + #end if + + </command> + + <inputs> + + <param name="inputMS" type="data" format="apml" label="MS data (APML format)" /> + <!-- possible option <validator type="metadata" check="base_name" message="Metadata missing, click the pencil icon in the history item and set base_name."/> --> + + <conditional name="fileType"> + <param name="type" type="select" label="Peptide identification file
grouping type"> + <option value="single" selected="true">single-File</option> + <option value="fileSet">fileSet</option> + </param> + <when value="single"> + <conditional name="inputFormatType"> + <param name="ppidsInputFormat" type="select" label="MS/MS input format"> + <option value="mzid" selected="true">mzIdentML on mzML</option> + <option value="apml">APML</option> + </param> + <when value="mzid"> + <param name="spectraDataFile" type="data" format="mzml" label="MS/MS spectra file (mzml)"/> + <param name="ppidsFile" type="data" format="mzid" label="MS/MS peptide identifications file (mzidentml)"/> + </when> + <when value="apml"> + <param name="ppidsFile" type="data" format="apml" label="MS/MS peptide identifications file (apml)"> + <!-- TODO - find out how to use + <validator type="expression" message="You already selected this file as the MS data file.">value.id == inputMS,{"inputMS":$inputMS},{}</validator>--> + </param> + </when> + </conditional> + </when> + <when value="fileSet"> + <conditional name="inputFormatType"> + <param name="ppidsInputFormat" type="select" label="MS/MS input format"> + <option value="mzid" selected="true">mzIdentML on mzML</option> + </param> + <when value="mzid"> + <param name="spectraDataFile" type="data" format="prims.fileset.zip" label="MS/MS spectra fileSet (N mzml files)"/> + <param name="ppidsFile" type="data" format="prims.fileset.zip" label="MS/MS peptide identifications fileSet (N mzidentml files)"/> + </when> + </conditional> + </when> + </conditional> + <param name="mtol" type="integer" size="10" value="50" label="m/z tolerance (ppm) " /> + <param name="rttol" type="integer" size="10" value="150" label="Retention time tolerance (seconds) " /> + <param name="rtShiftDetectionWindow" type="integer" size="10" value="20" label="Retention time shift detection window (seconds) " help="Size of the window used for average retention time shift calculations"/> + + <param name="matchOnSameSourceOnly" type="boolean" checked="false" label="Match peaks
from same source only" help="If you enable this, you may also need to specify how the source files are linked (e.g. with 'Match using naming convention')"/> + <conditional name="matchOnNamingConvention"> + <param name="match" type="boolean" truevalue="Yes" falsevalue="No" checked="false" label="Match using naming convention" help="Use a list of codes that occur in the file names and that link them together."/> + <when value="Yes"> + <param name="namingConventionCodesForMatching" type="text" size="100" value="" label="List of codes in naming convention" help="Add the CSV list of codes that occur in the file names and that link them together. E.g. '_F1,_F2,_F3,etc.'"/> + </when> + </conditional> + + <param name="chargeStatesToGenerate" type="select" display="checkboxes" multiple="true" label="Generate extra charge states" help="The selected charge states will be generated for each MS2 feature "> + <option value="1" selected="true">1</option> + <option value="2" selected="true">2</option> + <option value="3" selected="true">3</option> + <option value="4" selected="true">4</option> + <option value="5">5</option> + </param> + + <param name="summaryReport" type="boolean" checked="true" label="Generate summary report" help="NB: this will increase the processing time"/> + + <conditional name="troubleshoot1"> + <param name="troubleshootPeakLocations" type="boolean" truevalue="Yes" falsevalue="No" checked="false" label="Troubleshoot ms1/ms2 peak locations" help="Small trial run to check whether the MS and MS/MS peak lists in their current states can easily be matched "/> + <when value="Yes"> + <param name="mStart" optional="false" type="integer" size="10" value="100" label="Set m/z start " /> + <param name="mEnd" optional="false" type="integer" size="10" value="1000" label="Set m/z end " /> + <param name="rtStart" optional="false" type="integer" size="10" value="10" label="Set retention time start (minutes) " /> + <param name="rtEnd" optional="false" type="integer" size="10" value="20" label="Set retention time end (minutes) " /> +
<param name="filterSourceName" type="text" size="100" value="" label="Restrict matching to a specific subset of the files " help="Part of a file name that occurs in both an MS1 and an MS2 file (e.g. 'RibO_1_msE1')"/> + </when> + </conditional> + + </inputs> + <outputs> + <data name="outputData" format="apml" label="${inputMS.metadata.base_name} - ${tool.name} on ${on_string}: APML" metadata_source="inputMS"></data> + <data name="outUnmatchedMS2" format="csv" label="${inputMS.metadata.base_name} - ${tool.name} on ${on_string}: unmatched MS2 features CSV" metadata_source="inputMS"></data> + <data name="htmlReportFile" format="html" label="${tool.name} on ${on_string} - HTML report"> + <!-- If the expression is false, the file is not created --> + <filter>( summaryReport == True )</filter> + </data> + </outputs> + <tests> + <!-- find out how to use --> + <test> + </test> + </tests> + <help> + +.. class:: infomark + +This tool matches MS and MS/MS results. SEDMAT stands for "Single Experiment Data Matching Tool". +It matches peaks found in the MS spectra with the peptides identified from the MS/MS spectra. +The result is the list of MS peaks annotated with peptides and proteins. + +----- + +**Output details** + +This tool returns APML output, a Cytoscape network (.xgmml) of the matches and retention time plots (.pdf). + + </help> +</tool>
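SEDMAT matches MS1 peaks with MS/MS peptide identifications. It uses an m/z tolerance in ppm (default 50), a retention time tolerance in seconds (default 150) and a set of extra charge states generated for each MS2 feature (1 to 4 by default). The Python sketch below illustrates only those tolerance checks. It recomputes the precursor m/z at each charge from the neutral mass, using a proton mass of 1.007276 Da. It leaves out SEDMAT's retention time shift correction and same-source matching, and it uses a brute-force loop; a real implementation would typically sort features by m/z. The class names, field names and example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

PROTON_MASS = 1.007276  # Da


@dataclass
class Ms1Feature:
    feature_id: str
    mz: float
    rt: float  # seconds


@dataclass
class Ms2Identification:
    peptide: str
    precursor_mz: float
    charge: int
    rt: float  # seconds


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass of an ion observed at m/z with the given positive charge."""
    return charge * (mz - PROTON_MASS)


def mz_for_charge(mass: float, charge: int) -> float:
    """m/z of a neutral mass carrying `charge` extra protons."""
    return (mass + charge * PROTON_MASS) / charge


def within_ppm(observed: float, theoretical: float, ppm: float) -> bool:
    return abs(observed - theoretical) / theoretical * 1e6 <= ppm


def match(
    features: Iterable[Ms1Feature],
    identifications: Iterable[Ms2Identification],
    mtol_ppm: float = 50,
    rttol_s: float = 150,
    charges: Tuple[int, ...] = (1, 2, 3, 4),
) -> List[Tuple[str, str, int]]:
    """Return (feature_id, peptide, charge) for every m/z and RT match (brute force)."""
    features = list(features)
    matches = []
    for ident in identifications:
        mass = neutral_mass(ident.precursor_mz, ident.charge)
        # The identification's own charge plus the selected extra charge states.
        for z in sorted(set(charges) | {ident.charge}):
            target = mz_for_charge(mass, z)
            for feature in features:
                if within_ppm(feature.mz, target, mtol_ppm) and abs(feature.rt - ident.rt) <= rttol_s:
                    matches.append((feature.feature_id, ident.peptide, z))
    return matches
```

For instance, a doubly charged precursor at m/z 500.0 also matches an MS1 feature near m/z 333.669 (the same mass at charge 3) if the two retention times are within 150 s. This is why generating extra charge states can annotate features whose charge differs from the one reported in the MS/MS search.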