
Binary file Csv2Apml.jar has changed
Binary file IsoFix.jar has changed
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/LICENSE	Wed Jan 08 11:39:16 2014 +0100
@@ -0,0 +1,202 @@
+
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright [yyyy] [name of copyright owner]
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
Binary file MsFilt.jar has changed
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/NOTICE	Wed Jan 08 11:39:16 2014 +0100
@@ -0,0 +1,13 @@
+PRIMS proteomics toolset & Galaxy wrappers
+==========================================
+
+Tools and wrappers for the PRIMS proteomics toolset.
+Suite of custom tools to enable data processing and
+protein inference for labeled and label-free Mass Spectrometry proteomics data.
+Can be used in combination with PRIMS MASSCOMB (prims_masscomb package).
+Copyright 2010-2013 by Pieter Lukasse, Plant Research International (PRI),
+Wageningen, The Netherlands. All rights reserved. See the license text below.
+
+Galaxy wrappers and installation are available from the Galaxy Tool Shed at:
+http://toolshed.g2.bx.psu.edu/view/pieterlukasse/prims_proteomics
+
Binary file NapQ.jar has changed
Binary file PRIMS.jar has changed
Binary file ProgenesisConv.jar has changed
Binary file Quantifere.jar has changed
Binary file Quantiline.jar has changed
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/README.rst	Wed Jan 08 11:39:16 2014 +0100
@@ -0,0 +1,67 @@
+PRIMS-proteomics toolset & Galaxy wrappers
+==========================================
+
+Proteomics module of Plant Research International's Mass Spectrometry (PRIMS) toolsuite.
+This toolset consists of custom tools to enable data processing and
+protein inference for labeled and label-free Mass Spectrometry proteomics data.
+
+Can be used in combination with PRIMS-MASSCOMB (prims_masscomb package) and
+with PRIMV-visualization (primv_visualization package).
+
+Copyright 2010-2013 by Pieter Lukasse, Plant Research International (PRI),
+Wageningen, The Netherlands. All rights reserved. See the license text below.
+
+Galaxy wrappers and installation are available from the Galaxy Tool Shed at:
+http://toolshed.g2.bx.psu.edu/view/pieterlukasse/prims_proteomics
+
+History
+=======
+
+============== ======================================================================
+Date           Changes
+-------------- ----------------------------------------------------------------------
+January 2014   * first release via Tool Shed
+November 2013  * multiple tools used internally at PRI
+end 2011       * first tool
+============== ======================================================================
+
+Tool Versioning
+===============
+
+PRIMS tools will have versions of the form X.Y.Z. Versions
+differing only after the second decimal should be completely
+compatible with each other. Breaking changes should result in an
+increment of the number before and/or after the first decimal. All
+tools of version less than 1.0.0 should be considered beta.
+
+
+Bug Reports & other questions
+=============================
+
+For the time being, issues can be reported via the contact form at:
+http://www.wageningenur.nl/en/Persons/PNJ-Pieter-Lukasse.htm
+
+Developers, Contributions & Collaborations
+==========================================
+
+If you wish to join forces and collaborate on some of the
+tools, do not hesitate to contact Pieter Lukasse via the contact form above.
+
+
+License (Apache, Version 2.0)
+=============================
+
+Copyright 2013 Pieter Lukasse, Plant Research International (PRI).
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this software except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
\ No newline at end of file
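Reviewer note on the "Tool Versioning" section of README.rst: the X.Y.Z rule can be checked mechanically. The following is a minimal sketch of one reading of that rule, not code from this repository; the helper names `parse_version`, `is_compatible` and `is_beta` are hypothetical.

```python
def parse_version(version: str) -> tuple:
    """Split an X.Y.Z version string into a tuple of three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def is_compatible(a: str, b: str) -> bool:
    """Versions that differ only in Z (after the second decimal) should be compatible."""
    return parse_version(a)[:2] == parse_version(b)[:2]


def is_beta(version: str) -> bool:
    """Any tool version below 1.0.0 should be considered beta."""
    return parse_version(version) < (1, 0, 0)
```

Under this reading, Csv2Apml 1.0.2 would be compatible with a hypothetical 1.0.5 but not with 1.1.0, and IsoFix 0.0.1 counts as beta.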
Binary file SedMat_cli.jar has changed
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/csv2apml.xml	Wed Jan 08 11:39:16 2014 +0100
@@ -0,0 +1,127 @@
+<tool name="Csv2Apml" id="csv2apml" version="1.0.2">
+	<description>Converts MS/MS data in CSV format to APML format</description>
+	<!--
+	   For remote debugging start your listener on port 8000 and use the following as command interpreter:
+	       java -jar -Xdebug -Xrunjdwp:transport=dt_socket,address=D0100564.wurnet.nl:8000
+	                    //////////////////////////
+	    -->
+	<command interpreter="java -jar ">
+	    Csv2Apml.jar
+	    -peptideAndProteinMatchListCSV $peptideAndProteinMatchListCSV
+	    -attributesMappingCSV $attributesMappingCSV
+		-apmlFile $apmlFile
+	</command>
+
+	<inputs>
+
+   		<param name="peptideAndProteinMatchListCSV" type="data"
+   		format="csv" label="MS/MS CSV file"
+   		help="MS/MS CSV file containing peptide identifications and protein matches" />
+
+		<param name="mz" type="text" optional="false" size="30"
+		       label="Column name for precursor m/z" />
+
+		<param name="rt" type="text" optional="false" size="30"
+		       label="Column name for precursor rt" />
+
+		<param name="charge" type="text" optional="false" size="30"
+		       label="Column name for precursor charge (z)" />
+
+		<param name="pepSequence" type="text" optional="false" size="30"
+		       label="Column name for peptide sequence" />
+
+		<param name="ppidScore" type="text" optional="false" size="30"
+		       label="Column name for peptide identification score" />
+
+		<param name="scoringSchemeName" type="text" optional="true" size="30"
+		       label="(Optional) Column name containing scoring scheme name" />
+
+		<param name="statisticalMeasure" type="text" optional="true" size="30"
+			   label="(Optional) Column name for reported statistical measure values"
+			   help="(e.g. column containing p-values or e-values)" />
+
+		<param name="ppidTheoreticalMz" type="text" optional="true" size="30"
+		       label="(Optional) Column name for peptide theoretical m/z" />
+
+		<param name="modifications" type="text" optional="true" size="30"
+		       label="(Optional) Column name for reported modifications" />
+
+		<param name="proteinAccession" type="text" optional="false" size="30"
+		       label="Column name for protein accession code" />
+
+		<param name="protSequenceLength" type="text" optional="true" size="30"
+		       label="(Optional) Column name for protein sequence length" />
+
+		<param name="pepProtStart" type="text" optional="true" size="30"
+		       label="(Optional) Column name for protein match location start"
+		       help="Where peptide sequence starts in protein"/>
+
+		<param name="pepProtEnd" type="text" optional="true" size="30"
+		       label="(Optional) Column name for protein match location end"
+		       help="Where peptide sequence ends in protein"/>
+
+		<param name="sourceName" type="text" optional="true" size="30"
+		       label="(Optional) Column name for sample names" />
+
+	</inputs>
+	<configfiles>
+		<configfile name="attributesMappingCSV">Generic name,name in S1 table CSV
+mz,${mz}
+rt,${rt}
+charge,${charge}
+pepSequence,${pepSequence}
+ppidScore,${ppidScore}
+proteinAccession,${proteinAccession}
+#if $ppidTheoreticalMz != "None"
+ppidTheoreticalMz,${ppidTheoreticalMz}
+#end if
+#if $modifications != "None"
+modifications,${modifications}
+#end if
+#if $scoringSchemeName != "None"
+scoringSchemeName,${scoringSchemeName}
+#end if
+#if $statisticalMeasure != "None"
+statisticalMeasure,${statisticalMeasure}
+#end if
+#if $protSequenceLength != "None"
+protSequenceLength,${protSequenceLength}
+#end if
+#if $pepProtStart != "None"
+pepProtStart,${pepProtStart}
+#end if
+#if $pepProtEnd != "None"
+pepProtEnd,${pepProtEnd}
+#end if
+#if $sourceName != "None"
+sourceName,${sourceName}
+#end if</configfile>
+	</configfiles>
+
+	<outputs>
+	  <data name="apmlFile" format="apml" label="${tool.name} on ${on_string}: APML" >
+	  </data>
+	</outputs>
+	<tests>
+	</tests>
+  <help>
+
+.. class:: infomark
+
+This tool converts a CSV file containing MS/MS peptide identifications and their respective protein matches
+to the APML XML format.
+The identifications in APML format can be used, for example, to annotate unidentified MS features via SEDMAT(*).
+This format is also compatible with other post-processing tools such as Quantifere (for
+protein inference).
+
+(*)SEDMAT can use MS2 identification data
+and couple it to this MS1 data, thereby annotating the MS1 feature list with identifications.
+
+-----
+
+**Output**
+
+This tool returns the input data in APML XML format.
+
+  </help>
+</tool>
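Reviewer note on the `attributesMappingCSV` config file in csv2apml.xml: the Cheetah template writes a two-column CSV that maps each generic attribute name to the user's column name, and skips optional attributes whose value is the string "None". The sketch below mirrors that logic in plain Python for illustration only; the function name `build_attributes_mapping` and the dictionary input are assumptions, and Galaxy itself renders the template rather than running code like this.

```python
# Attribute names in the order used by the configfile template.
REQUIRED = ["mz", "rt", "charge", "pepSequence", "ppidScore", "proteinAccession"]
OPTIONAL = [
    "ppidTheoreticalMz", "modifications", "scoringSchemeName",
    "statisticalMeasure", "protSequenceLength", "pepProtStart",
    "pepProtEnd", "sourceName",
]


def build_attributes_mapping(columns: dict) -> str:
    """Return the mapping CSV text for the given {generic name: column name} dict."""
    lines = ["Generic name,name in S1 table CSV"]
    # Required attributes are always written.
    for name in REQUIRED:
        lines.append(f"{name},{columns[name]}")
    # Optional attributes are written only when set; like the template,
    # a value of "None" (or a missing key, an assumption here) is skipped.
    for name in OPTIONAL:
        value = columns.get(name)
        if value is not None and str(value) != "None":
            lines.append(f"{name},{value}")
    return "\n".join(lines)
```

For example, passing the six required names plus `"modifications": "Mods"` yields a header line, six required rows and one `modifications,Mods` row.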
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/datatypes_conf.xml	Wed Jan 08 11:39:16 2014 +0100
@@ -0,0 +1,9 @@
+<?xml version="1.0"?>
+<datatypes>
+  <datatype_files>
+	<datatype_file name="prims_proteomics_datatypes.py"/>
+  </datatype_files>
+  <registration display_path="display_applications">
+        <datatype extension="apml" type="galaxy.datatypes.prims_proteomics_datatypes:Apml" display_in_upload="true" />
+  </registration>
+</datatypes>
\ No newline at end of file
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/isofix.xml	Wed Jan 08 11:39:16 2014 +0100
@@ -0,0 +1,66 @@
+<tool name="IsoFix" id="isofix1" version="0.0.1">
+	<description>Identifies in-source decay peptides and corrects protein assignments</description>
+	<!--
+	   For remote debugging start your listener on port 8000 and use the following as command interpreter:
+	       java -jar -Xdebug -Xrunjdwp:transport=dt_socket,address=D0100564.wurnet.nl:8000
+	                    //////////////////////////
+	    -->
+	<command interpreter="java -jar ">
+	    IsoFix.jar
+	    -identificationsFile $identificationsFile
+	    -outputFile $outputFile
+	    -format apml
+	    -rtTol $rtTol
+	    -logFile $logFile
+	    #if $useOriginalProteinSequences.useOriginalProteinSequencesFile == True
+        	-fastaFile $useOriginalProteinSequences.fastaFile
+        #end if
+	</command>
+
+	<inputs>
+
+   		<param name="identificationsFile" type="data" format="apml" label="MS/MS identifications file" />
+
+     	<param name="rtTol" type="integer" size="10" value="15" label="Retention time tolerance (seconds) " />
+
+     	<param name="createLogFile" type="boolean" checked="true" label="Generate log file" help="Lists the in-source decay peptides found"/>
+
+     	<conditional name="useOriginalProteinSequences">
+     		<param name="useOriginalProteinSequencesFile" type="boolean"
+     		truevalue="Yes" falsevalue="No" checked="true"
+     		label="Use original protein sequences for detecting peptide source relations"
+     		help="This can reduce redundancy in final set by correctly identifying which peptides derive from bigger peptides that are also identified"/>
+     		<when value="Yes">
+     			<param name="fastaFile" type="data" format="fasta" label="Protein sequences (fasta file)"/>
+     		</when>
+     	</conditional>
+
+	</inputs>
+	<outputs>
+	  <data name="outputFile" format="apml" label="${identificationsFile.metadata.base_name} - ${tool.name} on ${on_string}: APML" metadata_source="identificationsFile"></data>
+	  <data name="logFile" format="txt" label="${tool.name} on ${on_string} - LOG file">
+	 	<!-- If the expression is false, the file is not created -->
+	  	<filter>( createLogFile == True )</filter>
+	  </data>
+	</outputs>
+	<tests>
+	</tests>
+  <help>
+
+.. class:: infomark
+
+This tool identifies in-source decay peptides and corrects protein assignments.
+
+-----
+
+**Output example**
+
+This tool returns the given input file with corrected protein assignments and with
+in-source decay peptides marked (by a small modification in their sequence string).
+E.g. if peptide TYNSIMK is found to be an in-source decay of HETTYNSIMK, then
+its sequence is changed to HET}TYNSIMK (so the decayed part + "}" + own sequence).
+E.g. decay from both sides: TYNSI, HETTYNSIMK = HET}TYNSI{MK
+
+
+  </help>
+</tool>
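Reviewer note on the sequence notation described in the IsoFix help text: the decayed part before the peptide is joined with "}" and the decayed part after it with "{". The sketch below only illustrates that string notation. It is not the IsoFix algorithm (which also uses retention time and protein sequences), the function name `mark_in_source_decay` is hypothetical, and using the first occurrence of the fragment in the parent is an assumption.

```python
def mark_in_source_decay(fragment: str, parent: str) -> str:
    """Annotate a fragment with the parts of its parent peptide that decayed away."""
    start = parent.find(fragment)  # assumption: first occurrence is the relevant one
    if start < 0:
        raise ValueError(f"{fragment!r} is not a substring of {parent!r}")
    end = start + len(fragment)
    prefix, suffix = parent[:start], parent[end:]
    marked = fragment
    if prefix:
        # Residues lost before the fragment: decayed part + "}" + own sequence.
        marked = prefix + "}" + marked
    if suffix:
        # Residues lost after the fragment: own sequence + "{" + decayed part.
        marked = marked + "{" + suffix
    return marked
```

With the parent HETTYNSIMK, the fragment TYNSIMK gives HET}TYNSIMK, and the fragment TYNSI gives HET}TYNSI{MK.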
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/msfilt.xml	Wed Jan 08 11:39:16 2014 +0100
@@ -0,0 +1,229 @@
+<tool name="MsFilt" id="msfilt" version="1.0.2">
+	<description>Filters annotations based on MS/MS peptide identification and annotation quality measures</description>
+	<!--
+	   For remote debugging start your listener on port 8000 and use the following as command interpreter:
+	       java -jar -Xdebug -Xrunjdwp:transport=dt_socket,address=D0100564.wurnet.nl:8000
+	                    //////////////////////////
+	    -->
+	<command interpreter="java -jar ">
+	    MsFilt.jar
+	    -apmlFile $apmlFile
+	    -datasetCode $apmlFile.metadata.base_name
+	    -rankingMetadataFile $rankingMetadataFile
+	    -statisticalMeasuresConfigFile $statisticalMeasuresConfigFile
+	    -annotationSourceConfigFile $annotationSourceConfigFile
+	    -outApml $outputApml
+	    -outNewIdsApml $outNewIdsApml
+	    -outFullCSV $outputCSV
+	    -outRankingTable $outRankingTable
+	    -outProteinCoverageCSV $outProteinCoverageCSV
+	    -fpCriteriaExpression "$fpCriteriaExpression"
+	    -filterOutFPAnnotations $filterOutFPAnnotations
+	    -fpCriteriaExpressionForIds "$fpCriteriaExpressionForIds"
+	    -filterOutFPIds $filterOutFPIds
+	    -filterOutUnannotatedAlignments $filterOutUnannotatedAlignments
+	    -addRawRankingInfo $addRawRankingInfo
+	    -addScaledIntensityInfo $addScaledIntensityInfo
+	    -addRawIntensityInfo $addRawIntensityInfo
+    	-outReport $htmlReportFile
+	    -outReportPicturesPath $htmlReportFile.files_path
+	</command>
+
+	<inputs>
+
+   		<param name="apmlFile" type="data" format="apml" optional="true"
+   		         label="(Optional) Peptide quantification file (APML)"
+   		         help="The APML contents as aligned and annotated feature lists. E.g. produced by
+   		               SEDMAT or Quantiline tools." />
+
+   		<repeat name="annotationSourceFiles" title="(Optional) Peptide identification files" help="Full set of MS/MS peptide identification files, including peptides that could not be quantified.">
+   			<param name="identificationsFile" type="data" format="apml,mzidentml,prims.fileset.zip" label="Identifications file (APML or MZIDENTML or MZIDENTML fileSet)" />
+   			<param name="spectraFile" type="data" format="mzidentml,prims.fileset.zip" optional="true" label="(Optional) Spectra fileSet (mzml file or fileSet)"
+   				   help="Select this in case your Identifications file is MZIDENTML or MZIDENTML fileSet" />
+   		</repeat>
+
+     	<!--
+     	<param name="maxNrRankings" type="integer" size="10" value="0" label="Maximum nr. of items to leave in the final ranking (set=0 for no limit) " />
+     	-->
+     	<!--  TODO add info somewhere that deltaRt is 'corrected deltaRt' -->
+		<param name="rankingWeightConfig" type="text" area="true" size="13x70" label="Quality Measures (qm's) and ranking weights configuration"
+		help="Here you may specify a weight for each of the Quality Measures (QMs). These are used for the final QM score and possibly for ranking (e.g. in case of label-free data
+		processed by SEDMAT). The format is: QM alias => QM name,weight. "
+value="qmDRT =&gt; delta rt (standard score),1
+&#xd;&#xa;qmDMA =&gt; delta mass annotation (standard score),1
+&#xd;&#xa;qmDMP =&gt; delta mass psm (standard score),1
+&#xd;&#xa;qmBSCR =&gt; best peptide score (standard score),1
+&#xd;&#xa;qmALCV =&gt; alignment coverage (fraction),1
+&#xd;&#xa;qmSTCV =&gt; score type coverage (fraction),1
+&#xd;&#xa;qmPACV =&gt; peptide's best proteinAnnotCoverage (standard score),1
+&#xd;&#xa;qmPICV =&gt; peptide's best proteinIdentifCoverage (standard score),1
+&#xd;&#xa;qmANS =&gt; annotation sources (count),1
+&#xd;&#xa;qmCSEV =&gt; charge states evidence (count),0.2
+&#xd;&#xa;qmBCSP=&gt; best correlation with source or product peptide (correl),1
+&#xd;&#xa;qmBCCS =&gt; best correlation with other charge state (correl),1
+&#xd;&#xa;qmBCOS =&gt; best correlation with other sibling peptide (correl),1
+"/>
+
+		<param name="statisticalMeasuresConfig" type="text" area="true" size="6x70" label="Statistical measures configuration"
+		help="Here you may specify the statistical measures that are found in the MS/MS results (e.g. p-values or e-values).
+		The format is: SM alias => SM name,type,mode[min/max]. "
+value="smXTD =&gt; MS:1001330,XSLASH!Tandem:expect,min
+&#xd;&#xa;pvCSVEX =&gt; p_value,CSV_EXPORT,min
+&#xd;&#xa;smAUTO_LIKELIHOOD =&gt; AUTOMOD_LOGLIKELIHOOD,PLGS/Auto-mod,max
+&#xd;&#xa;smLIKELIHOOD =&gt; LOGLIKELIHOOD,PLGS/Databank-search,max
+"/>
+
+     	<param name="filterOutUnannotatedAlignments" type="boolean" checked="true"
+     	label="Filter out unannotated alignments"
+     	help="This helps decrease the output file size (features with no annotation are then not reported anymore)"/>
+
+		<param name="filterOutFPAnnotations" type="boolean" checked="true"
+     	label="Filter out False Positive (FP) annotations" />
+
+		<param name="fpCriteriaExpression" type="text" size="120" label="False Positive (FP) criteria for annotations"
+		help="Criteria (in standard score measures) for classifying an annotation as False Positive (FP).
+		You can build logical rules using the QM aliases above, the keywords 'and', 'or' and parentheses.
+		Comparisons can be made with '==,&lt;,&gt;,&lt;=,&gt;='"
+		value="qmDRT &lt;0 or qmDMA &lt;-0.5 or (qmDMP &lt;-0.5 and qmBSCR&lt;-0.5) or (!isNaN(smXTD) and smXTD &gt;0.01)"/>
+
+
+     	<param name="filterOutFPIds" type="boolean" checked="true"
+     	label="Filter out False Positive (FP) peptide identifications" />
+
+		<param name="fpCriteriaExpressionForIds" type="text" size="120"
+		label="False Positive (FP) criteria for identifications"
+		help="Criteria (in standard score measures) for classifying a peptide identification as False Positive (FP).
+		Here you can use a subset of the quality measures (qmDMP, qmBSCR, qmSTCV, qmPICV, qmCSEV) and all statistical measures."
+		value="(qmDMP &lt;-0.5 and qmBSCR&lt;-0.5) or (!isNaN(smXTD) and smXTD &gt;0.01)"/>
+
+
+     	<param name="addRawRankingInfo" type="boolean" checked="false"
+     	label="Include the raw scores/values of the ranking attributes in the CSV output"
+     	help="This will result in one extra column per ranking attribute, each column holding the original data for this attribute (before normalization)."/>
+
+     	<param name="addScaledIntensityInfo" type="boolean" checked="false"
+     	label="Include computed scaled intensity values in the CSV output"
+     	help="The autoscaled and 'z-score'scaled (aka 'standard-score'scaled) intensity values are then added to the full CSV output file"/>
+
+     	<param name="addRawIntensityInfo" type="boolean" checked="false"
+     	label="Include the raw intensity values in the CSV output"
+     	help="The original intensity values (as found in the input file) are then added to the full CSV output file"/>
+
+
+	</inputs>
+	<configfiles>
+		<configfile name="rankingMetadataFile">${rankingWeightConfig}</configfile>
+		<configfile name="statisticalMeasuresConfigFile">${statisticalMeasuresConfig}</configfile>
+		<configfile name="annotationSourceConfigFile">## start comment
+		## iterate over the selected files and store their names in the config file
+		#for $i, $s in enumerate( $annotationSourceFiles )
+			${s.identificationsFile}|${s.spectraFile}
+			## also print out the datatype in the next line, based on previously configured datatype
+			#if isinstance( $s.identificationsFile.datatype, $__app__.datatypes_registry.get_datatype_by_extension('apml').__class__):
+				apml
+			#else:
+        		mzid
+      		#end if
+		#end for
+		## end comment</configfile>
+	</configfiles>
+	<outputs>
+	  <data name="outputApml" format="apml" label="${apmlFile.metadata.base_name} - ${tool.name} on ${on_string}: quantifications (filtered APML)" metadata_source="apmlFile">
+	 	<!-- If the expression is false, the file is not created -->
+	  	<filter>( apmlFile != None )</filter>
+	  </data>
+	  <data name="outNewIdsApml" format="apml" label="${tool.name} on ${on_string}: identifications (filtered APML)" >
+	  	<filter>( filterOutFPIds == True )</filter>
+	  </data>
+	  <data name="outputCSV" format="csv" label="${apmlFile.metadata.base_name} - ${tool.name} on ${on_string}: Full CSV" metadata_source="apmlFile">
+	  	<filter>( apmlFile != None )</filter>
+	  </data>
+	  <data name="outRankingTable" format="csv" label="${apmlFile.metadata.base_name} - ${tool.name} on ${on_string}: Ranking table (CSV)" metadata_source="apmlFile">
+	  	<filter>( apmlFile != None )</filter>
+	  </data>
+	  <data name="outProteinCoverageCSV" format="csv" label="${tool.name} on ${on_string}: Protein coverage details (CSV)">
+	  	<!-- If the expression is false, the file is not created -->
+	  	<filter>( len(list(enumerate(annotationSourceFiles))) > 0 )</filter>
+	  </data>
+	  <data name="htmlReportFile" format="html" label="${tool.name} on ${on_string} - HTML report"/>
+	</outputs>
+	<tests>
+	</tests>
+  <help>
+
+.. class:: infomark
+
+This tool takes in peptide quantification results (e.g. from SEDMAT for label-free data or from Quantiline for labeled data)
+and calculates a number of quality measures that help in assessing the correctness of the quantification assignment and of the MS/MS peptide
+identification itself. The user can use any combination of quality measures (QMs) and statistical measures (SMs) to filter out
+low-scoring entries.
+
+.. class:: infomark
+
+In label-free data processed by SEDMAT, a feature quantification can get assigned to several different peptides, which makes
+the assignment ambiguous. In such a case
+this tool also ranks the different assignments according to their quality measures, so that the best-scoring assignment
+is ranked first.
+
+-----
+
+**List of abbreviations**
+
+QM: Quality Measure
+
+SM: Statistical Measure (e.g. p-value, e-value from MS/MS identification)
+
+PSM:  "Peptide to Spectrum Match" (aka peptide identification)
+
+FP: False Positive
+
+-----
+
+**Filtering options details**
+
+The FP criteria are applied to an annotation even if not all of the quality measures involved
+in the expression can be determined. QMs that cannot be determined get the value 0 (zero), which for
+z-score normalized QMs is equal to giving them the average value.
+
+The output report shows some plots that visualize the filtering done. This can help in fine-tuning the filtering
+criteria.
+
+-----
+
+**Output details**
+
+*APML output*
+
+This tool returns the given APML alignment file further annotated at the alignment level with the best ranking
+peptides of each respective alignment. This APML can be used in subsequent Galaxy tools like the proteomics tools
+from NBIC.
+
+The APML output can also be used for the Protein Inference step (see Quantifere tool).
+
+*CSV output*
+
+It also returns a CSV format output with the full quality measures and scoring and ranking details. The user could use
+this to manually determine new weights for some of the quality measures by techniques such as
+linear regression. In other words, this CSV can then be used to fine-tune the weights in a next run.
+
+Many of the quality measures (QMs) are normalized to their Standard Score (aka z-score).
+`See Standard Score for more details...`__
+
+Besides giving insight into how the ranking was established, a more complete version of this CSV file is also
+generated for tools that cannot or won't process the APML output format.
+
+Below is a brief overview of the CSV and an illustration of the ranking done in case of ambiguous peptide-to-feature assignments
+(explained above; this can happen when label-free data is processed by SEDMAT).
+
+
+.. image:: $PATH_TO_IMAGES/msfilt_csv_out.png
+
+
+
+.. __: javascript:window.open('http://en.wikipedia.org/wiki/Standard_score','popUpWindow','height=700,width=800,left=10,top=10,resizable=yes,scrollbars=yes,toolbar=yes,menubar=no,location=no,directories=no,status=yes')
+
+
+
+
+  </help>
+</tool>
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/napq.xml	Wed Jan 08 11:39:16 2014 +0100
@@ -0,0 +1,93 @@
+<tool name="NapQ" id="napq" version="0.0.1">
+	<description>'no alignment' (alignment-free) peptide quantification</description>
+	<!--
+	   For remote debugging start your listener on port 8000 and use the following as command interpreter:
+	       java -jar -Xdebug -Xrunjdwp:transport=dt_socket,address=D0100564.wurnet.nl:8000
+	                    //////////////////////////
+	    -->
+	<command interpreter="java -jar ">
+	    NapQ.jar
+	    -identificationsConfigFile $identificationsConfigFile
+	    -namingConventionCodesForSamples $namingConventionCodesForSamples
+	    #if $is2D_LC_MS.fractions == True
+        	-namingConventionCodesForFractions $is2D_LC_MS.namingConventionCodesForFractions
+        #end if
+	    -outputApml $outputApml
+	    -outputTsv $outputTsv
+	    -outReport $htmlReportFile
+	    -outReportPicturesPath $htmlReportFile.files_path
+	</command>
+
+	<inputs>
+
+   		<repeat name="identificationFileList" title="Peptide identification files" help="Full set of MS/MS peptide identification files, including peptides that could not be quantified.">
+   			<param name="identificationsFile" type="data" format="apml,mzidentml,prims.fileset.zip" label="Identifications file (APML or MZIDENTML or MZIDENTML fileSet)" />
+   			<param name="spectraFile" type="data" format="mzidentml,prims.fileset.zip" optional="true" label="(Optional) Spectra fileSet (mzml file or fileSet)"
+   				   help="Select this in case your Identifications file is MZIDENTML or MZIDENTML fileSet" />
+   		</repeat>
+
+		<param name="namingConventionCodesForSamples" type="text" size="100" value=""
+		label="Part of run/file name that identifies the sample"
+		help="Add the CSV list of codes that occur in the file names
+			and that stand for a sample code. E.g. '_S1,_S2,_S3,etc.' "/> <!-- could do regular expressions as well but this would be hard for biologists, e.g. _F\d\b -->
+
+
+   		<conditional name="is2D_LC_MS">
+     		<param name="fractions" type="boolean" truevalue="Yes" falsevalue="No" checked="false"
+     		label="Data is from 2D LC-MS"
+     		help="Data acquisition was done in multiple fractions."/>
+     		<when value="Yes">
+     			<param name="namingConventionCodesForFractions" type="text" size="100" value=""
+     			label="Part of run/file name that identifies the 2D LC-MS fraction"
+     			help="Add the CSV list of codes that occur in the file names
+     				and that stand for a fraction code. E.g. '_F1,_F2,_F3,etc.' Use this to avoid
+     				that each (fraction) file is seen as a separate run."/> <!-- could do regular expressions as well but this would be hard for biologists, e.g. _F\d\b -->
+     		</when>
+     	</conditional>
+
+	</inputs>
+	<configfiles>
+		<configfile name="identificationsConfigFile">## start comment
+		## iterate over the selected files and store their names in the config file
+		#for $i, $s in enumerate( $identificationFileList )
+			${s.identificationsFile}|${s.spectraFile}
+			## also print out the datatype in the next line, based on previously configured datatype
+			#if isinstance( $s.identificationsFile.datatype, $__app__.datatypes_registry.get_datatype_by_extension('apml').__class__):
+				apml
+			#else:
+        		mzid
+      		#end if
+		#end for
+		## end comment</configfile>
+	</configfiles>
+	<outputs>
+	  <data name="outputApml" format="apml" label="${tool.name} on ${on_string}: peptide quantifications (APML)"/>
+	  <data name="outputTsv" format="tabular" label="${tool.name} on ${on_string}: peptide quantifications (TSV)"/>
+	  <!-- in tsv we can have cols like: pep, avg_m/z, avg rt, m/z window, rt window, i_s1, i_s2, ...-->
+	  <data name="htmlReportFile" format="html" label="${tool.name} on ${on_string} - HTML report"/>
+	  <!-- here we show the samples extracted and the files used to 'build up' each sample -->
+	</outputs>
+	<tests>
+	</tests>
+  <help>
+
+.. class:: infomark
+
+This tool takes in multiple peptide identification result files that have peptide identifications
+coupled to some quantification (e.g. precursor intensity information, or data coming
+from MS^E acquisition, where peptide identification and quantification are done in the same run and reported together).
+Then, based on the given experiment design parameters (i.e. how the result files relate back to
+replicate runs and samples), it produces a new file in which the peptides are reported with
+their calculated quantifications at the sample level.
+
+The figure below explains this:
+
+.. image:: $PATH_TO_IMAGES/napq_overview.png
+
+
+
+
+
+
+  </help>
+</tool>
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/prims_proteomics_datatypes.py	Wed Jan 08 11:39:16 2014 +0100
@@ -0,0 +1,42 @@
+"""
+PRIMS proteomics classes for types defined in datatypes_conf.xml
+"""
+import logging
+import re
+from galaxy.datatypes.data import *
+from galaxy.datatypes.xml import *
+from galaxy.datatypes.sniff import *
+from galaxy.datatypes.binary import *
+from galaxy.datatypes.interval import *
+
+log = logging.getLogger(__name__)
+
+
+class ProteomicsXml(GenericXml):
+    """ An enhanced XML datatype used to reuse code across several
+    proteomic/mass-spec datatypes. (this part of the code is taken from protk proteomics datatypes package) """
+
+    def sniff(self, filename):
+        """ Determines whether the file is the correct XML type. """
+        with open(filename, 'r') as contents:
+            while True:
+                line = contents.readline()
+                if not line or not line.startswith('<?'):
+                    break
+            pattern = r'^<(\w*:)?%s' % self.root  # pattern match <root or <ns:root for any ns string
+            return bool(line) and re.match(pattern, line) is not None
+
+    def set_peek( self, dataset, is_multi_byte=False ):
+        """Set the peek and blurb text"""
+        if not dataset.dataset.purged:
+            dataset.peek = data.get_file_peek( dataset.file_name, is_multi_byte=is_multi_byte )
+            dataset.blurb = self.blurb
+        else:
+            dataset.peek = 'file does not exist'
+            dataset.blurb = 'file purged from disk'
+
+class Apml( ProteomicsXml ):
+    """APML data"""
+    file_ext = "apml"
+    blurb = 'PRIMS APML proteomics data'
+    root = "apml"
\ No newline at end of file
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/progenesisconverter.xml	Wed Jan 08 11:39:16 2014 +0100
@@ -0,0 +1,68 @@
+<tool name="ProgenesisConverter" id="progenesisconv1" version="1.0.2">
+	<description>Converts Progenesis aligned feature lists in CSV format to APML</description>
+	<!--
+	   For remote debugging start your listener on port 8000 and use the following as command interpreter:
+	       java -jar -Xdebug -Xrunjdwp:transport=dt_socket,address=D0100564.wurnet.nl:8000
+	                    //////////////////////////
+	    -->
+	<command interpreter="java -jar ">
+	    ProgenesisConv.jar
+	    -progenesisFile $progenesisFile
+	    -apmlFile $apmlFile
+	    #if $multipleScoringSchemes.containsMultipleScoringSchemes == True
+	    	-scoringSchemeNameColumn $multipleScoringSchemes.scoringSchemeNameColumn
+	    #end if
+	    #if $statisticalMeasure.containsStatisticalMeasure == True
+	    	-statisticalMeasureColumn $statisticalMeasure.statisticalMeasureColumn
+	    #end if
+	</command>
+
+	<inputs>
+
+   		<param name="progenesisFile" type="data" format="csv" label="Progenesis aligned feature lists CSV file" />
+
+     	<conditional name="multipleScoringSchemes">
+     		<param name="containsMultipleScoringSchemes" type="boolean" truevalue="Yes" falsevalue="No" checked="false"
+     		       label="Progenesis scores contain multiple scoring schemes"
+     		       help="Set this if the scores in the 'Score' column come from two or more different schemes (e.g. MSE and DDA)"/>
+     		<when value="Yes">
+     			<param name="scoringSchemeNameColumn" type="text" optional="true" size="30"
+				       label="Column name"
+				       help="Name of the column containing the scoring scheme name" />
+     		</when>
+     	</conditional>
+
+     	<conditional name="statisticalMeasure">
+     		<param name="containsStatisticalMeasure" type="boolean" truevalue="Yes" falsevalue="No" checked="false"
+     		       label="Input sheet contains a statistical measure column"
+     		       help="Set this if the input sheet also contains a column with a statistical measure (e.g. p-value, e-value, etc)"/>
+     		<when value="Yes">
+				<param name="statisticalMeasureColumn" type="text" optional="true" size="30"
+				       label="Column name"
+				       help="Name of the column containing the statistical measure" />
+     		</when>
+     	</conditional>
+
+	</inputs>
+	<outputs>
+	  <data name="apmlFile" format="apml" label="${progenesisFile.metadata.base_name} - ${tool.name} on ${on_string}: APML" metadata_source="progenesisFile">
+	  </data>
+	</outputs>
+	<tests>
+	</tests>
+  <help>
+
+.. class:: infomark
+
+This tool converts a Progenesis CSV file to the APML XML format.
+This format can be used to submit the data for annotation by SEDMAT. SEDMAT can use MS2 identification data
+and couple it to this MS1 data, thereby annotating the MS1 feature list with identifications.
+
+-----
+
+**Output example**
+
+This tool returns APML output that can be used as input for the SEDMAT tool.
+
+  </help>
+</tool>
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/quantifere.xml	Wed Jan 08 11:39:16 2014 +0100
@@ -0,0 +1,206 @@
+<tool name="Quantifere" id="quantifere1" version="1.0.2">
+	<description>Protein Inference by Peptide Quantification patterns</description>
+	<!--
+	   For remote debugging start your listener on port 8000 and use the following as command interpreter:
+	       java -jar -Xdebug -Xrunjdwp:transport=dt_socket,address=D0100564.wurnet.nl:8000
+	                    //////////////////////////
+	    -->
+	<command interpreter="java -jar ">
+	    Quantifere.jar
+	    -annotatedQuantificationFilesList $annotatedQuantificationFilesList
+	    -identificationFilesList $identificationFilesList
+    	-statisticalMeasuresConfigFile $statisticalMeasuresConfigFile
+	    -quantificationDataToUse $quantificationDataToUse
+	    -minCorrel $minCorrel
+	    -minProtCoverage $minProtCoverage
+	    -minAboveAverageHits $minAboveAverageHits
+	    -minNrIdsForInferencePeptide $minNrIdsForInferencePeptide
+	    -refineModel $refineModel
+	    -functionalAnnotationCSV $functionalAnnotationCSV
+	    -outputCSV $outputCSV
+	    -outputInferenceLogCSV $outputInferenceLogCSV
+	    -outputSummaryAnnotationCSV $outputSummaryAnnotationCSV
+	    -outReport $htmlReportFile
+	    -outReportPicturesPath $htmlReportFile.files_path
+	    #if $is2D_LC_MS.fractions == True
+        	-namingConventionCodesForFractions $is2D_LC_MS.namingConventionCodesForFractions
+        #end if
+	</command>
+
+	<inputs>
+
+   		<repeat name="annotatedQuantificationFiles" title="Peptide (filtered) quantification files (APML)"
+   		help="The APML contents as aligned, annotated and scored feature lists,
+   		as produced by MsFilt tool. Select one or more files. For 2D-LC-MS we expect one file per fraction.">
+   			<param name="annotatedQuantificationFile" size="50" type="data" format="apml" label="File (APML format)" />
+   		</repeat>
+
+   		<repeat name="identificationFiles" title="Peptide (filtered) identification files (MS/MS identifications)"
+   		help="Full set of MS/MS peptide identification files, including peptides that could not be quantified.
+   		This set of identifications is ideally filtered on some quality and
+   		statistical measures (e.g. as is done by MsFilt). Tip: to base the inference only on the
+   		selected peptide quantification files, you
+   		can select the same quantification files here as well. Select one or more files.">
+   			<param name="identificationFile" size="50" type="data" format="apml,mzid" label="File (APML or MZIDENTML format)" />
+   		</repeat>
+
+   		<conditional name="is2D_LC_MS">
+     		<param name="fractions" type="boolean" truevalue="Yes" falsevalue="No" checked="false"
+     		label="Data is from 2D LC-MS"
+     		help="Data acquisition was done in multiple fractions."/>
+     		<when value="Yes">
+     			<param name="namingConventionCodesForFractions" type="text" size="100" value=""
+     			label="Part of run/file name that identifies the 2D LC-MS fraction"
+     			help="Add the CSV list of codes that occur in the file names
+     				and that stand for a fraction code. E.g. '_F1,_F2,_F3,etc.' In this
+     				way different peptide identifications from the same sample but measured
+     				in different fractions can be merged together. Otherwise each (fraction) file
+     				is seen as a separate sample."/> <!-- could do regular expressions as well but this would be hard for biologists, e.g. _F\d\b -->
+     		</when>
+     	</conditional>
+
+   		<param name="statisticalMeasuresConfig" type="text" area="true" size="6x70" label="Statistical measures configuration"
+				help="Here you may specify the statistical measures that are found in the MS/MS results (e.g. p-values or e-values).
+				The format is: SM alias => SM name,type,mode[min/max]. If these measures are present in the dataset but left out of this
+				configuration, they will wrongly be used as a regular scoring scheme, which affects for example
+				the filter criteria below such as 'Minimum number of peptide matches with a score above average'."
+value="smXTD =&gt; MS:1001330,XSLASH!Tandem:expect,min
+&#xd;&#xa;pvCSVEX =&gt; p_value,CSV_EXPORT,min
+&#xd;&#xa;smAUTO_LIKELIHOOD =&gt; AUTOMOD_LOGLIKELIHOOD,PLGS/Auto-mod,max
+&#xd;&#xa;smLIKELIHOOD =&gt; LOGLIKELIHOOD,PLGS/Databank-search,max
+"/>
+<!-- keep value attribute above aligned like this to avoid white spaces in the value -->
+   		<param name="quantificationDataToUse" type="select"
+   		label="Quantification data to use"
+   		help="Quantification data to use for the pattern clustering and inference steps. NB: check if the chosen data is also
+   		      present in your file, or choose 'auto' to let Quantifere check which quantification type is present in most peptides.">
+	    	<option value="auto" selected="true">auto</option>
+	    	<option value="getIntensity">(TODO)raw intensities</option>
+	    	<option value="getApexIntensity">(TODO)apex intensities</option>
+	    	<option value="getNormalizedIntensity">(TODO)normalized intensities</option>
+		</param>
+   		<!-- TODO let minCorrel default value vary according to quantification type chosen above -->
+		<param name="minCorrel" type="float" size="10" value="0.85" label="Minimum correlation in a cluster" help="Features will be grouped by their protein annotation and by the correlation of their
+		sample intensity values. Set here the minimum correlation expected between grouped members. This is used to guide the clustering algorithm."/>
+
+		<!--  simple extra heuristics to remove some "noise" protein hits  -->
+		<param name="minProtCoverage" type="float" size="10" value="5.0" label="Minimum protein coverage (%)" help="This will remove proteins that have too small a
+		portion of their sequence covered by peptide matches."/>
+
+		<param name="minAboveAverageHits" type="integer" size="10" value="1" label="Minimum number of different peptide matches with a score above average"
+		help="This will remove proteins that do not have enough reasonable peptide hits."/>
+
+		<param name="minNrIdsForInferencePeptide" type="integer" size="10" value="1" label="Minimum number of peptide identifications for inference peptides"
+		help="Minimum number of peptide identifications a peptide needs to be used as inference peptide for secondary proteins."/>
+
+
+     	<param name="functionalAnnotationCSV" type="data" format="csv,txt,tsv" optional="true"
+     	label="(Functional)annotation mapping file (csv or tsv format)"
+     	help="Optional file that maps protein accessions to a network, pathway or other higher level annotations. In this file a header line is expected with these 2 columns (names and lower case are important): accession,annotation"/>
+
+     	<param name="refineModel" type="boolean" checked="true" label="Refine matches model"
+     	help="This will let the algorithm search for a reduced set of secondary protein matches that still explains the variation in the peptide quantification patterns"/>
+
+
+     	<param name="summaryReport" type="boolean" checked="true" label="Generate summary report"/>
+
+	</inputs>
+	<configfiles>
+		<configfile name="annotatedQuantificationFilesList">## start comment
+		## iterate over the selected files and store their names in the config file
+		#for $i, $s in enumerate( $annotatedQuantificationFiles )
+			${s.annotatedQuantificationFile}
+		#end for
+		## end comment</configfile>
+
+		<configfile name="identificationFilesList">## start comment
+		## iterate over the selected files and store their names in the config file
+		#for $i, $s in enumerate( $identificationFiles )
+			${s.identificationFile}
+			## also print out the datatype in the next line, based on previously configured datatype
+			#if isinstance( $s.identificationFile.datatype, $__app__.datatypes_registry.get_datatype_by_extension('apml').__class__):
+				apml
+			#else:
+        		mzid
+      		#end if
+		#end for
+		## end comment</configfile>
+		<configfile name="statisticalMeasuresConfigFile">## start comment
+			${statisticalMeasuresConfig}
+		</configfile>
+	</configfiles>
+	<outputs>
+	  <data name="outputCSV" format="csv" label="${tool.name} on ${on_string}: Proteins list (CSV)" />
+	  <data name="outputInferenceLogCSV" format="csv" label="${tool.name} on ${on_string}: Inference log (CSV)"/>
+	  <data name="htmlReportFile" format="html" label="${tool.name} on ${on_string} - HTML report">
+	 	<!-- If the expression is false, the file is not created -->
+	  	<filter>( summaryReport == True )</filter>
+	  </data>
+	  <data name="outputSummaryAnnotationCSV" format="csv" label="${tool.name} on ${on_string} - Functional annotation summary (CSV)">
+	 	<!-- If the expression is false, the file is not created -->
+	  	<filter>( functionalAnnotationCSV != None )</filter>
+	  </data>
+	</outputs>
+	<tests>
+	</tests>
+  <help>
+
+.. class:: infomark
+
+This tool takes Peptide Quantification patterns and uses them to do Protein Inference of both Primary Protein
+identifications and Secondary Protein identifications. This last class of protein identifications
+cannot be made by traditional protein inference methods that look only at peptide identifications and
+their quality parameters.
+
+
+-----
+
+**List of definitions**
+
+Primary Protein identification: protein identification belonging to the minimum set of proteins needed
+to account for the observed peptides.
+
+Secondary Protein identification: extra protein identifications that do not belong to the minimum set
+of proteins mentioned above.
+
+raw intensities: the intensity value resulting from the integration of the feature peak area
+
+apex intensities: the intensity value at the highest point of the feature peak
+
+normalized intensities: the intensity normalized by some means
+
+-----
+
+**Minimum correlation in a cluster**
+
+TODO - add doc.
+
+-----
+
+**Output details**
+
+*Proteins list (CSV)*
+
+This is the list of primary and secondary proteins and their calculated inference score. Proteins
+with exactly the same peptide hits are also grouped together and labeled as primary_group and secondary_group
+instead of simply primary and secondary.
+
+
+*Inference log (CSV)*
+
+This CSV table shows all data, both inferred and ruled out proteins. This can be used by the user to
+troubleshoot the inference process and understand why certain proteins might have been ruled out.
+The CSV is provided in such a format that the data can easily be explored in a Cytoscape network.
+
+The figure below shows an example of the data being explored in Cytoscape using also the
+`Cytoscape chartplugin`_ to visualize the quantification data when selecting the peptide nodes.
+
+.. image:: $PATH_TO_IMAGES/quantifere_cyto_out.png
+
+
+.. _Cytoscape chartplugin: http://apps.cytoscape.org/apps/chartplugin
+
+
+
+  </help>
+</tool>
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/quantiline.xml	Wed Jan 08 11:39:16 2014 +0100
@@ -0,0 +1,62 @@
+<tool name="Quantiline" id="quantiline1" version="1.0.2">
+	<description>Labeled ms/ms data pre-processing for Protein Quantification (and Inference) pipelines</description>
+	<!--
+	   For remote debugging start your listener on port 8000 and use the following as command interpreter:
+	       java -jar -Xdebug -Xrunjdwp:transport=dt_socket,address=D0100564.wurnet.nl:8000
+	                    //////////////////////////
+	    -->
+	<command interpreter="java -jar ">
+	    Quantiline.jar
+	    -ppidsFileName $ppidsFileName
+	    -spectraDataFile $spectraDataFile
+	    -ppidsInputFormat MZID
+	    -labelMzValues "$labelMzValues"
+	    -labelmTol $labelmTol
+	    -outputFile $outputFile
+	    -outReport $outReport
+	</command>
+	<inputs>
+
+	 	<param name="ppidsFileName" type="data" format="prims.fileset.zip" label="MS/MS peptide identifications fileSet (N mzidentml files)"/>
+	 	<param name="spectraDataFile" type="data" format="prims.fileset.zip" label="MS/MS spectra fileSet (N mzml files)"/>
+
+	 	<param name="labelMzValues" type="text" size="20" label="Label m/z values"
+	 	help="e.g. for 4-plex iTRAQ: 114.0,115.0,116.0,117.0"/>
+
+		<param name="labelmTol" type="float" size="10" value="0.5" label="Label detection tolerance (Da)"
+		help="Tolerance in daltons for label detection."/>
+
+	</inputs>
+	<outputs>
+	  <data name="outputFile" format="apml" label="${tool.name} on ${on_string}: Peptides quantification (APML)" />
+	  <data name="outReport" format="html" label="${tool.name} on ${on_string}: Peptides quantification report (HTML)"/>
+	</outputs>
+	<tests>
+	</tests>
+  <help>
+
+.. class:: infomark
+
+This tool reads spectra files (mzML) and their respective identification files (mzIdentML) and, based
+on the configured label masses, produces a file that contains the merged information:
+peptides and their quantification based on label fragment intensity values read from the spectrum in which they
+were identified.
+
+In other words, it produces the peptide (relative) quantification file. This file can subsequently be used
+by other tools for protein inference and protein quantification (e.g. Quantifere).
+
+
+-----
+
+**Output details**
+
+*Peptide quantification file (APML)*
+
+This is the list of peptides with their (relative) quantification based on the labels and their
+intensities found in the label peaks of the corresponding spectrum.
+
+
+
+
+  </help>
+</tool>
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/repository_dependencies.xml	Wed Jan 08 11:39:16 2014 +0100
@@ -0,0 +1,5 @@
+<?xml version="1.0"?>
+<repositories description="Required proteomics dependencies.">
+  <repository toolshed="http://toolshed.g2.bx.psu.edu" name="proteomics_datatypes" owner="iracooke" changeset_revision="09b89b345de2" />
+  <repository toolshed="http://testtoolshed.g2.bx.psu.edu" name="proteomics_datatypes" owner="iracooke" changeset_revision="7101f7e4b00b" />
+</repositories>
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/sedmat.xml	Wed Jan 08 11:39:16 2014 +0100
@@ -0,0 +1,144 @@
+<tool name="SedMat" id="sedmat1" version="1.0.2">
+	<description>Matches MS and MS/MS results</description>
+	<!--
+	   For remote debugging start your listener on port 8000 and use the following as command interpreter:
+	       java -jar -Xdebug -Xrunjdwp:transport=dt_socket,address=D0100564.wurnet.nl:8000
+	    -->
+	<command interpreter="java -jar ">
+	    SedMat_cli.jar
+	    -pl $inputMS
+	    -plInputFormat apml
+	    -ppids $fileType.inputFormatType.ppidsFile
+	    -ppidsFileGrouping $fileType.type
+	    -ppidsInputFormat $fileType.inputFormatType.ppidsInputFormat
+	    -ppidsFileDescription $fileType.inputFormatType.ppidsFile.name
+	    #if $fileType.inputFormatType.ppidsInputFormat == "mzid"
+			-spectraDataFile $fileType.inputFormatType.spectraDataFile
+		#end if
+	    -out $outputData
+	    -outUnmatchedMS2 $outUnmatchedMS2
+	    -mtol $mtol
+	    -rttol $rttol
+	    -rtShiftDetectionWindow $rtShiftDetectionWindow
+	    -matchOnSameSourceOnly $matchOnSameSourceOnly
+	    -chargeStatesToGenerate $chargeStatesToGenerate
+	    -outReport $htmlReportFile
+	    -outReportPicturesPath $htmlReportFile.files_path
+        #if $troubleshoot1.troubleshootPeakLocations == True
+        	-troubleshootPeakLocations YES
+        	-mStart $troubleshoot1.mStart
+        	-mEnd $troubleshoot1.mEnd
+        	-rtStart $troubleshoot1.rtStart
+        	-rtEnd $troubleshoot1.rtEnd
+        	-filterSourceName $troubleshoot1.filterSourceName
+        #end if
+        #if $matchOnNamingConvention.match == True
+        	-matchOnNamingConvention YES
+        	-namingConventionCodesForMatching $matchOnNamingConvention.namingConventionCodesForMatching
+        #end if
+
+	</command>
+
+	<inputs>
+
+  		<param name="inputMS" type="data" format="apml" label="MS data (APML format)" />
+	 	<!-- possible option <validator type="metadata" check="base_name" message="Metadata missing, click the pencil icon in the history item and set base_name."/> -->
+
+	 	<conditional name="fileType">
+		    <param name="type" type="select" label="Peptide identification file grouping type">
+		      <option value="single" selected="true">single-File</option>
+		      <option value="fileSet">fileSet</option>
+		    </param>
+		    <when value="single">
+		      <conditional name="inputFormatType">
+		      	<param name="ppidsInputFormat" type="select" label="MS/MS input format">
+			    	<option value="mzid" selected="true">mzIdentML on mzML</option>
+			    	<option value="apml">APML</option>
+				</param>
+				<when value="mzid">
+		      		<param name="spectraDataFile" type="data" format="mzml" label="MS/MS spectra file (mzml)"/>
+		      		<param name="ppidsFile" type="data" format="mzid" label="MS/MS peptide identifications file (mzidentml)"/>
+		      	</when>
+		      	<when value="apml">
+		      		<param name="ppidsFile" type="data" format="apml" label="MS/MS peptide identifications file (apml)">
+		      			<!-- TODO - find out how to use
+		      			<validator type="expression" message="You already selected this file as the MS data file.">value.id == inputMS,{"inputMS":$inputMS},{}</validator>-->
+		      		</param>
+		      	</when>
+		      </conditional>
+		    </when>
+		    <when value="fileSet">
+		      <conditional name="inputFormatType">
+		      	<param name="ppidsInputFormat" type="select" label="inputFormat">
+			    	<option value="mzid" selected="true">mzIdentML on mzML</option>
+				</param>
+				<when value="mzid">
+		      		<param name="spectraDataFile" type="data" format="prims.fileset.zip" label="MS/MS spectra fileSet (N mzml files)"/>
+		      		<param name="ppidsFile" type="data" format="prims.fileset.zip" label="MS/MS peptide identifications fileSet (N mzidentml files)"/>
+		      	</when>
+		      </conditional>
+		    </when>
+		</conditional>
+		<param name="mtol" type="integer" size="10" value="50" label="m/z tolerance (ppm) " />
+		<param name="rttol" type="integer" size="10" value="150" label="Retention time tolerance (seconds) " />
+		<param name="rtShiftDetectionWindow" type="integer" size="10" value="20" label="Retention time shift detection window (seconds) " help="Size of the window to use for average rt shift calculations"/>
+
+		<param name="matchOnSameSourceOnly" type="boolean" checked="false" label="Match peaks from same source only" help="If you want this, you might have to specify how to match the source files"/>
+     	<conditional name="matchOnNamingConvention">
+     		<param name="match" type="boolean" truevalue="Yes" falsevalue="No" checked="false" label="Match using naming convention" help="Use a list of codes that occur in the file names and that link them together."/>
+     		<when value="Yes">
+     			<param name="namingConventionCodesForMatching" type="text" size="100" value="" label="List of codes in naming convention" help="Add the CSV list of codes that occur in the file names and that link them together. E.g. '_F1,_F2,_F3,etc.'"/>
+     		</when>
+     	</conditional>
+
+ 		<param name="chargeStatesToGenerate" type="select" display="checkboxes" multiple="true" label="Generate extra charge states" help="The selected charge states will be generated for each MS2 feature ">
+	      	<option value="1" selected="true">1</option>
+	      	<option value="2" selected="true">2</option>
+	      	<option value="3" selected="true">3</option>
+	      	<option value="4" selected="true">4</option>
+	      	<option value="5">5</option>
+		</param>
+
+   		<param name="summaryReport" type="boolean" checked="true" label="Generate summary report" help="NB: this will increase the processing time"/>
+
+     	<conditional name="troubleshoot1">
+     		<param name="troubleshootPeakLocations" type="boolean" truevalue="Yes" falsevalue="No" checked="false" label="Troubleshoot ms1/ms2 peak locations" help="Small trial run to check if the MS and MS/MS peak lists in their current states can easily be matched "/>
+     		<when value="Yes">
+     			<param name="mStart" optional="false" type="integer" size="10" value="100" label="Set m/z start " />
+     			<param name="mEnd" optional="false" type="integer" size="10" value="1000" label="Set m/z end " />
+				<param name="rtStart" optional="false" type="integer" size="10" value="10" label="Set retention time start (minutes) " />
+				<param name="rtEnd" optional="false" type="integer" size="10" value="20" label="Set retention time end (minutes) " />
+				<param name="filterSourceName" type="text" size="100" value="" label="Restrict matching to a specific subset of the files " help="Part of a file name that occurs in both a ms1 and ms2 file (e.g. 'RibO_1_msE1')"/>
+     		</when>
+     	</conditional>
+
+	</inputs>
+	<outputs>
+	  <data name="outputData" format="apml" label="${inputMS.metadata.base_name} - ${tool.name} on ${on_string}: APML" metadata_source="inputMS"></data>
+	  <data name="outUnmatchedMS2" format="csv" label="${inputMS.metadata.base_name} - ${tool.name} on ${on_string}: unmatched MS2 features CSV" metadata_source="inputMS"></data>
+	  <data name="htmlReportFile" format="html" label="${tool.name} on ${on_string} - HTML report">
+	 	<!-- If the expression is false, the file is not created -->
+	  	<filter>( summaryReport == True )</filter>
+	  </data>
+	</outputs>
+	<tests>
+	  <!--  find out how to use -->
+	  <test>
+	  </test>
+	</tests>
+  <help>
+
+.. class:: infomark
+
+This tool matches MS and MS/MS results. SEDMAT stands for "Single Experiment Data Matching Tool".
+It can match peaks found in the MS spectra with the peptides found using the MS/MS spectra.
+The result is the list of MS peaks annotated with peptides and proteins.
+
+-----
+
+**Output example**
+
+This tool returns APML output, a Cytoscape network (.xgmml) of the matches, and retention time plots (.pdf).
+
+  </help>
+</tool>
Binary file static/images/msfilt_csv_out.png has changed
Binary file static/images/napq_overview.png has changed
Binary file static/images/quantifere_cyto_out.png has changed