annotate msfilt.xml @ 17:40ec8770780d
* Added support for pepxml (and more specifically for
ProteomeDiscoverer 1.4). Tested with ProteomeDiscoverer 1.4 pepxml.
* Improved HTML report of NapQ tool.
* Fixed issue that was preventing SEDMAT matching from running
in parallel/multi-threaded.
author | pieter.lukasse@wur.nl |
---|---|
date | Mon, 14 Apr 2014 17:11:33 +0200 |
parents | 72d4a37869ee |
children | ad911e9aaf33 |
<tool name="MsFilt" id="msfilt" version="1.0.4">
  <description>Filters annotations based on MS/MS peptide identification and annotation quality measures</description>
  <!--
     For remote debugging start your listener on port 8000 and use the following as command interpreter:
       java -jar -Xdebug -Xrunjdwp:transport=dt_socket,address=D0100564.wurnet.nl:8000
                    //////////////////////////
  -->
  <command interpreter="java -jar ">
    MsFilt.jar
      -apmlFile $apmlFile
      -datasetCode $apmlFile.metadata.base_name
      -rankingMetadataFile $rankingMetadataFile
      -statisticalMeasuresConfigFile $statisticalMeasuresConfigFile
      -annotationSourceConfigFile $annotationSourceConfigFile
      -outApml $outputApml
      -outNewIdsApml $outNewIdsApml
      -outFullCSV $outputCSV
      -outRankingTable $outRankingTable
      -outProteinCoverageCSV $outProteinCoverageCSV
      -fpCriteriaExpression "$fpCriteriaExpression"
      -filterOutFPAnnotations $filterOutFPAnnotations
      -fpCriteriaExpressionForIds "$fpCriteriaExpressionForIds"
      -filterOutFPIds $filterOutFPIds
      -filterOutUnannotatedAlignments $filterOutUnannotatedAlignments
      -addRawRankingInfo $addRawRankingInfo
      -addScaledIntensityInfo $addScaledIntensityInfo
      -addRawIntensityInfo $addRawIntensityInfo
      -outReport $htmlReportFile
      -outReportPicturesPath $htmlReportFile.files_path
      #if $containsPepxml.pepxmlInSet == True
        -pepxmlDataType $containsPepxml.pepxmlDataType
        -pepxmlGeneratedBy $containsPepxml.pepxmlGeneratedBy
      #end if
  </command>

  <inputs>

    <param name="apmlFile" type="data" format="apml" optional="true"
      label="(Optional) Peptide quantification file (APML)"
      help="The APML contents as aligned and annotated feature lists. E.g. produced by
      SEDMAT or Quantiline tools." />

    <repeat name="annotationSourceFiles" title="(Optional) Peptide identification files" help="Full set of MS/MS peptide identification files, including peptides that could not be quantified.">
      <param name="identificationsFile" type="data" format="apml,pepxml,mzidentml,prims.fileset.zip" label="Identifications file (APML, pepxml, MZIDENTML or MZIDENTML fileSet)" />
    </repeat>

    <!-- ================== PEPXML specific ================== -->
    <conditional name="containsPepxml">
      <param name="pepxmlInSet" type="boolean" truevalue="Yes" falsevalue="No" checked="false"
        label="Identifications set contains one or more files in pepxml format"
        help="Indicate whether one or more of the (Optional) Peptide identification files are in pepxml format. Support for pepxml is still considered 'beta'."/>
      <when value="Yes">
        <param name="pepxmlDataType" type="select" label=">> Type of data stored in the pepxml"
          help="Options marked with (*) are ProteomeDiscoverer specific scenarios">
          <option value="" selected="true">--Please select--</option>
          <option value="single_2d" >2D LC-MS runs, one per msms_run_summary</option>
          <option value="multi_2d">(*) 2D LC-MS runs, multiple runs (e.g. rx.F1 to rx.FN) merged as a 'single' msms_run_summary</option>
          <option value="single_1d">1D LC-MS runs, one per msms_run_summary</option>
        </param>
        <param name="pepxmlGeneratedBy" type="select" label=">> pepxml generated by"
          help="Some tools, like ProteomeDiscoverer 1.4, have specific issues in their pepxml generation logic. Correctly indicating the tool used here will ensure known issues are taken
          into consideration when the file is parsed." >
          <option value="" selected="true">--Please select--</option>
          <option value="proteome_discoverer_v1.4">ProteomeDiscoverer 1.4</option>
          <option value="other">Other</option>
        </param>
      </when>
      <when value="No">
      </when>
    </conditional>
    <!-- ================== END - PEPXML specific ================== -->

    <!--
    <param name="maxNrRankings" type="integer" size="10" value="0" label="Maximum nr. of items to leave in the final ranking (set=0 for no limit) " />
    -->
    <!-- TODO add info somewhere that deltaRt is 'corrected deltaRt' -->
    <param name="rankingWeightConfig" type="text" area="true" size="13x70" label="Quality Measures (QMs) and ranking weights configuration"
      help="Here you may specify a weight for each of the Quality Measures (QMs). These are used for the final QM score and possibly for ranking (e.g. in case of label-free data
      processed by SEDMAT). The format is: QM alias => QM name,weight. "
      value="qmDRT => delta rt (standard score),1
&#10;qmDMA => delta mass annotation (standard score),1
&#10;qmDMP => delta mass psm (standard score),1
&#10;qmBSCR => best peptide score (standard score),1
&#10;qmALCV => alignment coverage (fraction),1
&#10;qmSTCV => score type coverage (fraction),1
&#10;qmPACV => peptide's best proteinAnnotCoverage (standard score),1
&#10;qmPICV => peptide's best proteinIdentifCoverage (standard score),1
&#10;qmANS => annotation sources (count),1
&#10;qmCSEV => charge states evidence (count),0.2
&#10;qmBCSP=> best correlation with source or product peptide (correl),1
&#10;qmBCCS => best correlation with other charge state (correl),1
&#10;qmBCOS => best correlation with other sibling peptide (correl),1
"/>

    <param name="statisticalMeasuresConfig" type="text" area="true" size="8x70" label="Statistical measures configuration"
      help="Here you may specify the statistical measures that are found in the MS/MS results (e.g. p or e-values).
      The format is: SM alias => SM name,type,mode[min/max]. "
      value="smXTD => MS:1001330,XSLASH!Tandem:expect,min
&#10;pvCSVEX => p_value,CSV_EXPORT,min
&#10;smAUTO_LIKELIHOOD => AUTOMOD_LOGLIKELIHOOD,PLGS/Auto-mod,max
&#10;smLIKELIHOOD => LOGLIKELIHOOD,PLGS/Databank-search,max
&#10;smPercoProb => Percolator: probability,Percolator probability,max
&#10;smPercoPEP => Percolator: PEP,Percolator PEP,min
&#10;smPercoQval => Percolator: q-Value,Percolator q-Value,max
"/>

    <param name="filterOutUnannotatedAlignments" type="boolean" checked="true"
      label="Filter out unannotated alignments"
      help="This helps decrease the output file size (features with no annotation are then no longer reported)"/>

    <param name="filterOutFPAnnotations" type="boolean" checked="true"
      label="Filter out False Positive (FP) annotations" />

    <param name="fpCriteriaExpression" type="text" size="120" label="False Positive (FP) criteria for annotations"
      help="Criteria (in standard score measures) for classifying an annotation as False Positive (FP).
      You can build logical rules using the QM aliases above, the keywords 'and', 'or' and parentheses.
      Comparisons can be made with '==,&lt;,>,&lt;=,>='"
      value="qmDRT &lt;0 or qmDMA &lt;-0.5 or (qmDMP &lt;-0.5 and qmBSCR&lt;-0.5) or (!isNaN(smXTD) and smXTD >0.01)"/>


    <param name="filterOutFPIds" type="boolean" checked="true"
      label="Filter out False Positive (FP) peptide identifications" />

    <param name="fpCriteriaExpressionForIds" type="text" size="120"
      label="False Positive (FP) criteria for identifications"
      help="Criteria (in standard score measures) for classifying a peptide identification as False Positive (FP).
      Here you can use a subset of the quality measures (qmDMP, qmBSCR, qmSTCV, qmPICV, qmCSEV) and all statistical measures."
      value="(qmDMP &lt;-0.5 and qmBSCR&lt;-0.5) or (!isNaN(smXTD) and smXTD >0.01)"/>


    <param name="addRawRankingInfo" type="boolean" checked="false"
      label="Include the raw scores/values of the ranking attributes in the CSV output"
      help="This will result in one extra column per ranking attribute, each column holding the original data for this attribute (before normalization)."/>

    <param name="addScaledIntensityInfo" type="boolean" checked="false"
      label="Include computed scaled intensity values in the CSV output"
      help="The autoscaled and z-score-scaled (aka standard-score-scaled) intensity values are then added to the full CSV output file"/>

    <param name="addRawIntensityInfo" type="boolean" checked="false"
      label="Include the raw intensity values in the CSV output"
      help="The original intensity values (as found in the input file) are then added to the full CSV output file"/>


  </inputs>
  <configfiles>
    <configfile name="rankingMetadataFile">${rankingWeightConfig}</configfile>
    <configfile name="statisticalMeasuresConfigFile">${statisticalMeasuresConfig}</configfile>
    <configfile name="annotationSourceConfigFile">## start comment
## iterate over the selected files and store their names in the config file
#for $i, $s in enumerate( $annotationSourceFiles )
${s.identificationsFile}
## also print out the datatype in the next line, based on previously configured datatype
#if isinstance( $s.identificationsFile.datatype, $__app__.datatypes_registry.get_datatype_by_extension('pepxml').__class__):
pepxml
#elif isinstance( $s.identificationsFile.datatype, $__app__.datatypes_registry.get_datatype_by_extension('apml').__class__):
apml
#else:
mzid
#end if
#end for
## end comment</configfile>
  </configfiles>
  <outputs>
    <data name="outputApml" format="apml" label="${apmlFile.metadata.base_name} - ${tool.name} on ${on_string}: quantifications (filtered APML)" metadata_source="apmlFile">
      <!-- If the expression is false, the file is not created -->
      <filter>( apmlFile != None )</filter>
    </data>
    <data name="outNewIdsApml" format="apml" label="${tool.name} on ${on_string}: identifications (filtered APML)" >
      <filter>( filterOutFPIds == True )</filter>
    </data>
    <data name="outputCSV" format="csv" label="${apmlFile.metadata.base_name} - ${tool.name} on ${on_string}: Full CSV" metadata_source="apmlFile">
      <filter>( apmlFile != None )</filter>
    </data>
    <data name="outRankingTable" format="csv" label="${apmlFile.metadata.base_name} - ${tool.name} on ${on_string}: Ranking table (CSV)" metadata_source="apmlFile">
      <filter>( apmlFile != None )</filter>
    </data>
    <data name="outProteinCoverageCSV" format="csv" label="${tool.name} on ${on_string}: Protein coverage details (CSV)">
      <!-- If the expression is false, the file is not created -->
      <filter>( len(list(enumerate(annotationSourceFiles))) > 0 )</filter>
    </data>
    <data name="htmlReportFile" format="html" label="${tool.name} on ${on_string} - HTML report"/>
  </outputs>
  <tests>
  </tests>
  <help>

.. class:: infomark

This tool takes in peptide quantification results (e.g. produced by SEDMAT for label-free data or by Quantiline for labeled data)
and calculates a number of quality measures that can help in assessing the correctness of the quantification assignment and of the MS/MS peptide
identification itself. The user can use any combination of quality measures (QMs) and statistical measures (SMs) to filter out
low-scoring entries.

.. class:: infomark

In label-free data processed by SEDMAT, a feature quantification may get assigned to several different peptides, which makes
the assignment ambiguous. In such a case
this tool also ranks the different assignments according to their quality measures, so that the best-scoring assignment
is ranked first.
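
As a rough illustration of this ranking (a sketch only, not the MsFilt implementation): assuming the final QM score is a weighted sum of the normalized quality measures, the candidate peptides for one feature could be ordered as shown here. The function name ``rank_assignments``, the weighted-sum rule and the sample values are all assumptions made for this example.

```python
def rank_assignments(candidates, weights):
    """Return (peptide, score) pairs ordered best first.

    candidates: dict mapping peptide -> dict of QM alias -> normalized value
    weights:    dict mapping QM alias -> weight (as in the ranking weights configuration)
    The weighted-sum score is an assumption, not the documented MsFilt formula.
    """
    def score(qms):
        # Assumption: a QM that is missing for a candidate contributes 0.
        return sum(weight * qms.get(alias, 0.0) for alias, weight in weights.items())

    scored = [(peptide, score(qms)) for peptide, qms in candidates.items()]
    # Highest score first, so the best-scoring assignment is ranked first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical example: two peptides competing for the same feature.
weights = {"qmDRT": 1.0, "qmDMA": 1.0, "qmCSEV": 0.2}
candidates = {
    "PEPTIDEA": {"qmDRT": 0.5, "qmDMA": 1.0, "qmCSEV": 2.0},
    "PEPTIDEB": {"qmDRT": -1.0, "qmDMA": 0.5},
}
ranking = rank_assignments(candidates, weights)
```

In this made-up example ``ranking`` lists ``PEPTIDEA`` before ``PEPTIDEB``, because its weighted score is higher.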

-----

**List of abbreviations**

QM: Quality Measure

SM: Statistical Measure (e.g. p-value, e-value from MS/MS identification)

PSM: "Peptide to Spectrum Match" (aka peptide identification)

FP: False Positive

-----

**Filtering options details**

The FP criteria are applied to an annotation even if not all of the quality measures involved
in the expression can be determined. QMs that cannot be determined get the value 0 (zero), which is
equivalent to assigning them the average value.
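
Why 0 corresponds to the average follows from the definition of the standard score, z = (x - mean) / standard deviation: a value equal to the mean has z = 0. The sketch below illustrates this. It assumes the population standard deviation and uses ``None`` for an undetermined QM; both choices, and the function name ``standard_scores``, are assumptions for illustration and are not taken from MsFilt.

```python
from statistics import mean, pstdev


def standard_scores(values):
    """Convert raw QM values to standard scores; undetermined values (None) become 0."""
    known = [v for v in values if v is not None]
    mu = mean(known)        # raises StatisticsError if no value could be determined
    sigma = pstdev(known)   # population standard deviation (an assumption)
    scores = []
    for v in values:
        if v is None or sigma == 0:
            # Undetermined value (or no spread at all): use 0, the average standard score.
            scores.append(0.0)
        else:
            scores.append((v - mu) / sigma)
    return scores


# Hypothetical raw values for one QM; the third one could not be determined.
scores = standard_scores([2.0, 4.0, None, 6.0])
```

Here the undetermined entry and the entry equal to the mean (4.0) both end up with a score of 0, so neither pushes an FP expression such as ``qmDMA`` below ``-0.5``.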

The output report shows some plots that visualize the filtering done. This can help in fine-tuning the filtering
criteria.

-----

**Output details**

*APML output*

This tool returns the given APML alignment file, further annotated at the alignment level with the best-ranking
peptides of each respective alignment. This APML can be used in subsequent Galaxy tools like the proteomics tools
from NBIC.

The APML output can also be used for the Protein Inference step (see Quantifere tool).

*CSV output*

It also returns a CSV format output with the full quality measures and scoring and ranking details. The user could use
this to manually determine new weights for some of the quality measures by techniques such as
linear regression. In other words, this CSV can then be used to fine-tune the weights in a next run.

Many of the quality measures (QMs) are normalized to their Standard Score (aka z-score).
`See Standard Score for more details...`__

Besides giving insight into how the ranking was established, a more complete version of this CSV file is also
generated for tools that cannot or won't process the APML output format.

Below is a brief overview of the CSV and an illustration of the ranking done in case of ambiguous peptide-to-feature assignments
(explained above; this can happen in case of label-free data processing by SEDMAT).


.. image:: $PATH_TO_IMAGES/msfilt_csv_out.png



.. __: javascript:window.open('http://en.wikipedia.org/wiki/Standard_score','popUpWindow','height=700,width=800,left=10,top=10,resizable=yes,scrollbars=yes,toolbar=yes,menubar=no,location=no,directories=no,status=yes')




  </help>
</tool>