comparison profia_config.xml @ 0:39ccace77270 draft

planemo upload for repository https://github.com/workflow4metabolomics/profia.git commit 2757590af8c7ba9833ba3bebd7da7f96b20d1128-dirty
author ethevenot
date Sun, 26 Mar 2017 17:37:12 -0400
parents
children 4753e64cf694
1 <tool id="profia" name="proFIA" version="3.0.0">
2 <description>Preprocessing of FIA-HRMS data</description>
3
4 <requirements>
5 <requirement type="package">r-batch</requirement>
6 <requirement type="package">r-FNN</requirement>
7 <requirement type="package">r-maxLik</requirement>
8 <requirement type="package">r-minpack.lm</requirement>
9 <requirement type="package">r-pracma</requirement>
10 <requirement type="package">bioconductor-proFIA</requirement>
11 </requirements>
12
13 <stdio>
14 <exit_code range="1:" level="fatal" />
15 </stdio>
16
17 <command><![CDATA[
18 Rscript "$__tool_directory__/profia_wrapper.R"
19
20 #if $inputs.input == "lib":
21 library "$__app__.config.user_library_import_dir/$__user_email__/$inputs.library"
22 #elif $inputs.input == "zip_file":
23 zipfile "$inputs.zip_file"
24 #end if
25
26 ppmN "$ppmN"
27 ppmGroupN "$ppmGroupN"
28 fracGroupN "$fracGroupN"
29 kI "$kI"
30
31 dataMatrix_out "$dataMatrix_out"
32 sampleMetadata_out "$sampleMetadata_out"
33 variableMetadata_out "$variableMetadata_out"
34 figure "$figure"
35 information "$information"
36 ]]></command>
37
38 <inputs>
39 <conditional name="inputs">
40 <param name="input" type="select" label="Choose your input method" >
41 <option value="zip_file" selected="true">Zip file from your history containing your raw files</option>
42 <option value="lib" >Library directory name</option>
43 </param>
44 <when value="zip_file">
45 <param name="zip_file" type="data" format="no_unzip.zip,zip" label="Zip file" />
46 </when>
47 <when value="lib">
48 <param name="library" type="text" size="40" label="Library directory name" help="The name of your directory containing all your data" >
49 <validator type="empty_field"/>
50 </param>
51 </when>
52 </conditional>
53
54 <param name="ppmN" label="Maximum deviation between centroids during band detection (in ppm)" type="text" value = "5" help="[ppm]" />
55 <param name="ppmGroupN" label="Accuracy of the mass spectrometer to be used during feature alignment (in ppm)" type="text" value = "5" help="[ppmGroup] Should be less than or equal to the deviation parameter above." />
56 <param name="fracGroupN" label="Minimum fraction of samples in which a peak should be detected in at least one class to be kept during feature alignment" type="text" value = "0.5" help="[fracGroup]" />
57 <param name="kI" label="Number of neighbour features to be used for imputation (select 0 to skip the imputation step)" type="text" value = "5" help="[k]" />
58 </inputs>
59
60 <outputs>
61 <data name="dataMatrix_out" label="${tool.name}_dataMatrix.tsv" format="tabular" ></data>
62 <data name="sampleMetadata_out" label="${tool.name}_sampleMetadata.tsv" format="tabular" ></data>
63 <data name="variableMetadata_out" label="${tool.name}_variableMetadata.tsv" format="tabular" ></data>
64 <data name="figure" label="${tool.name}_figure.pdf" format="pdf"/>
65 <data name="information" label="${tool.name}_information.txt" format="txt"/>
66 </outputs>
67
68 <tests>
69 <test>
70 <param name="inputs|input" value="zip_file" />
71 <param name="inputs|zip_file" value="input-plasFIA.zip" ftype="zip" />
72 <param name="ppmN" value="2"/>
73 <param name="ppmGroupN" value="1"/>
74 <param name="fracGroupN" value="0.1"/>
75 <param name="kI" value="2"/>
76 <output name="dataMatrix_out" file="output-dataMatrix.tsv"/>
77 </test>
78 </tests>
79
80 <help>
81
82 .. class:: infomark
83
84 **Author** Alexis Delabriere and Etienne Thevenot (CEA, LIST, MetaboHUB Paris, etienne.thevenot@cea.fr)
85
86 ---------------------------------------------------
87
88 .. class:: infomark
89
90 **Please cite**
91
92 Delabriere A., Hohenester U., Junot C. and Thevenot E.A. *proFIA*: A data preprocessing workflow for Flow Injection Analysis coupled to High-Resolution Mass Spectrometry. *submitted*.
93
94 ---------------------------------------------------
95
96 .. class:: infomark
97
98 **R package**
99
100 The **proFIA** package is available from the bioconductor repository `http://bioconductor.org/packages/proFIA &lt;http://bioconductor.org/packages/proFIA&gt;`_
101
102 ---------------------------------------------------
103
104 .. class:: infomark
105
106 **Tool updates**
107
108 See the **NEWS** section at the bottom of this page
109
110 ---------------------------------------------------
111
112 ==========================================================
113 *proFIA*: Preprocessing workflow for FIA-HRMS data
114 ==========================================================
115
116 -----------
117 Description
118 -----------
119
120 **Flow Injection Analysis coupled to High-Resolution Mass Spectrometry (FIA-HRMS)** is a promising approach for **high-throughput metabolomics** (Madalinski *et al.*, 2008; Fuhrer *et al.*, 2011; Draper *et al.*, 2013). FIA-HRMS data, however, cannot be preprocessed with current software tools, which either rely on liquid chromatography separation or handle low-resolution data only.
121
122 The **proFIA module is a workflow** that preprocesses FIA-HRMS raw data in **centroid** mode and open format (netCDF, mzData, mzXML, and mzML) and generates the table of peak intensities (**peak table**). The workflow consists of **peak detection and quantification** within individual sample files, followed by **alignment** between files in the m/z dimension, and **imputation** of the missing values in the final peak table (Delabriere *et al.*, submitted). For each ion, the graph representing the intensity as a function of time is called a **flowgram**. A flowgram can be modeled as I = kP + ME(P) + B + e, where k is the response factor (corresponding to the ionization properties of the analyte), P is the **sample peak** (normalized profile which is common to all analytes from a sample and depends on the flow injection conditions only), ME is the **matrix effect**, B is the **solvent baseline**, and e is the heteroscedastic noise.
123
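To make the flowgram model above concrete, the following minimal Python sketch builds a synthetic flowgram from its components. The triangular peak shape, the quadratic suppression term standing in for ME, and all numerical values are illustrative assumptions that are not taken from proFIA; the noise term e is left out so that the result is deterministic.

```python
def sample_peak(n_scans, apex):
    """Hypothetical triangular sample peak P, scaled so that max(P) == 1.

    The apex must lie strictly between the first and the last scan.
    """
    peak = []
    for scan in range(n_scans):
        if apex >= scan:
            peak.append(scan / apex)  # rising edge
        else:
            peak.append((n_scans - 1 - scan) / (n_scans - 1 - apex))  # falling edge
    return peak


def matrix_effect(p, strength):
    """Assumed ion-suppression term ME(P): grows with the square of P."""
    return -strength * p * p


def flowgram(k, peak, baseline, strength):
    """I = k*P + ME(P) + B for every scan (noise term e omitted)."""
    return [k * p + matrix_effect(p, strength) + baseline for p in peak]


P = sample_peak(n_scans=11, apex=4)
intensity = flowgram(k=1000.0, peak=P, baseline=50.0, strength=200.0)
```

In this toy case the intensity equals the solvent baseline at the first and last scans and is reduced by the assumed matrix effect at the apex of the sample peak.
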
124 The generated peak table is available in the '3 table' W4M tabular format (**dataMatrix**, **sampleMetadata**, and **variableMetadata**) for downstream statistical analysis and annotation with W4M modules.
125
126 A figure provides **diagnostics** and visualization of the preprocessed data set.
127
128 ---------------------------------------------------
129
130 .. class:: infomark
131
132 **References**
133
134 | Delabriere A., Hohenester U., Junot C. and Thevenot E.A. proFIA: A data preprocessing workflow for Flow Injection Analysis coupled to High-Resolution Mass Spectrometry. *submitted*.
135 | Draper J., Lloyd A., Goodacre R. and Beckmann M. (2013). Flow infusion electrospray ionisation mass spectrometry for high throughput, non-targeted metabolite fingerprinting: a review. *Metabolomics* 9, 4-29.
136 | Fuhrer T., Dominik H., Boris B. and Zamboni N. (2011). High-throughput, accurate mass metabolome profiling of cellular extracts by flow injection-time-of-flight mass spectrometry. *Analytical Chemistry* 83, 7074-7080.
137 | Madalinski G., Godat E., Alves S., Lesage D., Genin E., Levi P., Labarre J., Tabet J., Ezan E. and Junot, C. (2008). Direct introduction of biological samples into a LTQ-orbitrap hybrid mass spectrometer as a tool for fast metabolome analysis. *Analytical Chemistry* 80, 3291-3303.
138
139 ---------------------------------------------------
140
141 -----------------
142 Workflow position
143 -----------------
144
145 .. image:: profia_workflowPositionImage.png
146 :width: 600
147
148 -----------
149 Input files
150 -----------
151
152 +---------------------------+------------+
153 | Parameter : num + label | Format |
154 +===========================+============+
155 | 1 : Choose your inputs | zip |
156 +---------------------------+------------+
157
158
159 You have two methods for your inputs:
160 | Zip file (recommended): You can put a zip file containing your inputs: myinputs.zip (containing all your conditions as sub-directories).
161 | Library folder: You must specify the name of your "library" (folder) created within your project space (for example: /projet/externe/institut/login/galaxylibrary/yourlibrary). Your library must contain all your conditions as sub-directories.
162
163 **Steps for creating the zip file**
164
165 **Step 1: Creating your directory and organizing the sub-directories**
166
167 .. class:: warningmark
168
169 VERY IMPORTANT: If you zip your files under Windows, you must use the **7Zip** software (http://www.7-zip.org/); otherwise your zip file will not be unzipped correctly on the W4M platform (zip corrupted bug).
170 Your zip should contain all your conditions as sub-directories. For example, two conditions (mutant and wild):
171 arabidopsis/wild/01.raw
172 arabidopsis/mutant/01.raw
173
174 **Step 2: Creating a zip file**
175 Create your zip file (e.g.: arabidopsis.zip).
176
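The layout described in Steps 1 and 2 can also be produced with a script. The sketch below uses only the Python standard library; the helper name ``zip_conditions``, the temporary directory, and the empty ``01.mzXML`` placeholder files are assumptions made for the example and are not part of the tool.

```python
import os
import tempfile
import zipfile


def zip_conditions(root_dir, zip_path):
    """Zip root_dir so that archive paths start with the root folder name."""
    parent = os.path.dirname(os.path.abspath(root_dir))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, _subdirs, files in sorted(os.walk(root_dir)):
            for name in sorted(files):
                full_path = os.path.join(folder, name)
                # Store each file as root/condition/file inside the archive.
                archive.write(full_path, os.path.relpath(full_path, parent))


# Build a toy layout with two conditions (empty placeholder files).
workdir = tempfile.mkdtemp()
root = os.path.join(workdir, "arabidopsis")
for condition in ("wild", "mutant"):
    os.makedirs(os.path.join(root, condition))
    open(os.path.join(root, condition, "01.mzXML"), "w").close()

zip_file = os.path.join(workdir, "arabidopsis.zip")
zip_conditions(root, zip_file)

# List the archive content to check that each condition is a sub-directory.
with zipfile.ZipFile(zip_file) as archive:
    entries = sorted(archive.namelist())
print(entries)
```

Listing the entries before uploading is a quick way to confirm that every file sits under its condition sub-directory.
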
177 **Step 3: Uploading it to our Galaxy server**
178 If your zip file is smaller than 2 GB, you can use the Get Data tool to upload it.
179 Otherwise, if your zip file is larger than 2 GB, please refer to the HOWTO on workflow4metabolomics.org (http://application.sb-roscoff.fr/download/w4m/howto/galaxy_upload_up_2Go.pdf).
180 For more information, do not hesitate to send us an email at supportATworkflow4metabolomics.org.
181
182 **Advice for converting your files for the proFIA input**
183
184 .. class:: warningmark
185
186 VERY IMPORTANT: your data must be in **centroid** mode. In addition, we recommend converting your raw files to mzXML.
187
188 We recommend the following parameters:
189
190 Use Filtering: **True**
191 Use Peak Picking: **True**
192 Peak Picking - Apply to MS Levels: **All Levels (1-)** : Centroid Mode
193 Use zlib: **64**
194 Binary Encoding: **64**
195 m/z Encoding: **64**
196 Intensity Encoding: **64**
197
198 ----------
199 Parameters
200 ----------
201
202 Maximum deviation between centroids during band detection; in ppm (default = 5)
203 | m/z tolerance of centroids corresponding to the same ion from one scan to the other.
204 |
205
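As a rough guide for choosing this value, a ppm tolerance can be converted into an absolute m/z window. The sketch below assumes a symmetric window of +/- ppm around the reference m/z; this convention and the helper names are assumptions for illustration and may differ from what proFIA does internally.

```python
def ppm_window(mz, ppm):
    """Return the (low, high) m/z bounds at +/- ppm around mz."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


def within_ppm(mz_a, mz_b, ppm):
    """True when mz_b lies inside the ppm window centred on mz_a."""
    low, high = ppm_window(mz_a, ppm)
    return high >= mz_b >= low


# At m/z 200 and 5 ppm the half-width of the window is 0.001.
low, high = ppm_window(200.0, 5)
```

Because the window scales with m/z, the same ppm value tolerates a larger absolute deviation for heavier ions.
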
206 Accuracy of the mass spectrometer to be used during feature alignment; in ppm (default = 5)
207 | Should be less than or equal to the deviation parameter above.
208 |
209
210 Minimum fraction of samples in which a peak should be detected in at least one class to be kept during feature alignment (default = 0.5)
211 | Identical to the corresponding parameter in XCMS.
212 |
213
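The minimum-fraction rule above can be illustrated as follows. This is a sketch of the rule as it is worded in this help, not the code used by proFIA or XCMS; the function name and the dictionary layout are hypothetical.

```python
def keep_feature(detected_by_class, frac_group):
    """Keep a feature detected in >= frac_group of the samples of any class.

    detected_by_class maps a class name to a list of booleans, one per
    sample, telling whether the feature was detected in that sample.
    """
    for flags in detected_by_class.values():
        if flags and sum(flags) / len(flags) >= frac_group:
            return True
    return False


detections = {
    "wild": [True, False, False, False],    # detected in 1 of 4 samples
    "mutant": [True, True, False, False],   # detected in 2 of 4 samples
}
kept_at_half = keep_feature(detections, 0.5)
kept_at_three_quarters = keep_feature(detections, 0.75)
```

With the default of 0.5 the feature is kept because of the mutant class, whereas a threshold of 0.75 would discard it.
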
214 Number of neighbour features to be used for imputation (default = 5)
215 | Select 0 to skip the imputation step.
216 |
217
218
219 ------------
220 Output files
221 ------------
222
223 dataMatrix.tabular
224 | **dataMatrix** tab-separated file with the variables as rows and samples as columns. Missing values are indicated as 'NA' (i.e. when the signal was not significantly different from noise).
225 |
226
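For downstream scripting, the dataMatrix layout described above (variables as rows, samples as columns, 'NA' for missing values) can be parsed with the Python standard library. The header text, variable names, and intensities in this sketch are made up for the example.

```python
import csv
import io

# Toy dataMatrix content; names and values are invented for illustration.
text = "dataMatrix\tsample1\tsample2\nM100.0000\t1200.5\tNA\nM200.0000\tNA\t830\n"


def read_data_matrix(handle):
    """Return (sample_names, {variable: [intensity or None, ...]})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # the first column holds the variable names
    matrix = {}
    for row in reader:
        # 'NA' marks a missing value and is mapped to None.
        matrix[row[0]] = [None if cell == "NA" else float(cell) for cell in row[1:]]
    return samples, matrix


samples, matrix = read_data_matrix(io.StringIO(text))
```

To read a real output file, pass an open file handle instead of the in-memory string.
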
227 sampleMetadata.tabular
228 | **sampleMetadata** tab-separated file containing the sample metadata as columns.
229 |
230
231 variableMetadata.tabular
232 | **variableMetadata** tab-separated file containing the variable metadata as columns. The **timeShifted** flag is set to 1 when the flowgram is time shifted compared to the sample peak (probably due to liquid retention in the FI tube). The **corSampPeakMean** metric is the correlation between the feature flowgram and the sample peak (values are in [-1, 1]). A value below 0.2 suggests that the feature signal is affected by a strong matrix effect. The **meanSolvent** is the mean baseline signal in the feature flowgrams. The **signalOverSolventPvalueMean** is the mean p-value of the tests discriminating between signal and baseline solvent.
233 |
234
235 figure.pdf
236 | Visualization and diagnostics of the preprocessed data set; **Feature quality**: Number of detected features per sample for each of the three categories: 'Well-behaved' features have a peak shape close to the sample peak (optimal FIA acquisition is achieved when the majority of the features fall into this category); 'Shifted' indicates a time shift compared to the sample peak, and probably results from retention in the FI tube; 'Significant Matrix Effect' corresponds to a correlation between the feature and the sample peak of less than 0.2, which is usually caused by a strong matrix effect; **Sample peaks**: Visualization of the peak model for each sample; the shapes should be similar when the FIA conditions are similar; **m/z density**: may help detect a missing m/z value and, in turn, suggest that the *ppm* parameter should be modified; **PCA score plot** of the log10 intensities to detect sample outliers.
237 |
238
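The three feature categories of the figure can be approximated from the **timeShifted** and **corSampPeakMean** columns of the variableMetadata file. The sketch below is an assumption about how these two values map to the categories; in particular, giving the matrix-effect test priority over the time-shift flag is a choice made for this example and not a documented rule of proFIA.

```python
def feature_category(time_shifted, cor_samp_peak_mean, threshold=0.2):
    """Assign one of the three categories shown in the diagnostic figure."""
    # A correlation below the threshold suggests a strong matrix effect.
    if threshold > cor_samp_peak_mean:
        return "Significant Matrix Effect"
    # The flag is 1 when the flowgram is shifted relative to the sample peak.
    if time_shifted == 1:
        return "Shifted"
    return "Well-behaved"


categories = [
    feature_category(0, 0.9),
    feature_category(1, 0.9),
    feature_category(0, 0.1),
]
```

Counting the returned labels per sample gives a tally comparable to the feature quality panel.
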
239 information.txt
240 | Text file with all messages and warnings generated during the computation.
241 |
242
243 ---------------------------------------------------
244
245 ---------------
246 Working example
247 ---------------
248
249 Figure output
250 =============
251
252 .. image:: profia_workingExampleImage.png
253 :width: 600
254
255 ---------------------------------------------------
256
257 ----
258 NEWS
259 ----
260
261 CHANGES IN VERSION 3.0.0
262 ========================
263
264 NEW FEATURE
265
266 Creation of the tool
267
268 </help>
269
270 <citations>
271 <citation type="bibtex">@Article{DelabriereSubmitted,
272 Title = {proFIA: A data preprocessing workflow for Flow Injection Analysis coupled to High-Resolution Mass Spectrometry},
273 Author = {Delabriere, Alexis and Hohenester, Ulli and Junot, Christophe and Thevenot, Etienne A},
274 Journal = {submitted},
275 Year = {submitted},
276 Pages = {--},
277 Volume = {},
278 Doi = {}
279 }</citation>
280 <citation type="doi">10.1093/bioinformatics/btu813</citation>
281 </citations>
282
283 </tool>