view ACF/analytic_correlation_filtration.xml @ 2:15430e89c97d draft default tip

Uploaded
author melpetera
date Thu, 07 Nov 2019 03:42:14 -0500

<tool id="Analytic_correlation_filtration" name="Analytic correlation filtration" version="2019-06-20">
	<description>
		: Detect analytic correlation among data and remove them.
	</description>
	
	
	 <command><![CDATA[
		
		
		
		
		#if str($mass_file.mass_choice)=="false":
			#if str($rt_cond.rt_choice)=="false":
				perl $__tool_directory__/Analytic_correlation_filtration.pl -f "$file_in" -o 1 -d "$dataMatrix_in" -v "$variableMetadata_in" -rt 9999999999
			#else:
				perl $__tool_directory__/Analytic_correlation_filtration.pl -f "$file_in" -o 1 -d "$dataMatrix_in" -v "$variableMetadata_in"  -rt "$rt_cond.rt_threshold" 
			#end if
		#else:
			#if str($mass_file.liste.mass_list)=="true":
				#if str($rt_cond.rt_choice)=="true":
					perl $__tool_directory__/Analytic_correlation_filtration.pl -f "$file_in" -m "$mass_file.liste.mass_file_in" -o 2 -d "$dataMatrix_in" -v "$variableMetadata_in"  -rt "$rt_cond.rt_threshold" -mass "$mass_file.mass_threshold" 
				#end if
				#if str($rt_cond.rt_choice)=="false":
					perl $__tool_directory__/Analytic_correlation_filtration.pl -f "$file_in" -m "$mass_file.liste.mass_file_in" -o 3 -d "$dataMatrix_in" -v "$variableMetadata_in"  -mass "$mass_file.mass_threshold" 
				#end if
			#else:
					#if str($rt_cond.rt_choice)=="true":
						perl $__tool_directory__/Analytic_correlation_filtration.pl -f "$file_in" -m $__tool_directory__/data/default_list.csv -o 2 -d "$dataMatrix_in" -v "$variableMetadata_in" -rt "$rt_cond.rt_threshold" -mass "$mass_file.mass_threshold" 
					#end if
					#if str($rt_cond.rt_choice)=="false":
						perl $__tool_directory__/Analytic_correlation_filtration.pl -f "$file_in" -m $__tool_directory__/data/default_list.csv -o 3 -d "$dataMatrix_in" -v "$variableMetadata_in" -mass "$mass_file.mass_threshold" 
					#end if
			#end if
		#end if
	
		-r "$repres_opt.repres_opt_selector"
		
		#if str($repres_opt.repres_opt_selector)=="max_intensity_max_mass":
			-IT $repres_opt.int_threshold
			-IP $repres_opt.int_percentage
		#end if
		-correl "$correl_threshold"
		-output_sif "$sif_out"
		-output_tabular "$variableMetadata_out"
		
	]]></command>
	
	<inputs>
		<param type="data" name="file_in" format="txt" help="The .txt similarity table (you can obtain it with the Between-table Correlation tool or, for example, the cor() function in R)" label="Correlation table file" />
		<param type="data" name="dataMatrix_in" format="tabular" help="" label="dataMatrix file" />
		<param type="data" name="variableMetadata_in" format="tabular" help="" label="variableMetadata file" />
		
		<param help="Define the minimum similarity threshold accepted to determine analytic correlation" label="Correlation threshold" type="float" name="correl_threshold" value="0.90"/>
		
		<conditional name="mass_file">
		  <param name="mass_choice" checked="true" falsevalue="false" help="'YES' if you want to take it into account; 'NO' if you don't want to take into account mass information" label="Do you want to take into account mass differences between 2 ions?" truevalue="true" type="boolean"/>
				<when value="true">
					<conditional name="liste">
						<param name="mass_list" checked="true" falsevalue="false" help="'YES' if you have your own list to upload; 'NO' if you want to use a default list" label="Do you have your own list of mass differences, or do you want to use a default list?" truevalue="true" type="boolean"/>
						<when value="false">
						
						</when>
						<when value="true">
							<param type="data" name="mass_file_in" format="tabular,csv" help="The file containing all your known mass differences (see help for a file example)" label="Mass differences table (format: tabular or csv)" />
						</when>
					</conditional>
					<param help="Two ions must have a mass difference that matches a value in the list, within +/- the mass difference range, to be considered analytically correlated | Recommended value: 0.005" label="Mass difference range" type="float" name="mass_threshold" value="0.005"/>
				</when>
				<when value="false">
			
				</when>
		</conditional>
		
		<conditional name="rt_cond">
			<param checked="true" falsevalue="false" help="'YES' if you want to take retention time information into account; 'NO' if you do not" label="Do you want to take into account retention time differences between 2 ions?" name="rt_choice" truevalue="true" type="boolean"/>
				<when value="true">
					<param help="Choose the maximum retention time difference between 2 ions for them to be considered analytically correlated | Recommended value: 0.1" label="Retention time difference threshold" type="float" name="rt_threshold" value="0.1"/>
				</when>
				<when value="false">
					
				</when>
		</conditional>
		
		<conditional name="repres_opt">
			<param name="repres_opt_selector" label="Which representative ion do you want to select for each group?" type="select" display="radio" help="">
				<option value="intensity">Highest intensity</option>
				<option value="mass">Highest mass</option>
				<option value="mixt">Highest (mass2 x intensity) </option>
				<option value="max_intensity_max_mass">Highest mass among the 3 highest intensities (subject to the intensity threshold and rules ==> see help) </option>
			</param>
			<when value="max_intensity_max_mass">
				<param help="" label="Minimum intensity threshold for the representative ion" type="float" name="int_threshold" value="1000"/>
				<param help="Example: ion A has the highest intensity of a group but not the highest mass, and ion B has the second highest intensity and a higher mass than A. For B to be chosen as the representative ion of the group, its intensity must be at least 50% of the intensity of A." label="Percentage of the group's highest intensity accepted for the new representative ion (expressed as a fraction, e.g. 0.5 for 50%). This option helps avoid selecting isotopes." type="float" name="int_percentage" value="0.5"/>
			</when>
			<when value="intensity">
			</when>
			<when value="mass">
			</when>
			<when value="mixt">
			</when>
		</conditional>
		
	</inputs>
	
	<outputs>
		<data format="sif" label="${file_in.name}_sif" name="sif_out"/>
		<data format="tabular" label="${variableMetadata_in.name}_representative_ion" name="variableMetadata_out"/>
	</outputs>
	
	<help><![CDATA[
	
.. class:: infomark

**Contact** : **Stephanie Monnerie**, **Estelle Pujos-Guillot**

---------------------------------------------------

.. class:: infomark

**References** : S. Monnerie, M. Petera, B. Lyan, P. Gaudreau, B. Comte and E. Pujos-Guillot (2019) Analytic Correlation Filtration: a new tool to reduce analytical complexity of metabolomics datasets. Metabolites

---------------------------------------------------
	
-----------
Input files
-----------

+-----------------------------------------+---------------+
| File                                    |     Format    |
+=========================================+===============+
| 1)  Similarity matrix                   |  txt          |
+-----------------------------------------+---------------+
| 2)  Data matrix                         |  tabular      |
+-----------------------------------------+---------------+
| 3)  Variable metadata                   |  tabular      |
+-----------------------------------------+---------------+
| **Optional file**                       |   **Format**  |
+-----------------------------------------+---------------+
| 4)  Optional : Mass differences list    |  csv/tabular  |
+-----------------------------------------+---------------+

---------------------------------------------------

-------------
Files content
-------------
	
Similarity matrix
	* File organisation: one line per similarity pair, with the first ion ID, the similarity value and the second ion ID, tab-separated ==> First_Ion_ID \\t Similarity_Value \\t Second_Ion_ID
	* Example:
	
.. image:: similarity_matrix.JPG
	:width: 800
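
The three-column layout described above can be read with a few lines of Python. This is only a sketch, not the tool's Perl code: the function name `read_similarity_pairs` and the ion IDs are made up for illustration, and it assumes that a row whose middle field is not numeric (a header, for instance) can be skipped. Treating the correlation threshold as inclusive is also an assumption.

```python
import csv
import io


def read_similarity_pairs(handle):
    """Yield (first_ion_id, similarity, second_ion_id) tuples from a tab-separated table."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        first_id, value, second_id = row
        try:
            similarity = float(value)
        except ValueError:
            continue  # skip rows whose middle field is not numeric (e.g. a header)
        yield first_id, similarity, second_id


# Made-up ion IDs and similarity values, for illustration only.
example = io.StringIO("ion_1\t0.95\tion_2\nion_1\t0.42\tion_3\n")
pairs = list(read_similarity_pairs(example))

# Keep only pairs at or above a correlation threshold (0.90 is the tool's default value).
# Assumption: the comparison is inclusive.
kept = [pair for pair in pairs if pair[1] >= 0.90]
print(kept)
```
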
	
Data matrix file
	* "variable x sample" **dataMatrix**: tab-separated file of the numeric data matrix, with . as decimal separator and NA for missing values; the table must not contain metadata apart from row and column names; the row names must be identical to the row names of the variable metadata (see below)

Variable metadata file
	* "variable x metadata" **variableMetadata**: tab-separated file of the numeric and/or character variable metadata, with . as decimal separator and NA for missing values

.. class:: warningmark

For more information about input files, refer to the corresponding "W4M HowTo" page:
http://workflow4metabolomics.org/sites/workflow4metabolomics.org/files/files/w4m_TableFormatForGalaxy_150908.pdf


Mass differences list
	* A file containing a list of known adducts, fragments or isotopes with their associated mass differences
	* Example:

.. image:: Adduct_fragment_list.JPG
	:width: 350

---------------------------------------------------
	
----------
Parameters
----------

Take into account mass differences between 2 ions:
	* You can provide a list of known mass differences. The file must be organized with a first column for the mass difference type (isotope, fragment, etc.), a second column for the mass difference chemical formula (H+, -2H+K, etc.) and a third column for the mass difference value
	* If you choose to use a mass differences table, you must set a mass difference range. It acts as a tolerance: an observed difference is accepted when it equals a value in the file +/- this range.
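
The tolerance rule for mass differences can be sketched as below. This is a minimal Python illustration, not the tool's Perl implementation: the function name `matches_known_difference` is hypothetical, the two listed differences (a 13C isotope spacing and a Na/H exchange) are illustrative values, and treating the range as inclusive is an assumption.

```python
def matches_known_difference(mz_a, mz_b, known_differences, mass_range=0.005):
    """Return True if |mz_a - mz_b| lies within +/- mass_range of a listed difference."""
    observed = abs(mz_a - mz_b)
    # Assumption: the boundary itself counts as a match.
    return any(abs(observed - known) <= mass_range for known in known_differences)


# Hypothetical list of known mass differences (illustrative values, in Da).
known = [1.00336, 21.98194]

print(matches_known_difference(180.0634, 181.0668, known))  # observed difference is about 1.0034
print(matches_known_difference(180.0634, 181.0800, known))  # observed difference is about 1.0166
```

With the default range of 0.005, the first pair falls inside the tolerance around 1.00336 and the second does not.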

Take into account retention time:
	* You can use retention time as a criterion to group ions. You must choose a value that will be used as an interval: 2 ions are grouped when their retention times are equal +/- the threshold.
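
The retention time criterion amounts to a simple window test. The sketch below is a hedged Python illustration: the function name `within_rt_window` is hypothetical, the retention times are made up, and whether the tool treats the boundary as inclusive is an assumption.

```python
def within_rt_window(rt_a, rt_b, rt_threshold=0.1):
    """Return True if the two retention times differ by at most rt_threshold."""
    # Assumption: a difference exactly equal to the threshold is accepted.
    return abs(rt_a - rt_b) <= rt_threshold


# Made-up retention times, in the same unit as the threshold.
print(within_rt_window(5.02, 5.08))  # difference of about 0.06
print(within_rt_window(5.02, 5.30))  # difference of about 0.28
```
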

Choose the representative ion for each group. There are 4 ways to determine the representative ion:
	* The ion with the highest intensity (recommended for LC/MS)
	* The ion with the highest mass
	* The ion with the highest "mass2 * intensity" value 
	* The ion with the highest mass among the 3 highest-intensity ions of the group, unless its intensity is below a set percentage (for example 50%) of the intensity of the highest-intensity ion (recommended for GC/MS)
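
The selection rules listed above can be illustrated with a short Python sketch. It is one possible reading, not the tool's Perl code: the function name `pick_representative` and the dictionary layout are hypothetical, "mass2" is assumed to mean mass squared, and the minimum intensity threshold option is left out because its exact rule is not described here. The rule names reuse the option values of the tool form.

```python
def pick_representative(ions, rule="intensity", int_percentage=0.5):
    """Pick one ion from a group; each ion is a dict with 'id', 'mass' and 'intensity'."""
    if rule == "intensity":
        return max(ions, key=lambda ion: ion["intensity"])
    if rule == "mass":
        return max(ions, key=lambda ion: ion["mass"])
    if rule == "mixt":
        # Assumption: "mass2" is read as mass squared.
        return max(ions, key=lambda ion: ion["mass"] ** 2 * ion["intensity"])
    if rule == "max_intensity_max_mass":
        # Take the 3 most intense ions, drop those below the accepted fraction
        # of the top intensity, then keep the heaviest of what remains.
        top3 = sorted(ions, key=lambda ion: ion["intensity"], reverse=True)[:3]
        floor = int_percentage * top3[0]["intensity"]
        eligible = [ion for ion in top3 if ion["intensity"] >= floor]
        return max(eligible, key=lambda ion: ion["mass"])
    raise ValueError(f"unknown rule: {rule}")


# Made-up group of three ions, for illustration only.
group = [
    {"id": "A", "mass": 180.06, "intensity": 10000.0},
    {"id": "B", "mass": 203.05, "intensity": 6000.0},
    {"id": "C", "mass": 381.08, "intensity": 2000.0},
]

for rule in ("intensity", "mass", "mixt", "max_intensity_max_mass"):
    print(rule, pick_representative(group, rule)["id"])
```

In this made-up group, ion C is the heaviest but reaches only 20% of the top intensity, so the last rule falls back to ion B, which is heavier than A and keeps 60% of its intensity.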
	

---------------------------------------------------
	
--------------
Example of use
--------------

For UPLC/HRMS data, default parameters can be the following:
	* If a Pearson correlation is used, the default threshold can be set at 0.90
	* A delta RT of 0.1 min, or adjusted depending on the chromatographic system
	* The list of known adduct/isotope mass differences, with a mass delta of 0.005 Da, or adjusted depending on MS resolution
	* The ion with the highest intensity as the representative ion.

For GC/HRMS datasets, we recommend using the same parameters but ignoring the list of mass differences and choosing the ion with the highest mass among the highest-intensity ions as the representative.


	]]></help>
</tool>