Mercurial > repos > sblanck > mpagenomics

<tool id="markersSelection" name="Markers selection" version="1.3.0">
   <requirements>
        <container type="docker">sblanck/mpagenomicsdependencies</container>
  </requirements>
  <description> of previously extracted signal</description>
  <command>
  	<![CDATA[
        Rscript
        ${__tool_directory__}/selectionExtracted.R
    --input '$input'
  	--response '$response'
  	--new_file_path '$output.extra_files_path'
  	--folds '$folds'
  	--loss '$loss'
  	--outputlog '$outputlog'
  	--output '$output'
  	--log '$log'
  	]]>
  </command>
  <inputs>
    <param name="input" type="data" format="sef,saf" label="Input Signal" help="see below for more information on file format"/>
	<param name="response" type="data" format="csv" label="Data response" help="Data response csv file. See below for more information on file format" />
	<param name="folds" type="integer" min="1" value="10" label ="Number of folds for cross validation" help="Integer between 1 and number of file in the .cel file dataset"/>
    <param name="loss" type="select" multiple="false" label="Response type">
      	<option value="linear">Linear</option>
    	<option value="logistic">Logistic</option>
    </param>

	<!--param name="outputgraph" type="select" label="Output figures">
        <option value="TRUE">Yes</option>
        <option value="FALSE">No</option>
     </param-->
    <param name="outputlog" type="select" label="Output log">
        <option value="TRUE">Yes</option>
        <option value="FALSE">No</option>
    </param>
  </inputs>
  <outputs>
    <data format="tabular" name="output" label="selection of ${input.name}" />
    <data format="log" name="log" label="log of selection of ${input.name}" >
    	<filter>outputlog == "TRUE"</filter>
    </data>
  </outputs>
  <stdio>
    <exit_code range="1:"   level="fatal"   description="See logs for more details" />
   </stdio>
  <help>
  **What it does**

This tool selects some relevant markers according to a response using penalized regressions.

Input:

*A tabular text file containing 3 fixed columns and 1 column per sample:*

	- chr: Chromosome.
	- position: Genomic position (in bp).
  	- probeNames: Names of the probes.
  	- One column per sample which contain the copy number signal for each sample.

Output:

*A tabular text file containing 5 columns which describe all the selected SNPs (1 line per SNP):*

	- chr: Chromosome containing the selected SNP.
  	- position: Position of the selected SNP.
	- index: Index of the selected SNP.
	- names: Name of the selected SNP.
	- coefficient: Regression coefficient of the selected SNP.

-----

**Data Response csv file**

Data response csv file format:

	- The first column contains the names of the different files of the dataset.

	- The second column is the response associated with each file.

	- Column names of these two columns are respectively files and response.

	- Columns are separated by a comma

	- *Extensions of the files (.CEL for example) should be removed*


**Example**

Let 3 .cel files in the studied dataset ::

     	patient1.cel
     	patient2.cel
     	patient3.cel

The csv file should look like this ::

     	files,response
     	patient1,1.92145
     	patient2,2.12481
     	patient3,1.23545


-----

**Citation**

If you use this tool please cite :

`Q. Grimonprez, A. Celisse, M. Cheok, M. Figeac, and G. Marot. MPAgenomics : An R package for multi-patients analysis of genomic markers, 2014. Preprint &lt;http://fr.arxiv.org/abs/1401.5035&gt;`_

 </help>
</tool>
author	sblanck
date	Tue, 20 Apr 2021 15:00:42 +0000
parents	4f753bb8681e
children