view carpet-src-1/tools/CARPET/PeakPeaker.xml @ 1:78770028dcf1 default tip
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
author | matces |
---|---|
date | Tue, 07 Jun 2011 16:59:33 -0400 |
parents | cdd489d98766 |
children |
<tool id="Find peaks" name="PeakPicker" version="1.0.0">
  <description>Finding Peaks in a GFF Nimblegen File</description>
  <command interpreter="perl">PeakPeaker2.pl --in $input --out $output --t $type --dist_peaks $dist_peaks --col3 $col3 --log $log --perc $perc --num $num --dist $dist --w $window --f_pv $output2</command>
  <inputs>
    <param format="tabular" name="input" type="data" label="Source file"/>
    <param name="col3" size="20" type="text" value="Analisys" label="Analysis name"/>
    <param name="type" type="select" label="Analysis type">
      <option value="p">p-value</option>
      <option value="s">score</option>
    </param>
    <param name="perc" size="4" type="text" value="0.95" label="percentile value"/>
    <param name="log" size="2" type="text" value="7" label="-log p-value cutoff"/>
    <param name="num" size="2" type="text" value="3" label="minimal number of probes"/>
    <param name="dist" size="4" type="text" value="100" label="max distance between two probes"/>
    <param name="dist_peaks" size="4" type="text" value="200" label="min distance between two peaks"/>
    <param name="window" size="4" type="text" value="500" label="window length"/>
  </inputs>
  <outputs>
    <data format="bed" name="output" />
    <data format="gff" name="output2" />
  </outputs>
  <help>

.. class:: infomark

**What it does**

This tool takes a NimbleGen ratio file in GFF format as the INPUT FILE and returns a table of the computed peaks in the same GFF format.

--------

**Parameters:**

- **Analysis type:**

  - **p-value** analysis determines peaks based on p-value inference
  - **score** analysis determines peaks based on a scoring system

- **Percentile value:** used to calculate, from the dataset distribution, the threshold that filters out background
- **-log p-value cutoff:** (required only for p-value based analysis) integer cutoff used to identify a significant peak
- **minimal # of probes:** minimal number of consecutive probes needed to define a peak
- **max distance 2 probes:** greatest nucleotide distance (bp) at which two probes are still considered adjacent
- **min distance 2 peaks:** minimum nucleotide distance (bp) required to consider two peaks as separate entities
- **window length:** length in bp of the window used for statistical analysis

--------

**INPUT FILE**

NimbleGen returns a GFF file with the coordinates of each probe and the normalized signal value, log2(Cy5/Cy3).

Click here_ to download a GFF file example.

.. _here: /static/example_file/GFF_file_norm.txt.zip

Example of Nimblegen GFF format::

    chr19 Nimblegen tiling_array 100000 1000051 -1.2 + . probe_name
    chr19 Nimblegen tiling_array 100100 1000151 2.9 + . probe_name

.. class:: warningmark

The sixth column **must** contain the normalized log2(Cy5/Cy3) value that NimbleGen returns after the experiment.

---------

.. class:: infomark

**How does it work?**

**Two assumptions:**

- data are enriched for signal in the positive direction ("one-tailed")
- a peak (or enriched region) is represented by multiple probes that are located close to each other on the genome

**Statistical approach: sliding window**

A window centered on each probe of the array moves probe by probe. In each window a chi-squared statistic is calculated

.. image:: static/images/CARPET/chi_squared.png

by building a contingency table for each probe, and a p-value is assigned

.. image:: static/images/CARPET/centered.png

**"-log2(p-value)"** is associated with each probe. This value takes into account the effect of the neighbouring probes. This approach dramatically decreases the background signal.

.. image:: static/images/CARPET/background.png

The new values are used to define an enriched locus

.. image:: static/images/CARPET/pvalue.png

Moreover, a score is calculated that takes into account the length and the raw signal of the peak

.. image:: static/images/CARPET/pvalue_score.png

Output is a GFF file

.. image:: static/images/CARPET/table_pv.png

**Non-statistical approach: score**

Only the raw signal of each probe is considered. Only regions in which the number of consecutive probes above the defined threshold reaches the required minimum are selected

.. image:: static/images/CARPET/score.png

Output is a GFF file

.. image:: static/images/CARPET/table_score.png

and a GFF file with the p-values associated with each probe

  </help>
</tool>
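The score approach described in the help text can be illustrated with a small sketch. This is not the code of `PeakPeaker2.pl`, which is not shown here; it is a minimal Python illustration under stated assumptions: probes are `(start, end, signal)` tuples from one chromosome sorted by start, the threshold is taken with the nearest-rank percentile method, a probe must be strictly above the threshold, and nearby peaks are merged rather than discarded. The function names `percentile_threshold` and `call_peaks` are hypothetical; the keyword arguments mirror the tool parameters `perc`, `num`, `dist` and `dist_peaks`.

```python
import math
from typing import List, Tuple

Probe = Tuple[int, int, float]  # (start, end, log2 ratio)
Peak = Tuple[int, int, int]     # (start, end, number of probes)


def percentile_threshold(values: List[float], perc: float) -> float:
    """Nearest-rank percentile (an assumption; the tool may use another method)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(perc * len(ordered)))
    return ordered[rank - 1]


def call_peaks(probes: List[Probe], perc: float = 0.95, num: int = 3,
               dist: int = 100, dist_peaks: int = 200) -> List[Peak]:
    """Group consecutive above-threshold probes into peaks."""
    if not probes:
        return []
    threshold = percentile_threshold([signal for _, _, signal in probes], perc)

    # Step 1: collect runs of consecutive probes above the threshold whose
    # gap to the previous probe in the run is at most `dist` bp.
    runs: List[List[Probe]] = []
    current: List[Probe] = []
    for start, end, signal in probes:
        if signal > threshold:
            if current and start - current[-1][1] > dist:
                runs.append(current)
                current = []
            current.append((start, end, signal))
        else:
            if current:
                runs.append(current)
            current = []
    if current:
        runs.append(current)

    # Step 2: keep only runs with at least `num` probes.
    peaks = [(run[0][0], run[-1][1], len(run)) for run in runs if len(run) >= num]

    # Step 3: merge peaks that lie closer than `dist_peaks` bp to each other.
    merged: List[Peak] = []
    for start, end, count in peaks:
        if merged and start - merged[-1][1] < dist_peaks:
            prev_start, prev_end, prev_count = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_count + count)
        else:
            merged.append((start, end, count))
    return merged


# Ten made-up 50 bp probes spaced 100 bp apart (illustrative data only).
signals = [0.1, 0.2, 2.0, 2.5, 3.0, 0.1, 0.0, 2.2, 2.1, 0.3]
probes = [(1000 + 100 * i, 1050 + 100 * i, s) for i, s in enumerate(signals)]
print(call_peaks(probes, perc=0.5, num=3))  # -> [(1200, 1450, 3)]
```

With `num=2` the second run of two probes (1700 to 1850) is also reported, and raising `dist_peaks` to 300 merges it with the first peak, which shows how the two distance parameters interact.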