comparison carpet-src-1/tools/CARPET/PeakPeaker.xml @ 0:cdd489d98766

Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
author matces
date Tue, 07 Jun 2011 16:50:41 -0400
parents
children
-1:000000000000 0:cdd489d98766
1 <tool id="Find peaks" name="PeakPicker" version="1.0.0">
2 <description>Finding Peaks in a GFF Nimblegen File</description>
3 <command interpreter="perl">PeakPeaker2.pl --in $input --out $output --t $type --dist_peaks $dist_peaks --col3 $col3 --log $log --perc $perc --num $num --dist $dist --w $window --f_pv $output2</command>
4 <inputs>
5 <param format="tabular" name="input" type="data" label="Source file"/>
6 <param name="col3" size="20" type="text" value="Analisys" label="Analysis name"/>
7 <param name="type" type="select" label="Analysis type">
8 <option value="p">p-value</option>
9 <option value="s">score</option>
10 </param>
11 <param name="perc" size="4" type="text" value="0.95" label="percentile value"/>
12 <param name="log" size="2" type="text" value="7" label="-log p-value cutoff"/>
13 <param name="num" size="2" type="text" value="3" label="minimal number of probes"/>
14 <param name="dist" size="4" type="text" value="100" label="max distance between two probes"/>
15 <param name="dist_peaks" size="4" type="text" value="200" label="min distance between two peaks"/>
16 <param name="window" size="4" type="text" value="500" label="window length"/>
17 </inputs>
18 <outputs>
19 <data format="bed" name="output" />
20 <data format="gff" name="output2" />
21 </outputs>
22
23
24 <help>
25 .. class:: infomark
26
27 **What it does**
28
29 This tool takes NimbleGen ratio files in GFF format as the INPUT FILE and returns a table of the computed peaks in the same GFF format.
30
31 --------
32
33 **Parameters:**
34
35 - **Analysis type:**
36 - **p-value** analysis determines peaks based on p-value inference
37 - **score** analysis determines peaks based on a scoring system
38 - **Percentile value:** used to calculate, from the dataset distribution, the threshold that filters out background
39 - **-log p-value cutoff:** (required only for p-value based analysis) cutoff integer to be used to identify a significant peak
40 - **minimal # of probes:** minimal number of consecutive probes used to define a peak
41 - **max distance 2 probes:** greatest nucleotide distance (bp) at which two probes are still considered adjacent
42 - **min distance 2 peaks:** minimum nucleotide distance (bp) required to consider two peaks as separate entities
43 - **window length:** length in bp of the window used for statistical analysis
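
As a rough illustration of how the percentile value could turn into a signal threshold, here is a minimal Python sketch. It is not the code used by the tool: the function name ``percentile_threshold`` and the nearest-rank rule are assumptions made for this example only.

```python
def percentile_threshold(signals, perc=0.95):
    """Return the signal value at the given percentile (nearest-rank rule).

    Hypothetical helper: the tool's own threshold rule may differ.
    """
    if not signals:
        raise ValueError("no signal values given")
    ordered = sorted(signals)
    # Nearest-rank index, clamped to the last element.
    rank = min(len(ordered) - 1, int(perc * len(ordered)))
    return ordered[rank]


# Made-up log2 ratios, only to show the call.
signals = [-1.2, 0.1, 0.3, 0.4, 2.9, 0.2, -0.5, 0.0, 1.1, 3.4]
threshold = percentile_threshold(signals, perc=0.8)
# Probes at or above the threshold are kept; the rest is treated as background.
enriched = [s for s in signals if s >= threshold]
```

With the default percentile of 0.95, only the top 5% of the signal distribution would pass such a filter.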
44
45 --------
46
47
48 **INPUT FILE**
49
50 Nimblegen returns a GFF file with the coordinates of each probe and the normalized signal value, log2(Cy5/Cy3).
51
52 Click here_ to download a GFF file example.
53
54 .. _here: /static/example_file/GFF_file_norm.txt.zip
55
56 Example of Nimblegen GFF format::
57
58 chr19 Nimblegen tiling_array 100000 1000051 -1.2 + . probe_name
59 chr19 Nimblegen tiling_array 100100 1000151 2.9 + . probe_name
60
61 .. class:: warningmark
62
63 The sixth column **must** contain the normalized log2(Cy5/Cy3) value that Nimblegen returns after the experiment.
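
To make the column layout concrete, the following Python sketch reads one probe line and picks the signal from the sixth column. The names ``Probe`` and ``parse_gff_line`` are hypothetical, and splitting on any whitespace is an assumption; a real GFF file is tab-separated.

```python
from typing import NamedTuple


class Probe(NamedTuple):
    chrom: str
    start: int
    end: int
    signal: float  # normalized log2(Cy5/Cy3), taken from the sixth column


def parse_gff_line(line):
    """Parse one probe line into a Probe (hypothetical helper)."""
    # Split into at most 9 fields so that a ninth column with spaces stays whole.
    fields = line.split(None, 8)
    if len(fields) != 9:
        raise ValueError("expected 9 columns, got %d" % len(fields))
    return Probe(fields[0], int(fields[3]), int(fields[4]), float(fields[5]))


example = "chr19 Nimblegen tiling_array 100000 1000051 -1.2 + . probe_name"
probe = parse_gff_line(example)
```

A line whose sixth column is not numeric raises ``ValueError``, which is one simple way to catch files that do not carry the normalized ratio in that column.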
64
65
66 ---------
67
68 .. class:: infomark
69
70 **How does it work?**
71
72 **Two assumptions:**
73
74
75 - data are enriched for signal in the positive direction ("one-tailed")
76 - a peak (or enriched region) is represented by multiple probes that are genomically located close to each other
77
78
79 **Statistical approach: sliding window**
80
81
82 A window centered at each probe of the array moves probe by probe. In each window, the chi-squared statistic is calculated
83
84
85 .. image:: static/images/CARPET/chi_squared.png
86
87
88 by building a contingency table for each probe, and a p-value is assigned
89
90
91 .. image:: static/images/CARPET/centered.png
92
93
94 A **"-log2(p-value)"** value is associated with each probe. This value takes into account the effect of the neighbouring probes.
95 This approach dramatically decreases the background signal.
96
97
98 .. image:: static/images/CARPET/background.png
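
The sliding-window idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's implementation: the layout of the contingency table (probes above or not above a threshold, inside or outside the window) is assumed, and the function names are hypothetical. The p-value uses the exact upper tail of a chi-squared distribution with 1 degree of freedom.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def p_value_1df(chi2):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def window_scores(probes, threshold, window=500):
    """Return one -log2(p-value) per probe.

    probes: list of (position, signal) tuples sorted by position.
    Assumed table: rows = inside/outside the window,
    columns = signal above/not above the threshold.
    """
    total_hi = sum(1 for _, s in probes if s >= threshold)
    total_lo = len(probes) - total_hi
    half = window / 2.0
    scores = []
    for center, _ in probes:
        # Quadratic scan, kept simple for readability.
        inside = [s for pos, s in probes if half >= abs(pos - center)]
        a = sum(1 for s in inside if s >= threshold)  # inside, above
        b = len(inside) - a                           # inside, not above
        c = total_hi - a                              # outside, above
        d = total_lo - b                              # outside, not above
        if a * d > b * c:
            # One-tailed: only windows enriched for high signal get a score.
            p = p_value_1df(chi2_2x2(a, b, c, d))
            scores.append(-math.log2(p) if p > 0 else float("inf"))
        else:
            scores.append(0.0)
    return scores


# Made-up data: 20 probes every 100 bp, three of them with high signal.
probes = [(i * 100, 3.0 if i in (8, 9, 10) else 0.0) for i in range(20)]
scores = window_scores(probes, threshold=1.0, window=500)
```

In this toy data set the probes in and around the high-signal cluster receive the largest values, while isolated background probes receive 0.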
99
100
101 The new values are used to define an enriched locus
102
103
104 .. image:: static/images/CARPET/pvalue.png
105
106
107 Moreover, a score is calculated, taking into account the length and the raw signal of the peak
108
109
110 .. image:: static/images/CARPET/pvalue_score.png
111
112
113 Output is a GFF file
114
115
116 .. image:: static/images/CARPET/table_pv.png
117
118
119 **Non-statistical approach: score**
120
121
122 Only the raw signal of each probe is considered. Only regions with at least the minimal number of consecutive probes above the defined threshold are selected
123
124
125 .. image:: static/images/CARPET/score.png
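
A minimal Python sketch of this selection step is given below. It is an assumption-based illustration rather than the tool's code: the function name ``call_peaks`` is hypothetical, and the defaults mirror the form defaults for the minimal number of probes (3) and the max distance between two probes (100 bp).

```python
def call_peaks(probes, threshold, min_probes=3, max_dist=100):
    """Return (start, end, n_probes) for each run of consecutive probes
    above the threshold.

    probes: list of (position, signal) tuples sorted by position.
    A run is broken by a probe below the threshold or by a gap
    larger than max_dist between neighbouring probes.
    """
    peaks = []
    run = []
    for pos, signal in probes:
        above = signal >= threshold
        adjacent = bool(run) and max_dist >= pos - run[-1]
        if above and (not run or adjacent):
            run.append(pos)
            continue
        # The current run ends here: keep it only if it is long enough.
        if len(run) >= min_probes:
            peaks.append((run[0], run[-1], len(run)))
        run = [pos] if above else []
    # Flush the last run.
    if len(run) >= min_probes:
        peaks.append((run[0], run[-1], len(run)))
    return peaks


# Made-up probes: (position, log2 ratio).
probes = [
    (0, 0.1), (100, 2.0), (200, 2.5), (300, 3.0), (400, 0.2),
    (500, 2.0), (600, 2.1), (1000, 2.2), (1100, 2.3), (1200, 2.4),
]
peaks = call_peaks(probes, threshold=1.0)
```

In this toy example the two probes at 500 and 600 are dropped because the run is too short, and the gap of 400 bp before position 1000 starts a new run.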
126
127
128 Output is a GFF file
129
130
131 .. image:: static/images/CARPET/table_score.png
132
133
134 and a GFF file with the p-values associated with each probe
135
136 </help>
137
138 </tool>
139