comparison carpet-src-1/tools/CARPET/PeakPeaker.xml @ 0:cdd489d98766
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
author | matces |
---|---|
date | Tue, 07 Jun 2011 16:50:41 -0400 |
parents | |
children |
-1:000000000000 | 0:cdd489d98766 |
---|---|
1 <tool id="Find peaks" name="PeakPicker" version="1.0.0"> | |
2 <description>Finding Peaks in a GFF Nimblegen File</description> | |
3 <command interpreter="perl">PeakPeaker2.pl --in $input --out $output --t $type --dist_peaks $dist_peaks --col3 $col3 --log $log --perc $perc --num $num --dist $dist --w $window --f_pv $output2</command> | |
4 <inputs> | |
5 <param format="tabular" name="input" type="data" label="Source file"/> | |
6 <param name="col3" size="20" type="text" value="Analisys" label="Analysis name"/> | |
7 <param name="type" type="select" label="Analysis type"> | |
8 <option value="p">p-value</option> | |
9 <option value="s">score</option> | |
10 </param> | |
11 <param name="perc" size="4" type="text" value="0.95" label="percentile value"/> | |
12 <param name="log" size="2" type="text" value="7" label="-log p-value cutoff"/> | |
13 <param name="num" size="2" type="text" value="3" label="minimal number of probes"/> | |
14 <param name="dist" size="4" type="text" value="100" label="max distance between two probes"/> | |
15 <param name="dist_peaks" size="4" type="text" value="200" label="min distance between two peaks"/> | |
16 <param name="window" size="4" type="text" value="500" label="window length"/> | |
17 </inputs> | |
18 <outputs> | |
19 <data format="bed" name="output" /> | |
20 <data format="gff" name="output2" /> | |
21 </outputs> | |
22 | |
23 | |
24 <help> | |
25 .. class:: infomark | |
26 | |
27 **What it does** | |
28 | |
29 This tool takes NimbleGen ratio files in GFF format as the INPUT FILE and returns a table of the computed peaks in the same GFF format. | |
30 | |
31 -------- | |
32 | |
33 **Parameters:** | |
34 | |
35 - **Analysis type:** | |
36 - **p-value** analysis determines peaks by p-value inference | |
37 - **score** analysis determines peaks with a scoring system | |
38 - **Percentile value:** percentile of the dataset distribution used to calculate the signal threshold that filters out background | |
39 - **-log p-value cutoff:** (required only for p-value based analysis) integer cutoff used to identify a significant peak | |
40 - **minimal number of probes:** minimal number of consecutive probes needed to define a peak | |
41 - **max distance between two probes:** greatest nucleotide distance (bp) at which two probes are still considered adjacent | |
42 - **min distance between two peaks:** minimum nucleotide distance (bp) required to consider two peaks as separate entities | |
43 - **window length:** length in bp of the window used for statistical analysis | |
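To illustrate how a percentile value such as the default 0.95 could turn into a signal threshold, here is a minimal Python sketch. It assumes a nearest-rank percentile over all probe signals; the helper name `percentile_threshold` is hypothetical, and the percentile definition actually used by the Perl script is not shown in this file.

```python
import math


def percentile_threshold(values, perc=0.95):
    """Nearest-rank percentile (an assumed definition, not taken from the tool).

    Returns the smallest observed value such that at least `perc` of the
    data lies at or below it.
    """
    if not values:
        raise ValueError("no signal values given")
    ordered = sorted(values)
    # Nearest-rank method: 1-based rank, never below 1.
    rank = max(1, math.ceil(perc * len(ordered)))
    return ordered[rank - 1]


# Twenty made-up probe signals, purely for illustration.
example_signal = list(range(1, 21))
example_threshold = percentile_threshold(example_signal, 0.95)
```

Probes whose signal exceeds the returned value would then be treated as candidates above background.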
44 | |
45 -------- | |
46 | |
47 | |
48 **INPUT FILE** | |
49 | |
50 NimbleGen returns a GFF file with the coordinates of each probe and its normalized signal value, log2(Cy5/Cy3). | |
51 | |
52 Click here_ to download a GFF file example. | |
53 | |
54 .. _here: /static/example_file/GFF_file_norm.txt.zip | |
55 | |
56 Example of Nimblegen GFF format:: | |
57 | |
58 chr19 Nimblegen tiling_array 100000 1000051 -1.2 + . probe_name | |
59 chr19 Nimblegen tiling_array 100100 1000151 2.9 + . probe_name | |
60 | |
61 .. class:: warningmark | |
62 | |
63 The sixth column **must** contain the normalized log2(Cy5/Cy3) value that NimbleGen returns after the experiment. | |
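As a quick way to check that a file follows this layout, the sketch below reads one record and pulls the signal from the sixth column. It is written in Python for illustration only; the `Probe` type and `parse_gff_line` helper are hypothetical names, and the code assumes nine whitespace-separated columns as in the example above.

```python
from typing import NamedTuple


class Probe(NamedTuple):
    chrom: str
    start: int
    end: int
    log_ratio: float  # sixth column: normalized log2(Cy5/Cy3)
    name: str


def parse_gff_line(line: str) -> Probe:
    """Parse one GFF record, assuming nine whitespace-separated columns."""
    fields = line.split(None, 8)
    if len(fields) != 9:
        raise ValueError("expected 9 columns, got %d" % len(fields))
    return Probe(
        chrom=fields[0],
        start=int(fields[3]),
        end=int(fields[4]),
        log_ratio=float(fields[5]),
        name=fields[8].strip(),
    )


probe = parse_gff_line(
    "chr19 Nimblegen tiling_array 100000 1000051 -1.2 + . probe_name"
)
```

A record whose sixth column is not numeric raises a `ValueError`, which makes a misplaced signal column easy to spot before running the analysis.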
64 | |
65 | |
66 --------- | |
67 | |
68 .. class:: infomark | |
69 | |
70 **How does it work?** | |
71 | |
72 **Two assumptions:** | |
73 | |
74 | |
75 - data are enriched for signal in the positive direction ("one-tailed") | |
76 - a peak (or enriched region) is represented by multiple probes that are genomically located close to each other | |
77 | |
78 | |
79 **Statistical approach: sliding window** | |
80 | |
81 | |
82 A window centered on each probe of the array moves probe by probe. In each window, a chi-squared statistic is calculated | |
83 | |
84 | |
85 .. image:: static/images/CARPET/chi_squared.png | |
86 | |
87 | |
88 by building a contingency table for each probe, and a p-value is assigned | |
89 | |
90 | |
91 .. image:: static/images/CARPET/centered.png | |
92 | |
93 | |
94 **"-log2(p-value)"** is associated with each probe. This value takes into account the effect of the neighbouring probes. | |
95 This approach dramatically decreases the background signal. | |
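The sliding-window idea can be sketched in Python as follows. The exact contingency table used by the tool is only given in the figures, so this example assumes a 2x2 table: probes above or below the signal threshold, inside or outside the window. The function names are hypothetical, the p-value comes from a chi-squared distribution with one degree of freedom, and windows that are not enriched get a score of 0 to reflect the one-tailed assumption.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def window_neg_log2_p(positions, values, threshold, window=500):
    """Return one -log2(p-value) per probe (assumed table layout, see text)."""
    total_high = sum(1 for v in values if v > threshold)
    total_low = len(values) - total_high
    half = window / 2
    scores = []
    for center in positions:
        in_high = in_low = 0
        for pos, val in zip(positions, values):
            if abs(pos - center) > half:
                continue  # probe lies outside the window
            if val > threshold:
                in_high += 1
            else:
                in_low += 1
        out_high = total_high - in_high
        out_low = total_low - in_low
        chi2 = chi2_2x2(in_high, in_low, out_high, out_low)
        # Survival function of chi-squared with 1 degree of freedom.
        p = math.erfc(math.sqrt(chi2 / 2))
        # One-tailed: keep only windows enriched for high-signal probes.
        enriched = in_high * out_low > in_low * out_high
        if not enriched:
            scores.append(0.0)
        elif p == 0.0:
            scores.append(math.inf)
        else:
            scores.append(-math.log2(p))
    return scores


# Made-up data: 20 probes every 100 bp, four neighbouring probes enriched.
positions = [100 * i for i in range(20)]
values = [3.0 if i in (8, 9, 10, 11) else -0.5 for i in range(20)]
scores = window_neg_log2_p(positions, values, threshold=1.0, window=500)
```

In this toy dataset the probes inside the enriched stretch receive high scores, while isolated background probes stay at 0. The quadratic loop is kept for readability and would need an index-based window for real arrays.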
96 | |
97 | |
98 .. image:: static/images/CARPET/background.png | |
99 | |
100 | |
101 The new values are used to define an enriched locus | |
102 | |
103 | |
104 .. image:: static/images/CARPET/pvalue.png | |
105 | |
106 | |
107 Moreover, a score is calculated that takes into account the length and the raw signal of the peak | |
108 | |
109 | |
110 .. image:: static/images/CARPET/pvalue_score.png | |
111 | |
112 | |
113 The output is a GFF file | |
114 | |
115 | |
116 .. image:: static/images/CARPET/table_pv.png | |
117 | |
118 | |
119 **NON Statistical approach: score** | |
120 | |
121 | |
122 Only the raw signal of each probe is considered. Only regions with at least the minimal number of consecutive probes above the signal threshold are selected | |
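The selection of consecutive probes described above can be sketched in Python like this. The function name `call_peaks` is hypothetical, the defaults mirror the tool's default parameters (3 probes, 100 bp), and the distance between two probes is assumed to be measured from the end of one probe to the start of the next; the Perl script may define it differently.

```python
def call_peaks(probes, threshold, min_probes=3, max_dist=100):
    """Group consecutive probes above `threshold` into peaks.

    `probes` is a list of (start, end, signal) tuples sorted by start.
    Returns a list of (peak_start, peak_end, number_of_probes) tuples.
    """
    peaks = []
    run = []

    def flush():
        # Keep the current run only if it has enough probes, then reset it.
        if len(run) >= min_probes:
            peaks.append((run[0][0], run[-1][1], len(run)))
        run.clear()

    for start, end, signal in probes:
        if signal > threshold:
            # A gap wider than max_dist ends the current run.
            if run and start - run[-1][1] > max_dist:
                flush()
            run.append((start, end))
        else:
            flush()
    flush()
    return peaks


# Made-up probes, 50 bp long, purely for illustration.
example_probes = [
    (0, 50, 0.1), (100, 150, 2.0), (200, 250, 2.5), (300, 350, 2.2),
    (400, 450, 0.0), (500, 550, 3.0), (600, 650, 3.1),
    (1000, 1050, 2.9), (1100, 1150, 2.8), (1200, 1250, 2.7),
]
example_peaks = call_peaks(example_probes, threshold=1.0)
```

Here the two probes at 500 and 600 are discarded because the run is too short, and the gap before position 1000 starts a new run instead of extending the previous one.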
123 | |
124 | |
125 .. image:: static/images/CARPET/score.png | |
126 | |
127 | |
128 The output is a GFF file | |
129 | |
130 | |
131 .. image:: static/images/CARPET/table_score.png | |
132 | |
133 | |
134 and a GFF file with the p-values associated with each probe | |
135 | |
136 </help> | |
137 | |
138 </tool> | |
139 |