<tool id="hgv_pass" name="PASS" version="1.0.0">
  <description>significant transcription factor binding sites from ChIP data</description>

  <command interpreter="bash">
    pass_wrapper.sh "$input" "$min_window" "$max_window" "$false_num" "$output"
  </command>

  <inputs>
    <param format="gff" name="input" type="data" label="Dataset"/>
    <param name="min_window" label="Smallest window size (by # of probes)" type="integer" value="2" />
    <param name="max_window" label="Largest window size (by # of probes)" type="integer" value="6" />
    <param name="false_num" label="Expected total number of false positive intervals to be called" type="float" value="5.0" help="N.B.: this is a &lt;em&gt;count&lt;/em&gt;, not a rate." />
  </inputs>

  <outputs>
    <data format="tabular" name="output" />
  </outputs>

  <requirements>
    <requirement type="package">pass</requirement>
    <requirement type="binary">sed</requirement>
  </requirements>

  <!-- we need to be able to set the seed for the random number generator
  <tests>
    <test>
      <param name="input" ftype="gff" value="pass_input.gff"/>
      <param name="min_window" value="2"/>
      <param name="max_window" value="6"/>
      <param name="false_num" value="5"/>
      <output name="output" file="pass_output.tab"/>
    </test>
  </tests>
  -->

  <help>
**Dataset formats**

The input is in GFF_ format, and the output is tabular_.
(`Dataset missing?`_)

.. _GFF: ./static/formatHelp.html#gff
.. _tabular: ./static/formatHelp.html#tab
.. _Dataset missing?: ./static/formatHelp.html

-----

**What it does**

PASS (Poisson Approximation for Statistical Significance) detects
significant transcription factor binding sites in the genome from
ChIP data.  This is probably the only peak-calling method that
accurately controls both the false-positive rate and the FDR in ChIP
data, which matters because different peak-calling algorithms often
give widely discrepant results.  At the same time, the method achieves
power similar to or better than that of previous methods.

<!-- we don't have wrapper support for the "prior" file yet
Another unique feature of this method is that it allows varying
thresholds to be used for peak calling at different genomic
locations.  For example, if a position lies in an open chromatin
region or is depleted of nucleosomes, or if a co-binding protein has
been detected within the neighborhood, then the position is more
likely to be bound by the target protein of interest, and hence a
lower threshold will be used to call significant peaks.  As a result,
weak but real binding sites can be detected.
-->

-----

**Hints**

- ChIP-Seq data:

  If the data is from ChIP-Seq, you need to convert the ChIP-Seq values
  into z-scores before using this program.  It is also recommended that
  you group read counts within a neighborhood together, e.g. in tiled
  windows of 30 bp.  In this way, the ChIP-Seq data will resemble
  ChIP-chip data in format.

- Choosing window size options:

  The window size is related to the probe tiling density.  For example,
  if the probes are tiled every 100 bp, then setting the smallest
  window = 2 and the largest window = 6 is appropriate, because the DNA
  fragment size is around 300-500 bp, so a fragment typically spans
  about 3 to 5 probes.

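The ChIP-Seq preparation described in the hints above can be sketched in
Python as follows.  This is only an illustration and not part of PASS: the
function names (``bin_reads``, ``to_zscores``, ``gff_lines``), the toy read
positions, and the GFF source and feature labels are all made up, and the
plain standardization used here is an assumption, since this help does not
specify the exact z-score transformation.

```python
import math
from collections import Counter


def bin_reads(positions, window=30):
    """Count reads in tiled windows; returns {window_start: count} (1-based)."""
    counts = Counter((pos - 1) // window * window + 1 for pos in positions)
    first, last = min(counts), max(counts)
    # Include empty windows so they contribute zeros to the mean and variance.
    return {start: counts.get(start, 0) for start in range(first, last + 1, window)}


def to_zscores(counts):
    """Standardize counts as (count - mean) / population standard deviation."""
    values = list(counts.values())
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if sd == 0:
        raise ValueError("all windows have the same count; z-scores are undefined")
    return {start: (count - mean) / sd for start, count in counts.items()}


def gff_lines(chrom, zscores, window=30, source="ChIP-Seq", feature="ID"):
    """Yield tab-separated GFF lines with the z-score in the score column."""
    for start in sorted(zscores):
        fields = [chrom, source, feature, start, start + window - 1,
                  f"{zscores[start]:.6g}", ".", ".", "."]
        yield "\t".join(str(field) for field in fields)


# Toy read start positions (made up for illustration).
reads = [5, 12, 29, 31, 95, 100, 101, 118]
binned = bin_reads(reads)
zscores = to_zscores(binned)
for line in gff_lines("chr7", zscores):
    print(line)
```

Windows with no reads are kept as zeros so that they count toward the mean
and variance; whether that is the right background for a given experiment
is a judgment call.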

-----

**Example**

- input file::

    chr7  Nimblegen  ID  40307603  40307652  1.668944     .  .  .
    chr7  Nimblegen  ID  40307703  40307752  0.8041307    .  .  .
    chr7  Nimblegen  ID  40307808  40307865  -1.089931    .  .  .
    chr7  Nimblegen  ID  40307920  40307969  1.055044     .  .  .
    chr7  Nimblegen  ID  40308005  40308068  2.447853     .  .  .
    chr7  Nimblegen  ID  40308125  40308174  0.1638694    .  .  .
    chr7  Nimblegen  ID  40308223  40308275  -0.04796628  .  .  .
    chr7  Nimblegen  ID  40308318  40308367  0.9335709    .  .  .
    chr7  Nimblegen  ID  40308526  40308584  0.5143972    .  .  .
    chr7  Nimblegen  ID  40308611  40308660  -1.089931    .  .  .
    etc.

  In GFF, a value of dot '.' is used to mean "not applicable".

- output file::

    ID  Chr   Start     End       WinSz  PeakValue  # of FPs  FDR
    1   chr7  40310931  40311266  4      1.663446   0.248817  0.248817

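For downstream filtering, the tabular output can be read with a few lines
of Python.  The sketch below assumes the columns are tab-separated and named
exactly as in the example output; ``read_peaks`` and ``filter_by_fdr`` are
hypothetical helper names, and the 0.25 cutoff is arbitrary.

```python
import csv
import io

# The example output row from this help, assumed to be tab-separated.
sample = (
    "ID\tChr\tStart\tEnd\tWinSz\tPeakValue\t# of FPs\tFDR\n"
    "1\tchr7\t40310931\t40311266\t4\t1.663446\t0.248817\t0.248817\n"
)


def read_peaks(handle):
    """Parse output rows into dicts, converting the numeric columns."""
    peaks = []
    for row in csv.DictReader(handle, delimiter="\t"):
        peaks.append({
            "id": int(row["ID"]),
            "chrom": row["Chr"],
            "start": int(row["Start"]),
            "end": int(row["End"]),
            "window_size": int(row["WinSz"]),
            "peak_value": float(row["PeakValue"]),
            "false_positives": float(row["# of FPs"]),
            "fdr": float(row["FDR"]),
        })
    return peaks


def filter_by_fdr(peaks, max_fdr):
    """Keep peaks whose reported FDR does not exceed max_fdr."""
    return [peak for peak in peaks if max_fdr >= peak["fdr"]]


peaks = read_peaks(io.StringIO(sample))
kept = filter_by_fdr(peaks, 0.25)
print(len(kept), "of", len(peaks), "peaks kept")
```

For a real run, an open file handle on the downloaded dataset can be passed
to ``read_peaks`` in place of the ``io.StringIO`` wrapper.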

-----

**References**

Zhang Y. (2008)
Poisson approximation for significance in genome-wide ChIP-chip tiling arrays.
Bioinformatics. 24(24):2825-31. Epub 2008 Oct 25.

Chen KB, Zhang Y. (2010)
A varying threshold method for ChIP peak calling using multiple sources of information.
Submitted.

  </help>
</tool>