<!-- tools/human_genome_variation/pass.xml, revision 0:9071e359b9a3 ("Uploaded"), author xuebing, Fri, 09 Mar 2012 19:37:19 -0500 -->
<tool id="hgv_pass" name="PASS" version="1.0.0">
    <description>significant transcription factor binding sites from ChIP data</description>

    <command interpreter="bash">
        pass_wrapper.sh "$input" "$min_window" "$max_window" "$false_num" "$output"
    </command>

    <inputs>
        <param format="gff" name="input" type="data" label="Dataset"/>
        <param name="min_window" label="Smallest window size (by # of probes)" type="integer" value="2" />
        <param name="max_window" label="Largest window size (by # of probes)" type="integer" value="6" />
        <param name="false_num" label="Expected total number of false positive intervals to be called" type="float" value="5.0" help="N.B.: this is a &lt;em&gt;count&lt;/em&gt;, not a rate." />
    </inputs>

    <outputs>
        <data format="tabular" name="output" />
    </outputs>

    <requirements>
        <requirement type="package">pass</requirement>
        <requirement type="binary">sed</requirement>
    </requirements>

    <!-- we need to be able to set the seed for the random number generator
    <tests>
        <test>
            <param name="input" ftype="gff" value="pass_input.gff"/>
            <param name="min_window" value="2"/>
            <param name="max_window" value="6"/>
            <param name="false_num" value="5"/>
            <output name="output" file="pass_output.tab"/>
        </test>
    </tests>
    -->

    <help>
**Dataset formats**

The input is in GFF_ format, and the output is tabular_.
(`Dataset missing?`_)

.. _GFF: ./static/formatHelp.html#gff
.. _tabular: ./static/formatHelp.html#tab
.. _Dataset missing?: ./static/formatHelp.html

-----

**What it does**

PASS (Poisson Approximation for Statistical Significance) detects
significant transcription factor binding sites in the genome from
ChIP data. It is probably the only peak-calling method that
accurately controls both the false-positive rate and the false
discovery rate (FDR) in ChIP data. This matters because different
peak-calling algorithms produce widely discrepant results. At the
same time, PASS achieves power similar to or better than that of
previous methods.

<!-- we don't have wrapper support for the "prior" file yet
Another unique feature of this method is that it allows varying
thresholds to be used for peak calling at different genomic
locations. For example, if a position lies in an open chromatin
region, is depleted of nucleosome positioning, or a co-binding
protein has been detected within the neighborhood, then the position
is more likely to be bound by the target protein of interest, and
hence a lower threshold will be used to call significant peaks.
As a result, weak but real binding sites can be detected.
-->

-----

**Hints**

- ChIP-Seq data:

  If the data come from ChIP-Seq, convert the ChIP-Seq values into
  z-scores before running this program. It is also recommended that
  you pool the read counts within a neighborhood, e.g. in tiled 30 bp
  windows. This way, the ChIP-Seq data will resemble ChIP-chip data
  in format.

- Choosing window size options:

  The window size depends on the probe tiling density. For example,
  if the probes are tiled every 100 bp, then a smallest window of 2
  probes and a largest window of 6 probes are appropriate, because
  the DNA fragment size is around 300-500 bp.

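As a rough check on the window-size hint, the Python sketch below estimates the genomic span of a window of n consecutive probes as ``(n - 1) * spacing + probe_len``. The 100 bp spacing and the probe length of about 50 bp are assumptions read off the example input. The function names are hypothetical, and this is not how PASS itself chooses windows. With these numbers, windows of 2 to 6 probes span roughly 150 to 550 bp, which brackets the 300-500 bp fragment sizes.

```python
# Back-of-envelope window-span check (illustrative; not part of PASS).
# Assumption: probes are evenly tiled, and a window of n consecutive probes
# spans from the start of its first probe to the end of its last probe.


def window_span_bp(n_probes, spacing_bp, probe_len_bp):
    """Approximate genomic span, in bp, of n consecutive tiled probes."""
    return (n_probes - 1) * spacing_bp + probe_len_bp


def brackets_fragments(min_window, max_window, spacing_bp, probe_len_bp,
                       frag_min_bp, frag_max_bp):
    """True if the shortest window fits inside the smallest fragment and
    the longest window covers the largest fragment."""
    shortest = window_span_bp(min_window, spacing_bp, probe_len_bp)
    longest = window_span_bp(max_window, spacing_bp, probe_len_bp)
    return frag_min_bp >= shortest and longest >= frag_max_bp


if __name__ == "__main__":
    # 100 bp tiling and ~50 bp probes, roughly as in the example input.
    for n in range(2, 7):
        print(n, "probes span about", window_span_bp(n, 100, 50), "bp")
    print(brackets_fragments(2, 6, 100, 50, 300, 500))  # True
```

For a different tiling density, change ``spacing_bp`` and ``probe_len_bp`` to see whether a candidate window range still covers the expected fragment sizes.
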
-----

**Example**

- input file::

    chr7 Nimblegen ID 40307603 40307652 1.668944 . . .
    chr7 Nimblegen ID 40307703 40307752 0.8041307 . . .
    chr7 Nimblegen ID 40307808 40307865 -1.089931 . . .
    chr7 Nimblegen ID 40307920 40307969 1.055044 . . .
    chr7 Nimblegen ID 40308005 40308068 2.447853 . . .
    chr7 Nimblegen ID 40308125 40308174 0.1638694 . . .
    chr7 Nimblegen ID 40308223 40308275 -0.04796628 . . .
    chr7 Nimblegen ID 40308318 40308367 0.9335709 . . .
    chr7 Nimblegen ID 40308526 40308584 0.5143972 . . .
    chr7 Nimblegen ID 40308611 40308660 -1.089931 . . .
    etc.

In GFF, a dot ('.') in a field means "not applicable".
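
For ChIP-Seq input, the Hints section suggests pooling read counts in tiled 30 bp windows and converting them to z-scores. The Python sketch below shows one hypothetical way to do that and emit rows in the nine-column GFF layout of the example input. Several assumptions here do not come from the PASS documentation. Reads arrive as (chromosome, 1-based position) pairs. Empty tiles between the first and last covered tile are kept with a count of zero. Z-scores use the mean and population standard deviation of all tiles. The source and feature columns simply copy "Nimblegen" and "ID" from the example.

```python
# Hypothetical ChIP-Seq to PASS-style GFF converter (a sketch, not part of PASS).
# Assumptions: reads are (chrom, 1-based position) pairs; counts are pooled in
# fixed-size tiles; z-scores use the mean and population standard deviation of
# every tile between the first and last covered tile on each chromosome.
import statistics
from collections import Counter


def tile_counts(reads, tile_bp=30):
    """Count reads per (chrom, tile index)."""
    counts = Counter()
    for chrom, pos in reads:
        counts[(chrom, (pos - 1) // tile_bp)] += 1
    return counts


def to_gff_lines(reads, tile_bp=30, source="Nimblegen", feature="ID"):
    """Return tab-separated, nine-column GFF lines, one per tile."""
    counts = tile_counts(reads, tile_bp)
    tiles = []
    for chrom in sorted({c for c, _ in counts}):
        idx = [t for c, t in counts if c == chrom]
        # Keep empty tiles (count 0) so the tiles stay evenly spaced.
        for t in range(min(idx), max(idx) + 1):
            tiles.append((chrom, t, counts[(chrom, t)]))
    if not tiles:
        return []
    values = [n for _, _, n in tiles]
    mean = statistics.mean(values)
    sd = statistics.pstdev(values) or 1.0  # guard against all-equal counts
    lines = []
    for chrom, t, n in tiles:
        start = t * tile_bp + 1
        end = start + tile_bp - 1
        z = (n - mean) / sd
        fields = [chrom, source, feature, str(start), str(end),
                  f"{z:.6g}", ".", ".", "."]
        lines.append("\t".join(fields))
    return lines


if __name__ == "__main__":
    reads = [("chr7", 5), ("chr7", 12), ("chr7", 40), ("chr7", 95)]
    for line in to_gff_lines(reads):
        print(line)
```

The toy reads above yield four 30 bp tiles (1-30, 31-60, 61-90, 91-120) with z-scores 1.41421, 0, -1.41421 and 0. Real reads would come from an alignment file, which this sketch does not parse.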

- output file::

    ID Chr Start End WinSz PeakValue # of FPs FDR
    1 chr7 40310931 40311266 4 1.663446 0.248817 0.248817

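To post-process the output in a script, the Python sketch below reads the tabular file and keeps intervals whose reported FDR does not exceed a chosen cutoff. It assumes tab-separated columns in the order of the example header, with that header as the first line. The dictionary keys are hypothetical names chosen for the sketch, not names defined by PASS.

```python
# Sketch: read PASS tabular output and filter by FDR (not part of PASS).
# Assumptions: tab-separated columns in the order of the example header
# (ID, Chr, Start, End, WinSz, PeakValue, # of FPs, FDR), header on line 1.
# The dictionary keys below are our own names.
import csv
import io

FIELDS = ["id", "chrom", "start", "end", "win_size", "peak_value",
          "expected_fps", "fdr"]
INT_FIELDS = ("id", "start", "end", "win_size")
FLOAT_FIELDS = ("peak_value", "expected_fps", "fdr")


def read_pass_output(handle):
    """Parse PASS output rows into dicts with numeric fields converted."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header line
    rows = []
    for rec in reader:
        if not rec:
            continue  # ignore blank lines
        row = dict(zip(FIELDS, rec))
        for key in INT_FIELDS:
            row[key] = int(row[key])
        for key in FLOAT_FIELDS:
            row[key] = float(row[key])
        rows.append(row)
    return rows


def passing(rows, max_fdr):
    """Keep intervals whose reported FDR is at most max_fdr."""
    return [r for r in rows if max_fdr >= r["fdr"]]


if __name__ == "__main__":
    example = (
        "ID\tChr\tStart\tEnd\tWinSz\tPeakValue\t# of FPs\tFDR\n"
        "1\tchr7\t40310931\t40311266\t4\t1.663446\t0.248817\t0.248817\n"
    )
    rows = read_pass_output(io.StringIO(example))
    print(len(passing(rows, 0.3)), len(passing(rows, 0.1)))  # 1 0
```

For a file on disk, pass an open file handle, e.g. ``open(path, newline="")``, instead of the in-memory example.
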
-----

**References**

Zhang Y. (2008)
Poisson approximation for significance in genome-wide ChIP-chip tiling arrays.
Bioinformatics. 24(24):2825-31. Epub 2008 Oct 25.

Chen KB, Zhang Y. (2010)
A varying threshold method for ChIP peak calling using multiple sources of information.
Submitted.

    </help>
</tool>