annotate tools/human_genome_variation/pass.xml @ 0:9071e359b9a3

Uploaded
author xuebing
date Fri, 09 Mar 2012 19:37:19 -0500
<tool id="hgv_pass" name="PASS" version="1.0.0">
  <description>significant transcription factor binding sites from ChIP data</description>

  <command interpreter="bash">
    pass_wrapper.sh "$input" "$min_window" "$max_window" "$false_num" "$output"
  </command>

  <inputs>
    <param format="gff" name="input" type="data" label="Dataset"/>
    <param name="min_window" label="Smallest window size (by # of probes)" type="integer" value="2" />
    <param name="max_window" label="Largest window size (by # of probes)" type="integer" value="6" />
    <param name="false_num" label="Expected total number of false positive intervals to be called" type="float" value="5.0" help="N.B.: this is a &lt;em&gt;count&lt;/em&gt;, not a rate." />
  </inputs>

  <outputs>
    <data format="tabular" name="output" />
  </outputs>

  <requirements>
    <requirement type="package">pass</requirement>
    <requirement type="binary">sed</requirement>
  </requirements>

  <!-- we need to be able to set the seed for the random number generator
  <tests>
    <test>
      <param name="input" ftype="gff" value="pass_input.gff"/>
      <param name="min_window" value="2"/>
      <param name="max_window" value="6"/>
      <param name="false_num" value="5"/>
      <output name="output" file="pass_output.tab"/>
    </test>
  </tests>
  -->

  <help>
**Dataset formats**

The input is in GFF_ format, and the output is tabular_.
(`Dataset missing?`_)

.. _GFF: ./static/formatHelp.html#gff
.. _tabular: ./static/formatHelp.html#tab
.. _Dataset missing?: ./static/formatHelp.html

-----

**What it does**

PASS (Poisson Approximation for Statistical Significance) detects
significant transcription factor binding sites in the genome from
ChIP data. This is probably the only peak-calling method that
accurately controls both the false-positive rate and the FDR in ChIP
data, which is important given how much the results of different
peak-calling algorithms disagree. At the same time, the method
achieves similar or better power than previous methods.

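The ``false_num`` setting is an expected *count* of false-positive intervals, not a rate. As an illustration only, the Python sketch below assumes that the number of false-positive intervals is approximately Poisson-distributed with a mean equal to that count (the general idea of a Poisson approximation; this is not PASS's own code) and turns the count into probabilities. The function names are hypothetical.

```python
import math


# Assumption: the number of false-positive intervals is ~ Poisson(expected_fp).
def prob_no_false_positives(expected_fp: float) -> float:
    """P(zero false-positive intervals) when the count is Poisson(expected_fp)."""
    return math.exp(-expected_fp)


def prob_at_least(expected_fp: float, k: int) -> float:
    """P(k or more false-positive intervals) under the same Poisson assumption."""
    # Poisson CDF up to k - 1, summed term by term, then complemented.
    cdf = sum(
        math.exp(-expected_fp) * expected_fp ** i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - cdf


if __name__ == "__main__":
    lam = 5.0  # default value of the tool's "false_num" parameter
    print(f"P(no false positives)   = {prob_no_false_positives(lam):.4f}")
    print(f"P(5 or more false pos.) = {prob_at_least(lam, 5):.4f}")
```

Under this assumption, the default of 5.0 gives a chance of calling no false positives at all of well under 1%, and about a 56% chance of calling five or more, which is why the value should be read as a count.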
<!-- we don't have wrapper support for the "prior" file yet
Another unique feature of this method is that it allows varying
thresholds to be used for peak calling at different genomic
locations. For example, if a position lies in an open chromatin
region, is depleted of nucleosomes, or has a co-binding protein
detected in its neighborhood, then the position is more likely to
be bound by the target protein of interest, and hence a lower
threshold is used to call significant peaks. As a result, weak but
real binding sites can be detected.
-->

-----

**Hints**

- ChIP-Seq data:

  If the data are from ChIP-Seq, you need to convert the ChIP-Seq
  values into z-scores before using this program. It is also
  recommended that you group read counts within a neighborhood
  together, e.g. by summing them in tiled windows of 30bp. In this
  way, the ChIP-Seq data will resemble ChIP-chip data in format.

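A minimal Python sketch of the ChIP-Seq preprocessing suggested above, assuming read start positions are 0-based integers on a single chromosome. It counts reads in fixed 30bp tiles, converts the tile counts to z-scores using the mean and population standard deviation across tiles, and writes GFF lines in the column layout this tool reads. The function names and the ``ChIP-Seq``/``bin`` source and feature labels are hypothetical placeholders, and a plain z-score over all tiles is only the simplest possible normalization.

```python
from collections import Counter
from statistics import mean, pstdev


def bin_read_counts(read_starts, bin_size=30):
    """Count reads per fixed-width tile, keyed by the tile's 0-based start."""
    # Assumes 0-based read start positions on one chromosome.
    counts = Counter((pos // bin_size) * bin_size for pos in read_starts)
    if not counts:
        return {}
    # Keep empty tiles between the first and last occupied tile so that
    # zero-count tiles contribute to the mean and standard deviation.
    first, last = min(counts), max(counts)
    return {start: counts.get(start, 0)
            for start in range(first, last + bin_size, bin_size)}


def to_z_scores(binned):
    """Convert per-tile counts to z-scores: (count - mean) / std. deviation."""
    if not binned:
        return {}
    values = list(binned.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {start: 0.0 for start in binned}
    return {start: (count - mu) / sigma for start, count in binned.items()}


def to_gff_lines(chrom, z_scores, bin_size=30):
    """Emit one GFF line per tile (1-based, inclusive coordinates)."""
    # "ChIP-Seq" and "bin" are hypothetical source/feature labels.
    for start in sorted(z_scores):
        yield "\t".join([chrom, "ChIP-Seq", "bin", str(start + 1),
                         str(start + bin_size), f"{z_scores[start]:.4f}",
                         ".", ".", "."])


if __name__ == "__main__":
    reads = [5, 12, 29, 31, 95]  # toy read start positions
    z = to_z_scores(bin_read_counts(reads))
    for line in to_gff_lines("chr7", z):
        print(line)
```

Whether a control sample should be used in the normalization is not specified here; the sketch only shows the binning and z-score steps.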

- Choosing window size options:

  The window sizes are measured in number of probes, so they depend
  on the probe tiling density. For example, if the probes are tiled
  every 100bp, setting the smallest window to 2 and the largest
  window to 6 is appropriate, because the DNA fragment size is
  around 300-500bp.

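To make "window size by number of probes" concrete, the Python sketch below (hypothetical helper names, not part of PASS) estimates how many consecutive probes one fragment spans and enumerates every candidate window of ``min_window`` to ``max_window`` consecutive probes together with its mean score. It only lists candidate windows; the PASS significance computation itself is not reproduced here.

```python
# Hypothetical helpers: these enumerate windows only, not PASS's statistic.
def probes_per_fragment(fragment_bp: int, tiling_step_bp: int) -> int:
    """Approximate number of consecutive probes covered by one DNA fragment."""
    return max(1, round(fragment_bp / tiling_step_bp))


def window_means(scores, min_window=2, max_window=6):
    """Yield (start_index, window_size, mean_score) for each probe window."""
    n = len(scores)
    for size in range(min_window, max_window + 1):
        # range() is empty when the window is longer than the probe list.
        for start in range(n - size + 1):
            window = scores[start:start + size]
            yield start, size, sum(window) / size


if __name__ == "__main__":
    # 100bp tiling: 300-500bp fragments cover about 3-5 probes,
    # which the default 2-6 window range brackets.
    print(probes_per_fragment(300, 100), probes_per_fragment(500, 100))

    # Scores of the ten probes shown in the Example section.
    scores = [1.668944, 0.8041307, -1.089931, 1.055044, 2.447853,
              0.1638694, -0.04796628, 0.9335709, 0.5143972, -1.089931]
    best = max(window_means(scores), key=lambda w: w[2])
    print(best)
```

Reading the default range this way (a margin of one probe on either side of the 3-5 probes a fragment covers) is an interpretation, not a rule stated by the tool.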

-----

**Example**

- input file::

    chr7  Nimblegen  ID  40307603  40307652  1.668944     .  .  .
    chr7  Nimblegen  ID  40307703  40307752  0.8041307    .  .  .
    chr7  Nimblegen  ID  40307808  40307865  -1.089931    .  .  .
    chr7  Nimblegen  ID  40307920  40307969  1.055044     .  .  .
    chr7  Nimblegen  ID  40308005  40308068  2.447853     .  .  .
    chr7  Nimblegen  ID  40308125  40308174  0.1638694    .  .  .
    chr7  Nimblegen  ID  40308223  40308275  -0.04796628  .  .  .
    chr7  Nimblegen  ID  40308318  40308367  0.9335709    .  .  .
    chr7  Nimblegen  ID  40308526  40308584  0.5143972    .  .  .
    chr7  Nimblegen  ID  40308611  40308660  -1.089931    .  .  .
    etc.

In GFF, a dot ('.') means "not applicable".

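A small Python sketch for reading input in the layout shown above, assuming the first six columns are sequence name, source, feature, start, end and score, as in the example; a '.' score is mapped to ``None``. Splitting on runs of whitespace (into at most nine fields) accepts both tab-separated files and the space-aligned example. ``ProbeRecord``, ``parse_gff_line`` and ``parse_gff`` are hypothetical names.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class ProbeRecord(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: Optional[float]  # None when the GFF score column is "."


def parse_gff_line(line: str) -> ProbeRecord:
    """Parse the first six columns of one GFF line (assumed column order)."""
    # maxsplit=8 keeps a free-text 9th (group/attributes) column intact.
    fields = line.split(None, 8)
    if 6 > len(fields):
        raise ValueError(f"expected at least 6 GFF columns, got {len(fields)}")
    seqname, source, feature, start, end, score = fields[:6]
    return ProbeRecord(seqname, source, feature, int(start), int(end),
                       None if score == "." else float(score))


def parse_gff(lines: Iterable[str]) -> Iterator[ProbeRecord]:
    """Yield records, skipping blank lines and '#' comment lines."""
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            yield parse_gff_line(stripped)


if __name__ == "__main__":
    example = "chr7\tNimblegen\tID\t40307603\t40307652\t1.668944\t.\t.\t."
    print(parse_gff_line(example))
```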

- output file::

    ID  Chr   Start     End       WinSz  PeakValue  # of FPs  FDR
    1   chr7  40310931  40311266  4      1.663446   0.248817  0.248817

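The output can be post-processed with a few lines of Python. The sketch below assumes the columns are tab-separated, in the order shown above, with a header row. It also assumes that FDR equals the expected number of false positives divided by the interval's rank in the ``ID`` column; this is only inferred from the single example row (ID 1, where both values are 0.248817) and is not stated by this help page. The function names are hypothetical.

```python
import csv
import io


def read_pass_output(text):
    """Parse tab-separated PASS output (header row first) into dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def implied_fdr(row):
    """Expected false positives / rank of the interval."""
    # Assumption inferred from the example row, not a documented formula.
    return float(row["# of FPs"]) / int(row["ID"])


def filter_by_fdr(rows, max_fdr):
    """Keep intervals whose reported FDR does not exceed max_fdr."""
    return [row for row in rows if max_fdr >= float(row["FDR"])]


if __name__ == "__main__":
    header = ["ID", "Chr", "Start", "End", "WinSz", "PeakValue", "# of FPs", "FDR"]
    first = ["1", "chr7", "40310931", "40311266", "4", "1.663446", "0.248817", "0.248817"]
    text = "\t".join(header) + "\n" + "\t".join(first) + "\n"
    rows = read_pass_output(text)
    print(implied_fdr(rows[0]), len(filter_by_fdr(rows, 0.1)))
```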

-----

**References**

Zhang Y. (2008)
Poisson approximation for significance in genome-wide ChIP-chip tiling arrays.
Bioinformatics. 24(24):2825-31. Epub 2008 Oct 25.

Chen KB, Zhang Y. (2010)
A varying threshold method for ChIP peak calling using multiple sources of information.
Submitted.

  </help>
</tool>