<tool id="hgv_gpass" name="GPASS" version="1.0.0">
  <description>significant single-SNP associations in case-control studies</description>

  <command interpreter="perl">
    gpass.pl ${input1.extra_files_path}/${input1.metadata.base_name}.map ${input1.extra_files_path}/${input1.metadata.base_name}.ped $output $fdr
  </command>

  <inputs>
    <param name="input1" type="data" format="lped" label="Dataset"/>
    <param name="fdr" type="float" value="0.05" label="FDR"/>
  </inputs>

  <outputs>
    <data name="output" format="tabular" />
  </outputs>

  <requirements>
    <requirement type="package">gpass</requirement>
  </requirements>

  <!-- we need to be able to set the seed for the random number generator
  <tests>
    <test>
      <param name='input1' value='gpass_and_beam_input' ftype='lped' >
        <metadata name='base_name' value='gpass_and_beam_input' />
        <composite_data value='gpass_and_beam_input.ped' />
        <composite_data value='gpass_and_beam_input.map' />
        <edit_attributes type='name' value='gpass_and_beam_input' />
      </param>
      <param name="fdr" value="0.05" />
      <output name="output" file="gpass_output.txt" />
    </test>
  </tests>
  -->

  <help>
**Dataset formats**

The input dataset must be in lped_ format, and the output is tabular_.
(`Dataset missing?`_)

.. _lped: ./static/formatHelp.html#lped
.. _tabular: ./static/formatHelp.html#tab
.. _Dataset missing?: ./static/formatHelp.html

-----

**What it does**

GPASS (Genome-wide Poisson Approximation for Statistical Significance)
detects significant single-SNP associations in case-control studies at a
user-specified FDR. Unlike previous methods, this tool can accurately
approximate the genome-wide significance and FDR of SNP associations, while
adjusting for millions of multiple comparisons, within seconds or minutes.

The program has two main functionalities:

1. Detect significant single-SNP associations at a user-specified false
   discovery rate (FDR).

   *Note*: a "typical" definition of FDR could be::

       FDR = E(# of false positive SNPs / # of significant SNPs)

   This definition, however, is poorly suited to association mapping, because
   SNPs are highly correlated. Our FDR is defined differently to account for
   SNP correlations, and thus yields a proper FDR in terms of the
   "proportion of false positive loci".

2. Approximate the significance of a list of candidate SNPs, adjusting for
   multiple comparisons. If you have isolated a few SNPs of interest and want
   to know their significance in a GWAS, you can supply the GWAS data and let
   the program specifically test those SNPs.
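
To see why a SNP-level and a locus-level false discovery proportion can disagree, here is a minimal Python sketch. It is an illustration only, not GPASS's actual procedure, which this help text does not spell out. The SNP names, the locus labels, the true/false flags, and the rule that a locus counts as a true discovery when any of its SNPs is a true signal are all assumptions made up for the example. It also computes the realized proportion for one toy dataset rather than the expectation that defines an FDR.

```python
from fractions import Fraction

# Hypothetical significant SNPs: (SNP id, locus label, is a true association).
# Eight correlated SNPs tag one true locus; two isolated SNPs are false hits.
significant_snps = [("rsA%d" % i, "locus_A", True) for i in range(8)]
significant_snps += [("rsB0", "locus_B", False), ("rsC0", "locus_C", False)]


def snp_level_fdp(snps):
    """False positive SNPs divided by all significant SNPs."""
    false_snps = sum(1 for _, _, is_true in snps if not is_true)
    return Fraction(false_snps, len(snps))


def locus_level_fdp(snps):
    """False positive loci divided by all significant loci.

    Assumption: a locus is a true discovery if any of its SNPs is true.
    """
    locus_is_true = {}
    for _, locus, is_true in snps:
        locus_is_true[locus] = locus_is_true.get(locus, False) or is_true
    false_loci = sum(1 for ok in locus_is_true.values() if not ok)
    return Fraction(false_loci, len(locus_is_true))


print("SNP-level proportion:", snp_level_fdp(significant_snps))
print("locus-level proportion:", locus_level_fdp(significant_snps))
```

In this toy case the SNP-level proportion is 1/5 while the locus-level proportion is 2/3, because the eight correlated SNPs at the single true locus inflate the denominator of the SNP-level ratio.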


*Also note*: the SNPs in a study cannot be both too few and too tightly
clustered in one local region. A few hundred SNPs, or tens of SNPs
spread across different regions, will be fine. The sample size cannot be too
small either; around 100 or more individuals (cases and controls combined)
will be fine. Otherwise, use a permutation test instead.

-----

**Example**

- input map file::

    1  rs0  0  738547
    1  rs1  0  5597094
    1  rs2  0  9424115
    etc.

- input ped file::

    1 1 0 0 1 1 G G A A A A A A A A A G A A G G G G A A G G G G G G A A A A A G A A G G A G A G A A G G A A G G A A G G A G A A G G A A G G A A A G A G G G A G G G G G A A A G A A G G G G G G G G A G A A A A A A A A
    1 1 0 0 1 1 G G A G G G A A A A A G A A G G G G G G A A G G A G A G G G G G A G G G A G A A G G A G G G A A G G G G A G A G G G A G A A A A G G G G A G A G G G A G A A A A A G G G A G G G A G G G G G A A G G A G
    etc.

- output dataset, showing significant SNPs and their p-values and FDR::

    #ID   chr   position   Statistics  adj-Pvalue  FDR
    rs35  chr1  136606952  4.890849    0.991562    0.682138
    rs36  chr1  137748344  4.931934    0.991562    0.795827
    rs44  chr2  14423047   7.712832    0.665086    0.218776
    etc.
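
Because the output is tabular, it is straightforward to post-process. The Python sketch below keeps the SNP IDs whose FDR column is at or below a chosen cutoff. It assumes the file is tab-separated and starts with the header row shown in the example output. The function name ``snps_within_fdr`` and the 0.25 cutoff are hypothetical choices for illustration, and the sample rows are copied from the example output.

```python
import csv
import io

# Sample rows copied from the example output; tab separation is an assumption.
SAMPLE_OUTPUT = (
    "#ID\tchr\tposition\tStatistics\tadj-Pvalue\tFDR\n"
    "rs35\tchr1\t136606952\t4.890849\t0.991562\t0.682138\n"
    "rs36\tchr1\t137748344\t4.931934\t0.991562\t0.795827\n"
    "rs44\tchr2\t14423047\t7.712832\t0.665086\t0.218776\n"
)


def snps_within_fdr(handle, max_fdr):
    """Return the IDs of rows whose FDR value does not exceed max_fdr."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row["#ID"] for row in reader if max_fdr >= float(row["FDR"])]


# Only rs44 has an FDR at or below 0.25 in the sample rows.
print(snps_within_fdr(io.StringIO(SAMPLE_OUTPUT), 0.25))  # prints ['rs44']
```

To run this on a real result, pass an open file handle for the downloaded output dataset in place of the ``io.StringIO`` wrapper.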

-----

**Reference**

Zhang Y, Liu JS. (2010)
Fast and accurate significance approximation for genome-wide association studies.
Submitted.

  </help>
</tool>