annotate MatrixEQTL/man/Matrix_eQTL_main.Rd @ 3:ae74f8fb3aef draft

Uploaded
author jasonxu
date Fri, 12 Mar 2021 08:20:57 +0000
parents cd4c8e4a4b5b
children
\name{Matrix_eQTL_main}
\alias{Matrix_eQTL_main}
\alias{Matrix_eQTL_engine}
\title{
Main function for fast eQTL analysis in MatrixEQTL package
}
\description{
The \code{Matrix_eQTL_engine} function tests the association of every row of the \code{snps} dataset with every row of the \code{gene} dataset using a linear regression model defined by the \code{useModel} parameter (see below).

The testing procedure accounts for extra covariates in the \code{cvrt} parameter.

The \code{errorCovariance} parameter can be set to the error variance-covariance matrix to account for heteroskedastic and/or correlated errors.

Associations significant at the \code{pvOutputThreshold} (\code{pvOutputThreshold.cis}) level are saved to \code{output_file_name} (\code{output_file_name.cis}), with the corresponding estimates of effect size (slope coefficient), test statistics, p-values, and q-values (false discovery rate).

Matrix eQTL can perform separate analyses for local (cis) and distant (trans) eQTLs.
For such an analysis, set the cis-specific parameters \code{pvOutputThreshold.cis > 0}, \code{cisDist}, \code{snpspos}, and \code{genepos} in the call to \code{Matrix_eQTL_main}.
A gene-SNP pair is considered local if the distance between them is less than or equal to \code{cisDist}.
The genomic locations of genes and SNPs are defined by the data frames \code{snpspos} and \code{genepos}.
Depending on the p-value thresholds \code{pvOutputThreshold} and \code{pvOutputThreshold.cis}, Matrix eQTL runs in one of three modes:
\itemize{
  \item Set \code{pvOutputThreshold > 0} and \code{pvOutputThreshold.cis = 0} (or use \code{Matrix_eQTL_engine}) to perform eQTL analysis without using gene/SNP locations. Associations significant at the \code{pvOutputThreshold} level are recorded in \code{output_file_name} and in the returned object.
  \item Set \code{pvOutputThreshold = 0} and \code{pvOutputThreshold.cis > 0} to perform eQTL analysis for local gene-SNP pairs only. Local associations significant at the \code{pvOutputThreshold.cis} level are recorded in \code{output_file_name.cis} and in the returned object.
  \item Set \code{pvOutputThreshold > 0} and \code{pvOutputThreshold.cis > 0} to perform eQTL analysis with separate p-value thresholds for local and distant eQTLs. Distant and local associations significant at the corresponding thresholds are recorded in \code{output_file_name} and \code{output_file_name.cis}, respectively, and in the returned object. In this case the false discovery rate is calculated separately for the two sets of eQTLs.
}
\code{Matrix_eQTL_engine} is a wrapper for \code{Matrix_eQTL_main} for eQTL analysis without regard to gene/SNP location and is provided for compatibility with previous versions of the package.

The \code{pvalue.hist} parameter makes it possible to record information sufficient to create a histogram or QQ-plot of all the p-values (see \code{\link[=plot.MatrixEQTL]{plot}}).
}
\usage{
Matrix_eQTL_main(   snps,
                    gene,
                    cvrt = SlicedData$new(),
                    output_file_name = "",
                    pvOutputThreshold = 1e-5,
                    useModel = modelLINEAR,
                    errorCovariance = numeric(),
                    verbose = TRUE,
                    output_file_name.cis = "",
                    pvOutputThreshold.cis = 0,
                    snpspos = NULL,
                    genepos = NULL,
                    cisDist = 1e6,
                    pvalue.hist = FALSE,
                    min.pv.by.genesnp = FALSE,
                    noFDRsaveMemory = FALSE)

Matrix_eQTL_engine( snps,
                    gene,
                    cvrt = SlicedData$new(),
                    output_file_name,
                    pvOutputThreshold = 1e-5,
                    useModel = modelLINEAR,
                    errorCovariance = numeric(),
                    verbose = TRUE,
                    pvalue.hist = FALSE,
                    min.pv.by.genesnp = FALSE,
                    noFDRsaveMemory = FALSE)
}
\arguments{
  \item{snps}{
    \code{\linkS4class{SlicedData}} object with genotype information.
    Can be real-valued for linear models and must take at most 3 distinct values for ANOVA unless the number of ANOVA categories is set to a higher number (see the \code{useModel} parameter).
  }
  \item{gene}{
    \code{\linkS4class{SlicedData}} object with gene expression information.
    Must have columns matching those of \code{snps}.
  }
  \item{cvrt}{
    \code{\linkS4class{SlicedData}} object with additional covariates.
    Can be an empty \code{SlicedData} object in case of no covariates.
    The constant is always included in the model; including it in \code{cvrt} causes an error.
    The order of columns must match that in \code{snps} and \code{gene}.
  }
  \item{output_file_name}{
    \code{character}, \code{connection}, or \code{NULL}.
    If not \code{NULL}, significant associations are saved to this file (all significant associations if \code{pvOutputThreshold.cis=0} or only distant if \code{pvOutputThreshold.cis>0}).
    If the file with this name exists, it is overwritten.
  }
  \item{output_file_name.cis}{
    \code{character}, \code{connection}, or \code{NULL}.
    If not \code{NULL}, significant local associations are saved to this file.
    If the file with this name exists, it is overwritten.
  }
  \item{pvOutputThreshold}{
    \code{numeric}. Significance threshold for all/distant tests.
  }
  \item{pvOutputThreshold.cis}{
    \code{numeric}. Same as \code{pvOutputThreshold}, but for local eQTLs.
  }
  \item{useModel}{
    \code{integer}. Either \code{modelLINEAR}, \code{modelANOVA}, or \code{modelLINEAR_CROSS}.
    \enumerate{
      \item Set \code{useModel = \link{modelLINEAR}} to model the effect of the genotype as additive linear and test for its significance using a t-statistic.
      \item Set \code{useModel = \link{modelANOVA}} to treat genotype as a categorical variable, fit an ANOVA model, and test for its significance using an F-test. The default number of ANOVA categories is 3. To change it, set the option like this: \code{options(MatrixEQTL.ANOVA.categories=4)}.
      \item Set \code{useModel = \link{modelLINEAR_CROSS}} to add a new term to the model equal to the product of genotype and the last covariate; the significance of this term is then tested using a t-statistic.
    }
  }
  \item{errorCovariance}{
    \code{numeric}. The error covariance matrix. Use \code{numeric()} for homoskedastic independent errors.
  }
  \item{verbose}{
    \code{logical}. Set to \code{TRUE} to display a more detailed report about the progress.
  }
  \item{snpspos}{
    \code{data.frame} object with information about SNP locations. Must have 3 columns: SNP name, chromosome, and position, like this:
    \tabular{ccc}{
      snpid  \tab chr \tab pos \cr
      Snp_01 \tab 1 \tab 721289 \cr
      Snp_02 \tab 1 \tab 752565 \cr
      \ldots \tab \ldots \tab \ldots \cr
    }
  }
  \item{genepos}{
    \code{data.frame} with information about transcript locations. Must have 4 columns: the name, chromosome, and positions of the left and right ends, like this:
    \tabular{cccc}{
      geneid  \tab chr \tab left \tab right \cr
      Gene_01 \tab 1 \tab 721289 \tab 731289 \cr
      Gene_02 \tab 1 \tab 752565 \tab 762565 \cr
      \ldots \tab \ldots \tab \ldots \tab \ldots \cr
    }
  }
  \item{cisDist}{
    \code{numeric}. SNP-gene pairs within this distance are considered local. The distance is measured from the nearest end of the gene. SNPs within a gene are always considered local.
  }
  \item{pvalue.hist}{
    \code{logical}, \code{numeric}, or \code{"qqplot"}.
    Defines whether and how the distribution of p-values is recorded in the returned object.
    If \code{pvalue.hist = FALSE}, the information is not recorded and the analysis runs faster.
    Alternatively, set \code{pvalue.hist = "qqplot"} to record information sufficient to create a QQ-plot of the p-values (use \code{\link[=plot.MatrixEQTL]{plot}} on the returned object to create the plot).
    To record information for a histogram, set \code{pvalue.hist} to the desired number of bins of equal size. Finally, \code{pvalue.hist} can also be set to a custom set of bin edges.
  }
  \item{min.pv.by.genesnp}{
    \code{logical}. Set \code{min.pv.by.genesnp = TRUE} to record the minimum p-value for each SNP and each gene in the returned object. The minimum p-values are recorded even if they are above the corresponding thresholds of \code{pvOutputThreshold} and \code{pvOutputThreshold.cis}. The analysis runs faster when the parameter is set to \code{FALSE}.
  }
  \item{noFDRsaveMemory}{
    \code{logical}. Set \code{noFDRsaveMemory = TRUE} to save significant gene-SNP pairs directly to the output files, reduce the memory footprint, and skip the FDR calculation. The eQTLs are not recorded in the returned object if \code{noFDRsaveMemory = TRUE}.
  }
}
\details{
Note that the columns of \code{gene}, \code{snps}, and \code{cvrt} must match.
If they do not match in the input files, use the \code{ColumnSubsample} method to subset and/or reorder them.
}
\value{
The detected eQTLs are saved in \code{output_file_name} and/or \code{output_file_name.cis} if they are not \code{NULL}.
The method also returns a list with a summary of the performed analysis.
\item{param}{Keeps all input parameters and also records the number of degrees of freedom for the full model.}
\item{time.in.sec}{Time difference between the start and the end of the analysis (in seconds).}
\item{all}{Information about all detected eQTLs.}
\item{cis}{Information about detected local eQTLs.}
\item{trans}{Information about detected distant eQTLs.}
The elements \code{all}, \code{cis}, and \code{trans} may contain the following components:
\describe{
  \item{\code{ntests}}{Total number of tests performed. This is used for FDR calculation.}
  \item{\code{eqtls}}{Data frame with recorded significant associations. Not available if \code{noFDRsaveMemory=TRUE}.}
  \item{\code{neqtls}}{Number of significant associations recorded.}
  \item{\code{hist.bins}}{Histogram bins used for recording the p-value distribution. See the \code{pvalue.hist} parameter.}
  \item{\code{hist.counts}}{Number of p-values that fell in each histogram bin. See the \code{pvalue.hist} parameter.}
  \item{\code{min.pv.snps}}{Vector with the best p-value for each SNP. See the \code{min.pv.by.genesnp} parameter.}
  \item{\code{min.pv.gene}}{Vector with the best p-value for each gene. See the \code{min.pv.by.genesnp} parameter.}
}
}
\references{
The package website: \url{http://www.bios.unc.edu/research/genomic_software/Matrix_eQTL/}
}
\author{
Andrey Shabalin \email{ashabalin@vcu.edu}
}
\seealso{
The code below is the sample code for eQTL analysis NOT using gene/SNP locations.

See \code{\link{MatrixEQTL_cis_code}} for sample code for eQTL analysis that separates local and distant tests.
}
\examples{
# Matrix eQTL by Andrey A. Shabalin
# http://www.bios.unc.edu/research/genomic_software/Matrix_eQTL/
#
# Be sure to use an up to date version of R and Matrix eQTL.

# source("Matrix_eQTL_R/Matrix_eQTL_engine.r");
library(MatrixEQTL)

## Location of the package with the data files.
base.dir = find.package('MatrixEQTL');

## Settings

# Linear model to use, modelANOVA, modelLINEAR, or modelLINEAR_CROSS
useModel = modelLINEAR; # modelANOVA, modelLINEAR, or modelLINEAR_CROSS

# Genotype file name
SNP_file_name = paste(base.dir, "/data/SNP.txt", sep="");

# Gene expression file name
expression_file_name = paste(base.dir, "/data/GE.txt", sep="");

# Covariates file name
# Set to character() for no covariates
covariates_file_name = paste(base.dir, "/data/Covariates.txt", sep="");

# Output file name
output_file_name = tempfile();

# Only associations significant at this level will be saved
pvOutputThreshold = 1e-2;

# Error covariance matrix
# Set to numeric() for identity.
errorCovariance = numeric();
# errorCovariance = read.table("Sample_Data/errorCovariance.txt");


## Load genotype data

snps = SlicedData$new();
snps$fileDelimiter = "\t";      # the TAB character
snps$fileOmitCharacters = "NA"; # denote missing values;
snps$fileSkipRows = 1;          # one row of column labels
snps$fileSkipColumns = 1;       # one column of row labels
snps$fileSliceSize = 2000;      # read file in slices of 2,000 rows
snps$LoadFile(SNP_file_name);

## Load gene expression data

gene = SlicedData$new();
gene$fileDelimiter = "\t";      # the TAB character
gene$fileOmitCharacters = "NA"; # denote missing values;
gene$fileSkipRows = 1;          # one row of column labels
gene$fileSkipColumns = 1;       # one column of row labels
gene$fileSliceSize = 2000;      # read file in slices of 2,000 rows
gene$LoadFile(expression_file_name);

## Load covariates

cvrt = SlicedData$new();
cvrt$fileDelimiter = "\t";      # the TAB character
cvrt$fileOmitCharacters = "NA"; # denote missing values;
cvrt$fileSkipRows = 1;          # one row of column labels
cvrt$fileSkipColumns = 1;       # one column of row labels
if(length(covariates_file_name)>0) {
    cvrt$LoadFile(covariates_file_name);
}

## Run the analysis

me = Matrix_eQTL_engine(
    snps = snps,
    gene = gene,
    cvrt = cvrt,
    output_file_name = output_file_name,
    pvOutputThreshold = pvOutputThreshold,
    useModel = useModel,
    errorCovariance = errorCovariance,
    verbose = TRUE,
    pvalue.hist = TRUE,
    min.pv.by.genesnp = FALSE,
    noFDRsaveMemory = FALSE);

unlink(output_file_name);

## Results:

cat('Analysis done in: ', me$time.in.sec, ' seconds', '\n');
cat('Detected eQTLs:', '\n');
show(me$all$eqtls)
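
## Sketch (not part of the original example): how an FDR value relates to ntests.
# Assumption: the reported FDR follows the Benjamini-Hochberg procedure applied
# over ALL tests performed (ntests), not only over the associations that passed
# the p-value threshold. The sketch_* names and all numbers below are made up
# for illustration; only p.adjust() from base R is used.
sketch_pvalues = c(1e-8, 2e-6, 3e-4, 5e-3);  # hypothetical p-values that passed the threshold
sketch_ntests = 1000;                        # hypothetical total number of tests
sketch_fdr = p.adjust(sketch_pvalues, method = "BH", n = sketch_ntests);
print(data.frame(pvalue = sketch_pvalues, FDR = sketch_fdr));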

## Plot the histogram of all p-values

plot(me)
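
## Sketch (not part of the original example): the local (cis) rule in base R.
# Assumptions: a pair can be local only if the SNP and the gene are on the same
# chromosome; the helper sketch_is_local is hypothetical and is not a MatrixEQTL
# function. The data frames follow the snpspos/genepos layouts documented above,
# with made-up positions.
sketch_snpspos = data.frame(snpid = c("Snp_01", "Snp_02"), chr = c(1, 1),
                            pos = c(721289, 2752565));
sketch_genepos = data.frame(geneid = c("Gene_01", "Gene_02"), chr = c(1, 1),
                            left = c(721289, 752565), right = c(731289, 762565));
sketch_is_local = function(snp, gene, cisDist) {
    # Distance to the nearest end of the gene; zero when the SNP lies inside the gene
    d = pmax(gene$left - snp$pos, snp$pos - gene$right, 0);
    (snp$chr == gene$chr) & (d <= cisDist);
}
# Evaluate every SNP-gene pair (row indices into the two data frames)
sketch_pairs = expand.grid(snp = seq_len(nrow(sketch_snpspos)),
                           gene = seq_len(nrow(sketch_genepos)));
sketch_pairs$local = sketch_is_local(sketch_snpspos[sketch_pairs$snp, ],
                                     sketch_genepos[sketch_pairs$gene, ],
                                     cisDist = 1e6);
print(sketch_pairs);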
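
## Sketch (not part of the original example): one gene-SNP test written with lm().
# Assumption: for a single pair, modelLINEAR corresponds to the t-test of the
# genotype slope in an ordinary regression that includes the covariates, and
# modelLINEAR_CROSS to the t-test of the genotype-by-last-covariate product term.
# This illustrates the model only, not how the package computes it.
# The sketch_* names are hypothetical and all data below are simulated.
set.seed(1);
sketch_n = 50;
sketch_genotype = sample(0:2, sketch_n, replace = TRUE);
sketch_age = rnorm(sketch_n);
sketch_expr = 0.5 * sketch_genotype + 0.3 * sketch_age + rnorm(sketch_n);
# modelLINEAR: additive effect of the genotype
sketch_fit = summary(lm(sketch_expr ~ sketch_genotype + sketch_age));
print(sketch_fit$coefficients["sketch_genotype", c("Estimate", "t value", "Pr(>|t|)")]);
# modelLINEAR_CROSS: extra term equal to genotype times the last covariate
sketch_fit_cross = summary(lm(sketch_expr ~ sketch_genotype * sketch_age));
print(sketch_fit_cross$coefficients["sketch_genotype:sketch_age",
                                    c("Estimate", "t value", "Pr(>|t|)")]);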
}