sharplabtool: tools/rgenetics/rgQQ.py annotate

annotate tools/rgenetics/rgQQ.py @ 1:cdcb0ce84a1b

Uploaded

author	xuebing
date	Fri, 09 Mar 2012 19:45:15 -0500
parents	9071e359b9a3
children

rev	line source
0 9071e359b9a3 Uploaded xuebing parents: diff changeset	1 """
9071e359b9a3 Uploaded xuebing parents: diff changeset	2 oct 2009 - multiple output files
9071e359b9a3 Uploaded xuebing parents: diff changeset	3 Dear Matthias,
9071e359b9a3 Uploaded xuebing parents: diff changeset	4
9071e359b9a3 Uploaded xuebing parents: diff changeset	5 Yes, you can define number of outputs dynamically in Galaxy. For doing
9071e359b9a3 Uploaded xuebing parents: diff changeset	6 this, you'll have to declare one output dataset in your xml and pass
9071e359b9a3 Uploaded xuebing parents: diff changeset	7 its ID ($out_file.id) to your python script. Also, set
9071e359b9a3 Uploaded xuebing parents: diff changeset	8 force_history_refresh="True" in your tool tag in xml, like this:
9071e359b9a3 Uploaded xuebing parents: diff changeset	9 <tool id="split1" name="Split" force_history_refresh="True">
9071e359b9a3 Uploaded xuebing parents: diff changeset	10 In your script, if your outputs are named in the following format,
9071e359b9a3 Uploaded xuebing parents: diff changeset	11 primary_associatedWithDatasetID_designation_visibility_extension
9071e359b9a3 Uploaded xuebing parents: diff changeset	12 (_DBKEY), all your datasets will show up in the history pane.
9071e359b9a3 Uploaded xuebing parents: diff changeset	13 associatedWithDatasetID is the $out_file.ID passed from xml,
9071e359b9a3 Uploaded xuebing parents: diff changeset	14 designation will be a unique identifier for each output (set in your
9071e359b9a3 Uploaded xuebing parents: diff changeset	15 script),
9071e359b9a3 Uploaded xuebing parents: diff changeset	16 visibility can be set to visible if you want the dataset visible in
9071e359b9a3 Uploaded xuebing parents: diff changeset	17 your history, or notvisible otherwise
9071e359b9a3 Uploaded xuebing parents: diff changeset	18 extension is the required format for your dataset (bed, tabular, fasta
9071e359b9a3 Uploaded xuebing parents: diff changeset	19 etc)
9071e359b9a3 Uploaded xuebing parents: diff changeset	20 DBKEY is optional, and can be set if required (e.g. hg18, mm9 etc)
9071e359b9a3 Uploaded xuebing parents: diff changeset	21
9071e359b9a3 Uploaded xuebing parents: diff changeset	22 One of our tools "MAF to Interval converter" (tools/maf/
9071e359b9a3 Uploaded xuebing parents: diff changeset	23 maf_to_interval.xml) already uses this feature. You can use it as a
9071e359b9a3 Uploaded xuebing parents: diff changeset	24 reference.
9071e359b9a3 Uploaded xuebing parents: diff changeset	25
9071e359b9a3 Uploaded xuebing parents: diff changeset	26 qq.chisq Quantile-quantile plot for chi-squared tests
9071e359b9a3 Uploaded xuebing parents: diff changeset	27 Description
9071e359b9a3 Uploaded xuebing parents: diff changeset	28 This function plots ranked observed chi-squared test statistics against the corresponding expected
9071e359b9a3 Uploaded xuebing parents: diff changeset	29 order statistics. It also estimates an inflation (or deflation) factor, lambda, by the ratio of the trimmed
9071e359b9a3 Uploaded xuebing parents: diff changeset	30 means of observed and expected values. This is useful for inspecting the results of whole-genome
9071e359b9a3 Uploaded xuebing parents: diff changeset	31 association studies for overdispersion due to population substructure and other sources of bias or
9071e359b9a3 Uploaded xuebing parents: diff changeset	32 confounding.
9071e359b9a3 Uploaded xuebing parents: diff changeset	33 Usage
9071e359b9a3 Uploaded xuebing parents: diff changeset	34 qq.chisq(x, df=1, x.max, main="QQ plot",
9071e359b9a3 Uploaded xuebing parents: diff changeset	35 sub=paste("Expected distribution: chi-squared (",df," df)", sep=""),
9071e359b9a3 Uploaded xuebing parents: diff changeset	36 xlab="Expected", ylab="Observed",
9071e359b9a3 Uploaded xuebing parents: diff changeset	37 conc=c(0.025, 0.975), overdisp=FALSE, trim=0.5,
9071e359b9a3 Uploaded xuebing parents: diff changeset	38 slope.one=FALSE, slope.lambda=FALSE,
9071e359b9a3 Uploaded xuebing parents: diff changeset	39 thin=c(0.25,50), oor.pch=24, col.shade="gray", ...)
9071e359b9a3 Uploaded xuebing parents: diff changeset	40 Arguments
9071e359b9a3 Uploaded xuebing parents: diff changeset	41 x A vector of observed chi-squared test values
9071e359b9a3 Uploaded xuebing parents: diff changeset	42 df The degreees of freedom for the tests
9071e359b9a3 Uploaded xuebing parents: diff changeset	43 x.max If present, truncate the observed value (Y) axis here
9071e359b9a3 Uploaded xuebing parents: diff changeset	44 main The main heading
9071e359b9a3 Uploaded xuebing parents: diff changeset	45 sub The subheading
9071e359b9a3 Uploaded xuebing parents: diff changeset	46 xlab x-axis label (default "Expected")
9071e359b9a3 Uploaded xuebing parents: diff changeset	47 ylab y-axis label (default "Observed")
9071e359b9a3 Uploaded xuebing parents: diff changeset	48 conc Lower and upper probability bounds for concentration band for the plot. Set this
9071e359b9a3 Uploaded xuebing parents: diff changeset	49 to NA to suppress this
9071e359b9a3 Uploaded xuebing parents: diff changeset	50 overdisp If TRUE, an overdispersion factor, lambda, will be estimated and used in calculating
9071e359b9a3 Uploaded xuebing parents: diff changeset	51 concentration band
9071e359b9a3 Uploaded xuebing parents: diff changeset	52 trim Quantile point for trimmed mean calculations for estimation of lambda. Default
9071e359b9a3 Uploaded xuebing parents: diff changeset	53 is to trim at the median
9071e359b9a3 Uploaded xuebing parents: diff changeset	54 slope.one Is a line of slope one to be superimpsed?
9071e359b9a3 Uploaded xuebing parents: diff changeset	55 slope.lambda Is a line of slope lambda to be superimposed?
9071e359b9a3 Uploaded xuebing parents: diff changeset	56 thin A pair of numbers indicating how points will be thinned before plotting (see
9071e359b9a3 Uploaded xuebing parents: diff changeset	57 Details). If NA, no thinning will be carried out
9071e359b9a3 Uploaded xuebing parents: diff changeset	58 oor.pch Observed values greater than x.max are plotted at x.max. This argument sets
9071e359b9a3 Uploaded xuebing parents: diff changeset	59 the plotting symbol to be used for out-of-range observations
9071e359b9a3 Uploaded xuebing parents: diff changeset	60 col.shade The colour with which the concentration band will be filled
9071e359b9a3 Uploaded xuebing parents: diff changeset	61 ... Further graphical parameter settings to be passed to points()
9071e359b9a3 Uploaded xuebing parents: diff changeset	62
9071e359b9a3 Uploaded xuebing parents: diff changeset	63 Details
9071e359b9a3 Uploaded xuebing parents: diff changeset	64 To reduce plotting time and the size of plot files, the smallest observed and expected points are
9071e359b9a3 Uploaded xuebing parents: diff changeset	65 thinned so that only a reduced number of (approximately equally spaced) points are plotted. The
9071e359b9a3 Uploaded xuebing parents: diff changeset	66 precise behaviour is controlled by the parameter thin, whose value should be a pair of numbers.
9071e359b9a3 Uploaded xuebing parents: diff changeset	67 The first number must lie between 0 and 1 and sets the proportion of the X axis over which thinning
9071e359b9a3 Uploaded xuebing parents: diff changeset	68 is to be applied. The second number should be an integer and sets the maximum number of points
9071e359b9a3 Uploaded xuebing parents: diff changeset	69 to be plotted in this section.
9071e359b9a3 Uploaded xuebing parents: diff changeset	70 The "concentration band" for the plot is shown in grey. This region is defined by upper and lower
9071e359b9a3 Uploaded xuebing parents: diff changeset	71 probability bounds for each order statistic. The default is to use the 2.5 Note that this is not a
9071e359b9a3 Uploaded xuebing parents: diff changeset	72 simultaneous confidence region; the probability that the plot will stray outside the band at some
9071e359b9a3 Uploaded xuebing parents: diff changeset	73 point exceeds 95
9071e359b9a3 Uploaded xuebing parents: diff changeset	74 When required, he dispersion factor is estimated by the ratio of the observed trimmed mean to its
9071e359b9a3 Uploaded xuebing parents: diff changeset	75 expected value under the chi-squared assumption.
9071e359b9a3 Uploaded xuebing parents: diff changeset	76 Value
9071e359b9a3 Uploaded xuebing parents: diff changeset	77 The function returns the number of tests, the number of values omitted from the plot (greater than
9071e359b9a3 Uploaded xuebing parents: diff changeset	78 x.max), and the estimated dispersion factor, lambda.
9071e359b9a3 Uploaded xuebing parents: diff changeset	79 Note
9071e359b9a3 Uploaded xuebing parents: diff changeset	80 All tests must have the same number of degrees of freedom. If this is not the case, I suggest
9071e359b9a3 Uploaded xuebing parents: diff changeset	81 transforming to p-values and then plotting -2log(p) as chi-squared on 2 df.
9071e359b9a3 Uploaded xuebing parents: diff changeset	82 Author(s)
9071e359b9a3 Uploaded xuebing parents: diff changeset	83 David Clayton hdavid.clayton@cimr.cam.ac.uki
9071e359b9a3 Uploaded xuebing parents: diff changeset	84 References
9071e359b9a3 Uploaded xuebing parents: diff changeset	85 Devlin, B. and Roeder, K. (1999) Genomic control for association studies. Biometrics, 55:997-1004
9071e359b9a3 Uploaded xuebing parents: diff changeset	86 """
9071e359b9a3 Uploaded xuebing parents: diff changeset	87
9071e359b9a3 Uploaded xuebing parents: diff changeset	88 import sys, random, math, copy,os, subprocess, tempfile
9071e359b9a3 Uploaded xuebing parents: diff changeset	89 from rgutils import RRun, rexe
9071e359b9a3 Uploaded xuebing parents: diff changeset	90
9071e359b9a3 Uploaded xuebing parents: diff changeset	91 rqq = """
9071e359b9a3 Uploaded xuebing parents: diff changeset	92 # modified by ross lazarus for the rgenetics project may 2000
9071e359b9a3 Uploaded xuebing parents: diff changeset	93 # makes a pdf for galaxy from an x vector of chisquare values
9071e359b9a3 Uploaded xuebing parents: diff changeset	94 # from snpMatrix
9071e359b9a3 Uploaded xuebing parents: diff changeset	95 # http://www.bioconductor.org/packages/bioc/html/snpMatrix.html
9071e359b9a3 Uploaded xuebing parents: diff changeset	96 qq.chisq <-
9071e359b9a3 Uploaded xuebing parents: diff changeset	97 function(x, df=1, x.max,
9071e359b9a3 Uploaded xuebing parents: diff changeset	98 main="QQ plot",
9071e359b9a3 Uploaded xuebing parents: diff changeset	99 sub=paste("Expected distribution: chi-squared (",df," df)", sep=""),
9071e359b9a3 Uploaded xuebing parents: diff changeset	100 xlab="Expected", ylab="Observed",
9071e359b9a3 Uploaded xuebing parents: diff changeset	101 conc=c(0.025, 0.975), overdisp=FALSE, trim=0.5,
9071e359b9a3 Uploaded xuebing parents: diff changeset	102 slope.one=T, slope.lambda=FALSE,
9071e359b9a3 Uploaded xuebing parents: diff changeset	103 thin=c(0.5,200), oor.pch=24, col.shade="gray", ofname="qqchi.pdf",
9071e359b9a3 Uploaded xuebing parents: diff changeset	104 h=6,w=6,printpdf=F,...) {
9071e359b9a3 Uploaded xuebing parents: diff changeset	105
9071e359b9a3 Uploaded xuebing parents: diff changeset	106 # Function to shade concentration band
9071e359b9a3 Uploaded xuebing parents: diff changeset	107
9071e359b9a3 Uploaded xuebing parents: diff changeset	108 shade <- function(x1, y1, x2, y2, color=col.shade) {
9071e359b9a3 Uploaded xuebing parents: diff changeset	109 n <- length(x2)
9071e359b9a3 Uploaded xuebing parents: diff changeset	110 polygon(c(x1, x2[n:1]), c(y1, y2[n:1]), border=NA, col=color)
9071e359b9a3 Uploaded xuebing parents: diff changeset	111 }
9071e359b9a3 Uploaded xuebing parents: diff changeset	112
9071e359b9a3 Uploaded xuebing parents: diff changeset	113 # Sort values and see how many out of range
9071e359b9a3 Uploaded xuebing parents: diff changeset	114
9071e359b9a3 Uploaded xuebing parents: diff changeset	115 obsvd <- sort(x, na.last=NA)
9071e359b9a3 Uploaded xuebing parents: diff changeset	116 N <- length(obsvd)
9071e359b9a3 Uploaded xuebing parents: diff changeset	117 if (missing(x.max)) {
9071e359b9a3 Uploaded xuebing parents: diff changeset	118 Np <- N
9071e359b9a3 Uploaded xuebing parents: diff changeset	119 }
9071e359b9a3 Uploaded xuebing parents: diff changeset	120 else {
9071e359b9a3 Uploaded xuebing parents: diff changeset	121 Np <- sum(obsvd<=x.max)
9071e359b9a3 Uploaded xuebing parents: diff changeset	122 }
9071e359b9a3 Uploaded xuebing parents: diff changeset	123 if(Np==0)
9071e359b9a3 Uploaded xuebing parents: diff changeset	124 stop("Nothing to plot")
9071e359b9a3 Uploaded xuebing parents: diff changeset	125
9071e359b9a3 Uploaded xuebing parents: diff changeset	126 # Expected values
9071e359b9a3 Uploaded xuebing parents: diff changeset	127
9071e359b9a3 Uploaded xuebing parents: diff changeset	128 if (df==2) {
9071e359b9a3 Uploaded xuebing parents: diff changeset	129 expctd <- 2*cumsum(1/(N:1))
9071e359b9a3 Uploaded xuebing parents: diff changeset	130 }
9071e359b9a3 Uploaded xuebing parents: diff changeset	131 else {
9071e359b9a3 Uploaded xuebing parents: diff changeset	132 expctd <- qchisq(p=(1:N)/(N+1), df=df)
9071e359b9a3 Uploaded xuebing parents: diff changeset	133 }
9071e359b9a3 Uploaded xuebing parents: diff changeset	134
9071e359b9a3 Uploaded xuebing parents: diff changeset	135 # Concentration bands
9071e359b9a3 Uploaded xuebing parents: diff changeset	136
9071e359b9a3 Uploaded xuebing parents: diff changeset	137 if (!is.null(conc)) {
9071e359b9a3 Uploaded xuebing parents: diff changeset	138 if(conc[1]>0) {
9071e359b9a3 Uploaded xuebing parents: diff changeset	139 e.low <- qchisq(p=qbeta(conc[1], 1:N, N:1), df=df)
9071e359b9a3 Uploaded xuebing parents: diff changeset	140 }
9071e359b9a3 Uploaded xuebing parents: diff changeset	141 else {
9071e359b9a3 Uploaded xuebing parents: diff changeset	142 e.low <- rep(0, N)
9071e359b9a3 Uploaded xuebing parents: diff changeset	143 }
9071e359b9a3 Uploaded xuebing parents: diff changeset	144 if (conc[2]<1) {
9071e359b9a3 Uploaded xuebing parents: diff changeset	145 e.high <- qchisq(p=qbeta(conc[2], 1:N, N:1), df=df)
9071e359b9a3 Uploaded xuebing parents: diff changeset	146 }
9071e359b9a3 Uploaded xuebing parents: diff changeset	147 else {
9071e359b9a3 Uploaded xuebing parents: diff changeset	148 e.high <- 1.1*rep(max(x),N)
9071e359b9a3 Uploaded xuebing parents: diff changeset	149 }
9071e359b9a3 Uploaded xuebing parents: diff changeset	150 }
9071e359b9a3 Uploaded xuebing parents: diff changeset	151
9071e359b9a3 Uploaded xuebing parents: diff changeset	152 # Plot outline
9071e359b9a3 Uploaded xuebing parents: diff changeset	153
9071e359b9a3 Uploaded xuebing parents: diff changeset	154 if (Np < N)
9071e359b9a3 Uploaded xuebing parents: diff changeset	155 top <- x.max
9071e359b9a3 Uploaded xuebing parents: diff changeset	156 else
9071e359b9a3 Uploaded xuebing parents: diff changeset	157 top <- obsvd[N]
9071e359b9a3 Uploaded xuebing parents: diff changeset	158 right <- expctd[N]
9071e359b9a3 Uploaded xuebing parents: diff changeset	159 if (printpdf) {pdf(ofname,h,w)}
9071e359b9a3 Uploaded xuebing parents: diff changeset	160 plot(c(0, right), c(0, top), type="n", xlab=xlab, ylab=ylab,
9071e359b9a3 Uploaded xuebing parents: diff changeset	161 main=main, sub=sub)
9071e359b9a3 Uploaded xuebing parents: diff changeset	162
9071e359b9a3 Uploaded xuebing parents: diff changeset	163 # Thinning
9071e359b9a3 Uploaded xuebing parents: diff changeset	164
9071e359b9a3 Uploaded xuebing parents: diff changeset	165 if (is.na(thin[1])) {
9071e359b9a3 Uploaded xuebing parents: diff changeset	166 show <- 1:Np
9071e359b9a3 Uploaded xuebing parents: diff changeset	167 }
9071e359b9a3 Uploaded xuebing parents: diff changeset	168 else if (length(thin)!=2 \|\| thin[1]<0 \|\| thin[1]>1 \|\| thin[2]<1) {
9071e359b9a3 Uploaded xuebing parents: diff changeset	169 warning("invalid thin parameter; no thinning carried out")
9071e359b9a3 Uploaded xuebing parents: diff changeset	170 show <- 1:Np
9071e359b9a3 Uploaded xuebing parents: diff changeset	171 }
9071e359b9a3 Uploaded xuebing parents: diff changeset	172 else {
9071e359b9a3 Uploaded xuebing parents: diff changeset	173 space <- right*thin[1]/floor(thin[2])
9071e359b9a3 Uploaded xuebing parents: diff changeset	174 iat <- round((N+1)pchisq(q=(1:floor(thin[2]))space, df=df))
9071e359b9a3 Uploaded xuebing parents: diff changeset	175 if (max(iat)>thin[2])
9071e359b9a3 Uploaded xuebing parents: diff changeset	176 show <- unique(c(iat, (1+max(iat)):Np))
9071e359b9a3 Uploaded xuebing parents: diff changeset	177 else
9071e359b9a3 Uploaded xuebing parents: diff changeset	178 show <- 1:Np
9071e359b9a3 Uploaded xuebing parents: diff changeset	179 }
9071e359b9a3 Uploaded xuebing parents: diff changeset	180 Nu <- floor(trim*N)
9071e359b9a3 Uploaded xuebing parents: diff changeset	181 if (Nu>0)
9071e359b9a3 Uploaded xuebing parents: diff changeset	182 lambda <- mean(obsvd[1:Nu])/mean(expctd[1:Nu])
9071e359b9a3 Uploaded xuebing parents: diff changeset	183 if (!is.null(conc)) {
9071e359b9a3 Uploaded xuebing parents: diff changeset	184 if (Np<N)
9071e359b9a3 Uploaded xuebing parents: diff changeset	185 vert <- c(show, (Np+1):N)
9071e359b9a3 Uploaded xuebing parents: diff changeset	186 else
9071e359b9a3 Uploaded xuebing parents: diff changeset	187 vert <- show
9071e359b9a3 Uploaded xuebing parents: diff changeset	188 if (overdisp)
9071e359b9a3 Uploaded xuebing parents: diff changeset	189 shade(expctd[vert], lambda*e.low[vert],
9071e359b9a3 Uploaded xuebing parents: diff changeset	190 expctd[vert], lambda*e.high[vert])
9071e359b9a3 Uploaded xuebing parents: diff changeset	191 else
9071e359b9a3 Uploaded xuebing parents: diff changeset	192 shade(expctd[vert], e.low[vert], expctd[vert], e.high[vert])
9071e359b9a3 Uploaded xuebing parents: diff changeset	193 }
9071e359b9a3 Uploaded xuebing parents: diff changeset	194 points(expctd[show], obsvd[show], ...)
9071e359b9a3 Uploaded xuebing parents: diff changeset	195 # Overflow
9071e359b9a3 Uploaded xuebing parents: diff changeset	196 if (Np<N) {
9071e359b9a3 Uploaded xuebing parents: diff changeset	197 over <- (Np+1):N
9071e359b9a3 Uploaded xuebing parents: diff changeset	198 points(expctd[over], rep(x.max, N-Np), pch=oor.pch)
9071e359b9a3 Uploaded xuebing parents: diff changeset	199 }
9071e359b9a3 Uploaded xuebing parents: diff changeset	200 # Lines
9071e359b9a3 Uploaded xuebing parents: diff changeset	201 line.types <- c("solid", "dashed", "dotted")
9071e359b9a3 Uploaded xuebing parents: diff changeset	202 key <- NULL
9071e359b9a3 Uploaded xuebing parents: diff changeset	203 txt <- NULL
9071e359b9a3 Uploaded xuebing parents: diff changeset	204 if (slope.one) {
9071e359b9a3 Uploaded xuebing parents: diff changeset	205 key <- c(key, line.types[1])
9071e359b9a3 Uploaded xuebing parents: diff changeset	206 txt <- c(txt, "y = x")
9071e359b9a3 Uploaded xuebing parents: diff changeset	207 abline(a=0, b=1, lty=line.types[1])
9071e359b9a3 Uploaded xuebing parents: diff changeset	208 }
9071e359b9a3 Uploaded xuebing parents: diff changeset	209 if (slope.lambda && Nu>0) {
9071e359b9a3 Uploaded xuebing parents: diff changeset	210 key <- c(key, line.types[2])
9071e359b9a3 Uploaded xuebing parents: diff changeset	211 txt <- c(txt, paste("y = ", format(lambda, digits=4), "x", sep=""))
9071e359b9a3 Uploaded xuebing parents: diff changeset	212 if (!is.null(conc)) {
9071e359b9a3 Uploaded xuebing parents: diff changeset	213 if (Np<N)
9071e359b9a3 Uploaded xuebing parents: diff changeset	214 vert <- c(show, (Np+1):N)
9071e359b9a3 Uploaded xuebing parents: diff changeset	215 else
9071e359b9a3 Uploaded xuebing parents: diff changeset	216 vert <- show
9071e359b9a3 Uploaded xuebing parents: diff changeset	217 }
9071e359b9a3 Uploaded xuebing parents: diff changeset	218 abline(a=0, b=lambda, lty=line.types[2])
9071e359b9a3 Uploaded xuebing parents: diff changeset	219 }
9071e359b9a3 Uploaded xuebing parents: diff changeset	220 if (printpdf) {dev.off()}
9071e359b9a3 Uploaded xuebing parents: diff changeset	221 # Returned value
9071e359b9a3 Uploaded xuebing parents: diff changeset	222
9071e359b9a3 Uploaded xuebing parents: diff changeset	223 if (!is.null(key))
9071e359b9a3 Uploaded xuebing parents: diff changeset	224 legend(0, top, legend=txt, lty=key)
9071e359b9a3 Uploaded xuebing parents: diff changeset	225 c(N=N, omitted=N-Np, lambda=lambda)
9071e359b9a3 Uploaded xuebing parents: diff changeset	226
9071e359b9a3 Uploaded xuebing parents: diff changeset	227 }
9071e359b9a3 Uploaded xuebing parents: diff changeset	228
9071e359b9a3 Uploaded xuebing parents: diff changeset	229 """
9071e359b9a3 Uploaded xuebing parents: diff changeset	230
9071e359b9a3 Uploaded xuebing parents: diff changeset	231
9071e359b9a3 Uploaded xuebing parents: diff changeset	232
9071e359b9a3 Uploaded xuebing parents: diff changeset	233
9071e359b9a3 Uploaded xuebing parents: diff changeset	234 def makeQQ(dat=[], sample=1.0, maxveclen=4000, fname='fname',title='title',
9071e359b9a3 Uploaded xuebing parents: diff changeset	235 xvar='Sample',h=8,w=8,logscale=True,outdir=None):
9071e359b9a3 Uploaded xuebing parents: diff changeset	236 """
9071e359b9a3 Uploaded xuebing parents: diff changeset	237 y is data for a qq plot and ends up on the x axis go figure
9071e359b9a3 Uploaded xuebing parents: diff changeset	238 if sampling, oversample low values - all the top 1% ?
9071e359b9a3 Uploaded xuebing parents: diff changeset	239 assume we have 0-1 p values
9071e359b9a3 Uploaded xuebing parents: diff changeset	240 """
9071e359b9a3 Uploaded xuebing parents: diff changeset	241 R = []
9071e359b9a3 Uploaded xuebing parents: diff changeset	242 colour="maroon"
9071e359b9a3 Uploaded xuebing parents: diff changeset	243 nrows = len(dat)
9071e359b9a3 Uploaded xuebing parents: diff changeset	244 dat.sort() # small to large
9071e359b9a3 Uploaded xuebing parents: diff changeset	245 fn = float(nrows)
9071e359b9a3 Uploaded xuebing parents: diff changeset	246 unifx = [x/fn for x in range(1,(nrows+1))]
9071e359b9a3 Uploaded xuebing parents: diff changeset	247 if logscale:
9071e359b9a3 Uploaded xuebing parents: diff changeset	248 unifx = [-math.log10(x) for x in unifx] # uniform distribution
9071e359b9a3 Uploaded xuebing parents: diff changeset	249 if sample < 1.0 and len(dat) > maxveclen:
9071e359b9a3 Uploaded xuebing parents: diff changeset	250 # now have half a million markers eg - too many to plot all for a pdf - sample to get 10k or so points
9071e359b9a3 Uploaded xuebing parents: diff changeset	251 # oversample part of the distribution
9071e359b9a3 Uploaded xuebing parents: diff changeset	252 always = min(1000,nrows/20) # oversample smaller of lowest few hundred items or 5%
9071e359b9a3 Uploaded xuebing parents: diff changeset	253 skip = int(nrows/float(maxveclen)) # take 1 in skip to get about maxveclen points
9071e359b9a3 Uploaded xuebing parents: diff changeset	254 if skip <= 1:
9071e359b9a3 Uploaded xuebing parents: diff changeset	255 skip = 2
9071e359b9a3 Uploaded xuebing parents: diff changeset	256 samplei = [i for i in range(nrows) if (i < always) or (i % skip == 0)]
9071e359b9a3 Uploaded xuebing parents: diff changeset	257 # always oversample first sorted (here lowest) values
9071e359b9a3 Uploaded xuebing parents: diff changeset	258 yvec = [dat[i] for i in samplei] # always get first and last
9071e359b9a3 Uploaded xuebing parents: diff changeset	259 xvec = [unifx[i] for i in samplei] # and sample xvec same way
9071e359b9a3 Uploaded xuebing parents: diff changeset	260 maint='QQ %s (random %d of %d)' % (title,len(yvec),nrows)
9071e359b9a3 Uploaded xuebing parents: diff changeset	261 else:
9071e359b9a3 Uploaded xuebing parents: diff changeset	262 yvec = [x for x in dat]
9071e359b9a3 Uploaded xuebing parents: diff changeset	263 maint='QQ %s (n=%d)' % (title,nrows)
9071e359b9a3 Uploaded xuebing parents: diff changeset	264 xvec = unifx
9071e359b9a3 Uploaded xuebing parents: diff changeset	265 if logscale:
9071e359b9a3 Uploaded xuebing parents: diff changeset	266 maint = 'Log%s' % maint
9071e359b9a3 Uploaded xuebing parents: diff changeset	267 mx = [0,math.log10(nrows)] # if 1000, becomes 3 for the null line
9071e359b9a3 Uploaded xuebing parents: diff changeset	268 ylab = '-log10(%s) Quantiles' % title
9071e359b9a3 Uploaded xuebing parents: diff changeset	269 xlab = '-log10(Uniform 0-1) Quantiles'
9071e359b9a3 Uploaded xuebing parents: diff changeset	270 yvec = [-math.log10(x) for x in yvec if x > 0.0]
9071e359b9a3 Uploaded xuebing parents: diff changeset	271 else:
9071e359b9a3 Uploaded xuebing parents: diff changeset	272 mx = [0,1]
9071e359b9a3 Uploaded xuebing parents: diff changeset	273 ylab = '%s Quantiles' % title
9071e359b9a3 Uploaded xuebing parents: diff changeset	274 xlab = 'Uniform 0-1 Quantiles'
9071e359b9a3 Uploaded xuebing parents: diff changeset	275
9071e359b9a3 Uploaded xuebing parents: diff changeset	276 xv = ['%f' % x for x in xvec]
9071e359b9a3 Uploaded xuebing parents: diff changeset	277 R.append('xvec = c(%s)' % ','.join(xv))
9071e359b9a3 Uploaded xuebing parents: diff changeset	278 yv = ['%f' % x for x in yvec]
9071e359b9a3 Uploaded xuebing parents: diff changeset	279 R.append('yvec = c(%s)' % ','.join(yv))
9071e359b9a3 Uploaded xuebing parents: diff changeset	280 R.append('mx = c(0,%f)' % (math.log10(fn)))
9071e359b9a3 Uploaded xuebing parents: diff changeset	281 R.append('pdf("%s",h=%d,w=%d)' % (fname,h,w))
9071e359b9a3 Uploaded xuebing parents: diff changeset	282 R.append("par(lab=c(10,10,10))")
9071e359b9a3 Uploaded xuebing parents: diff changeset	283 R.append('qqplot(xvec,yvec,xlab="%s",ylab="%s",main="%s",sub="%s",pch=19,col="%s",cex=0.8)' % (xlab,ylab,maint,title,colour))
9071e359b9a3 Uploaded xuebing parents: diff changeset	284 R.append('points(mx,mx,type="l")')
9071e359b9a3 Uploaded xuebing parents: diff changeset	285 R.append('grid(col="lightgray",lty="dotted")')
9071e359b9a3 Uploaded xuebing parents: diff changeset	286 R.append('dev.off()')
9071e359b9a3 Uploaded xuebing parents: diff changeset	287 RRun(rcmd=R,title='makeQQplot',outdir=outdir)
9071e359b9a3 Uploaded xuebing parents: diff changeset	288
9071e359b9a3 Uploaded xuebing parents: diff changeset	289
9071e359b9a3 Uploaded xuebing parents: diff changeset	290
9071e359b9a3 Uploaded xuebing parents: diff changeset	291 def main():
9071e359b9a3 Uploaded xuebing parents: diff changeset	292 u = """
9071e359b9a3 Uploaded xuebing parents: diff changeset	293 """
9071e359b9a3 Uploaded xuebing parents: diff changeset	294 u = """<command interpreter="python">
9071e359b9a3 Uploaded xuebing parents: diff changeset	295 rgQQ.py "$input1" "$name" $sample "$cols" $allqq $height $width $logtrans $allqq.id $__new_file_path__
9071e359b9a3 Uploaded xuebing parents: diff changeset	296 </command>
9071e359b9a3 Uploaded xuebing parents: diff changeset	297
9071e359b9a3 Uploaded xuebing parents: diff changeset	298 </command>
9071e359b9a3 Uploaded xuebing parents: diff changeset	299 """
9071e359b9a3 Uploaded xuebing parents: diff changeset	300 print >> sys.stdout,'## rgQQ.py. cl=',sys.argv
9071e359b9a3 Uploaded xuebing parents: diff changeset	301 npar = 11
9071e359b9a3 Uploaded xuebing parents: diff changeset	302 if len(sys.argv) < npar:
9071e359b9a3 Uploaded xuebing parents: diff changeset	303 print >> sys.stdout, '## error - too few command line parameters - wanting %d' % npar
9071e359b9a3 Uploaded xuebing parents: diff changeset	304 print >> sys.stdout, u
9071e359b9a3 Uploaded xuebing parents: diff changeset	305 sys.exit(1)
9071e359b9a3 Uploaded xuebing parents: diff changeset	306 in_fname = sys.argv[1]
9071e359b9a3 Uploaded xuebing parents: diff changeset	307 name = sys.argv[2]
9071e359b9a3 Uploaded xuebing parents: diff changeset	308 sample = float(sys.argv[3])
9071e359b9a3 Uploaded xuebing parents: diff changeset	309 head = None
9071e359b9a3 Uploaded xuebing parents: diff changeset	310 columns = [int(x) for x in sys.argv[4].strip().split(',')] # work with python columns!
9071e359b9a3 Uploaded xuebing parents: diff changeset	311 allout = sys.argv[5]
9071e359b9a3 Uploaded xuebing parents: diff changeset	312 height = int(sys.argv[6])
9071e359b9a3 Uploaded xuebing parents: diff changeset	313 width = int(sys.argv[7])
9071e359b9a3 Uploaded xuebing parents: diff changeset	314 logscale = (sys.argv[8].lower() == 'true')
9071e359b9a3 Uploaded xuebing parents: diff changeset	315 outid = sys.argv[9] # this is used to allow multiple output files
9071e359b9a3 Uploaded xuebing parents: diff changeset	316 outdir = sys.argv[10]
9071e359b9a3 Uploaded xuebing parents: diff changeset	317 nan_row = False
9071e359b9a3 Uploaded xuebing parents: diff changeset	318 rows = []
9071e359b9a3 Uploaded xuebing parents: diff changeset	319 for i, line in enumerate( file( sys.argv[1] ) ):
9071e359b9a3 Uploaded xuebing parents: diff changeset	320 # Skip comments
9071e359b9a3 Uploaded xuebing parents: diff changeset	321 if line.startswith( '#' ) or ( i == 0 ):
9071e359b9a3 Uploaded xuebing parents: diff changeset	322 if i == 0:
9071e359b9a3 Uploaded xuebing parents: diff changeset	323 head = line.strip().split("\t")
9071e359b9a3 Uploaded xuebing parents: diff changeset	324 continue
9071e359b9a3 Uploaded xuebing parents: diff changeset	325 if len(line.strip()) == 0:
9071e359b9a3 Uploaded xuebing parents: diff changeset	326 continue
9071e359b9a3 Uploaded xuebing parents: diff changeset	327 # Extract values and convert to floats
9071e359b9a3 Uploaded xuebing parents: diff changeset	328 fields = line.strip().split( "\t" )
9071e359b9a3 Uploaded xuebing parents: diff changeset	329 row = []
9071e359b9a3 Uploaded xuebing parents: diff changeset	330 nan_row = False
9071e359b9a3 Uploaded xuebing parents: diff changeset	331 for column in columns:
9071e359b9a3 Uploaded xuebing parents: diff changeset	332 if len( fields ) <= column:
9071e359b9a3 Uploaded xuebing parents: diff changeset	333 return fail( "No column %d on line %d: %s" % ( column, i, fields ) )
9071e359b9a3 Uploaded xuebing parents: diff changeset	334 val = fields[column]
9071e359b9a3 Uploaded xuebing parents: diff changeset	335 if val.lower() == "na":
9071e359b9a3 Uploaded xuebing parents: diff changeset	336 nan_row = True
9071e359b9a3 Uploaded xuebing parents: diff changeset	337 else:
9071e359b9a3 Uploaded xuebing parents: diff changeset	338 try:
9071e359b9a3 Uploaded xuebing parents: diff changeset	339 row.append( float( fields[column] ) )
9071e359b9a3 Uploaded xuebing parents: diff changeset	340 except ValueError:
9071e359b9a3 Uploaded xuebing parents: diff changeset	341 return fail( "Value '%s' in column %d on line %d is not numeric" % ( fields[column], column+1, i ) )
9071e359b9a3 Uploaded xuebing parents: diff changeset	342 if not nan_row:
9071e359b9a3 Uploaded xuebing parents: diff changeset	343 rows.append( row )
9071e359b9a3 Uploaded xuebing parents: diff changeset	344 if i > 1:
9071e359b9a3 Uploaded xuebing parents: diff changeset	345 i = i-1 # remove header row from count
9071e359b9a3 Uploaded xuebing parents: diff changeset	346 if head == None:
9071e359b9a3 Uploaded xuebing parents: diff changeset	347 head = ['Col%d' % (x+1) for x in columns]
9071e359b9a3 Uploaded xuebing parents: diff changeset	348 R = []
9071e359b9a3 Uploaded xuebing parents: diff changeset	349 for c,column in enumerate(columns): # we appended each column in turn
9071e359b9a3 Uploaded xuebing parents: diff changeset	350 outname = allout
9071e359b9a3 Uploaded xuebing parents: diff changeset	351 if c > 0: # after first time
9071e359b9a3 Uploaded xuebing parents: diff changeset	352 outname = 'primary_%s_col%s_visible_pdf' % (outid,column)
9071e359b9a3 Uploaded xuebing parents: diff changeset	353 outname = os.path.join(outdir,outname)
9071e359b9a3 Uploaded xuebing parents: diff changeset	354 dat = []
9071e359b9a3 Uploaded xuebing parents: diff changeset	355 nrows = len(rows) # sometimes lots of NA's!!
9071e359b9a3 Uploaded xuebing parents: diff changeset	356 for arow in rows:
9071e359b9a3 Uploaded xuebing parents: diff changeset	357 dat.append(arow[c]) # remember, we appended each col in turn!
9071e359b9a3 Uploaded xuebing parents: diff changeset	358 cname = head[column]
9071e359b9a3 Uploaded xuebing parents: diff changeset	359 makeQQ(dat=dat,sample=sample,fname=outname,title='%s_%s' % (name,cname),
9071e359b9a3 Uploaded xuebing parents: diff changeset	360 xvar=cname,h=height,w=width,logscale=logscale,outdir=outdir)
9071e359b9a3 Uploaded xuebing parents: diff changeset	361
9071e359b9a3 Uploaded xuebing parents: diff changeset	362
9071e359b9a3 Uploaded xuebing parents: diff changeset	363
9071e359b9a3 Uploaded xuebing parents: diff changeset	364 if __name__ == "__main__":
9071e359b9a3 Uploaded xuebing parents: diff changeset	365 main()

Mercurial > repos > xuebing > sharplabtool

annotate tools/rgenetics/rgQQ.py @ 1:cdcb0ce84a1b