# HG changeset patch # User iuc # Date 1553595938 14400 # Node ID c24765926774c479820f11af26c1116e7a250935 # Parent 61dffb20b6f90cc707bd8a6951efbe41db19ecea planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/ruvseq commit 9ed3d83cc447ee897af867362bf1dd67af8a11c2 diff -r 61dffb20b6f9 -r c24765926774 get_deseq_dataset.R --- a/get_deseq_dataset.R Wed Sep 05 15:54:16 2018 -0400 +++ b/get_deseq_dataset.R Tue Mar 26 06:25:38 2019 -0400 @@ -9,12 +9,12 @@ } if (!is.null(tximport)) { - if (is.null(tx2gene)) stop("A transcript-to-gene map or a GTF file is required for tximport") - if (tolower(file_ext(opt$tx2gene)) == "gtf") { - gtfFile <-tx2gene + if (is.null(tx2gene)) stop("A transcript-to-gene map or a GTF/GFF3 file is required for tximport") + if (tolower(file_ext(tx2gene)) == "gff") { + gffFile <-tx2gene } else { - gtfFile <- NULL - tx2gene <- read.table(tx2gene, header=FALSE) + gffFile <- NULL + tx2gene <- read.table(tx2gene, header=hasHeader) } useTXI <- TRUE } else { @@ -45,22 +45,29 @@ } else { # construct the object using tximport - # first need to make the tx2gene table - # this takes ~2-3 minutes using Bioconductor functions - if (!is.null(gtfFile)) { - suppressPackageStartupMessages({ - library("GenomicFeatures") - }) - txdb <- makeTxDbFromGFF(gtfFile, format="gtf") - k <- keys(txdb, keytype = "GENEID") - df <- select(txdb, keys = k, keytype = "GENEID", columns = "TXNAME") - tx2gene <- df[, 2:1] # tx ID, then gene ID - } library("tximport") txiFiles <- as.character(sampleTable$filename) labs <- row.names(sampleTable) names(txiFiles) <- labs - txi <- tximport(txiFiles, type=txtype, tx2gene=tx2gene) + if (!is.null(gffFile)) { + # first need to make the tx2gene table + # this takes ~2-3 minutes using Bioconductor functions + suppressPackageStartupMessages({ + library("GenomicFeatures") + }) + txdb <- makeTxDbFromGFF(gffFile) + k <- keys(txdb, keytype = "TXNAME") + tx2gene <- select(txdb, keys=k, columns="GENEID", keytype="TXNAME") + # Remove 'transcript:' from transcript IDs (when gffFile is a GFF3 from Ensembl and the transcript does not have a Name) + tx2gene$TXNAME <- sub('^transcript:', '', tx2gene$TXNAME) + } + try(txi <- tximport(txiFiles, type=txtype, tx2gene=tx2gene)) + if (!exists("txi")) { + # Remove version from transcript IDs in tx2gene... + tx2gene$TXNAME <- sub('\\.[0-9]+$', '', tx2gene$TXNAME) + # ...and in txiFiles + txi <- tximport(txiFiles, type=txtype, tx2gene=tx2gene, ignoreTxVersion=TRUE) + } dds <- DESeqDataSetFromTximport(txi, subset(sampleTable, select=-c(filename)), designFormula) diff -r 61dffb20b6f9 -r c24765926774 ruvseq.R --- a/ruvseq.R Wed Sep 05 15:54:16 2018 -0400 +++ b/ruvseq.R Tue Mar 26 06:25:38 2019 -0400 @@ -83,9 +83,12 @@ create_seq_expression_set <- function (dds, min_mean_count) { count_values <- counts(dds) + print(paste0("feature count before filtering :",nrow(count_values),"\n")) + print(paste0("Filtering features which mean expression is less or eq. than ", min_mean_count, " counts\n")) filter <- apply(count_values, 1, function(x) mean(x) > min_mean_count) filtered <- count_values[filter,] - set = newSeqExpressionSet(as.matrix(count_values), + print(paste0("feature count after filtering :",nrow(filtered),"\n")) + set = newSeqExpressionSet(as.matrix(filtered), phenoData = data.frame(colData(dds)$condition, row.names=colnames(filtered))) plot_pca_rle(set = set, title = 'raw data') set <- betweenLaneNormalization(set, which="upper") @@ -103,7 +106,7 @@ fit <- glmFit(y, design) lrt <- glmLRT(fit, coef=2) top <- topTags(lrt, n=nrow(set))$table - top_rows <- rownames(top)[which(top$PValue > cutoff_p)] + top_rows <- rownames(top)[which(top$PValue < cutoff_p)] empirical <- rownames(set)[which(!(rownames(set) %in% top_rows))] return(empirical) } @@ -154,6 +157,7 @@ alpha <- opt$alpha min_k <- opt$min_k max_k <- opt$max_k +min_c <- opt$min_mean_count sample_json <- fromJSON(opt$sample_json) sample_paths <- sample_json$path sample_names <- sample_json$label @@ -169,7 +173,7 @@ } # Run through the ruvseq variants -set <- create_seq_expression_set(dds, min_mean_count = opt$min_mean_count) +set <- create_seq_expression_set(dds, min_mean_count = min_c) result <- list(no_correction = set) for (k in seq(min_k, max_k)) { result[[paste0('residual_method_k', k)]] <- ruv_residual_method(set, k=k) diff -r 61dffb20b6f9 -r c24765926774 ruvseq.xml --- a/ruvseq.xml Wed Sep 05 15:54:16 2018 -0400 +++ b/ruvseq.xml Tue Mar 26 06:25:38 2019 -0400 @@ -1,11 +1,12 @@ - + from RNA-seq data - bioconductor-ruvseq - bioconductor-deseq2 - bioconductor-tximport - bioconductor-genomicfeatures - r-ggrepel + bioconductor-ruvseq + bioconductor-deseq2 + bioconductor-tximport + bioconductor-genomicfeatures + r-ggrepel + r-getopt - + - + - + @@ -170,19 +172,19 @@ - + - + - + @@ -202,24 +204,25 @@ + - + - + - + diff -r 61dffb20b6f9 -r c24765926774 test-data/ruvseq_diag_sailfish.pdf Binary file test-data/ruvseq_diag_sailfish.pdf has changed