# HG changeset patch # User iuc # Date 1631190279 0 # Node ID 2b09ca1c5e415ce0192be98c19a3375debe54fbb # Parent a30f4bfe8f019bb5c9ba568d5e1bd3fbd90878c7 "planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/scater commit 154318f74839a4481c7c68993c4fb745842c4cce" diff -r a30f4bfe8f01 -r 2b09ca1c5e41 README.md --- a/README.md Tue Sep 03 14:30:21 2019 -0400 +++ b/README.md Thu Sep 09 12:24:39 2021 +0000 @@ -1,20 +1,20 @@ # Wrappers for Scater -This code wraps a number of [scater](https://bioconductor.org/packages/release/bioc/html/scater.html) functions as Galaxy wrappers. Briefly, the `scater-create-qcmetric-ready-sce` tool takes a sample gene expression matrix (usually read-counts) and a cell annotation file, creates a [SingleCellExperiment](https://bioconductor.org/packages/release/bioc/html/SingleCellExperiment.html) object and runs scater's `calculateQCMetrics` function (using other supplied files such as ERCC's and mitochondrial gene features). +This code wraps a number of [scater](https://bioconductor.org/packages/release/bioc/html/scater.html) and [scuttle](https://bioconductor.org/packages/3.13/bioc/html/scuttle.html) functions as Galaxy wrappers. Briefly, the `scater-create-qcmetric-ready-sce` tool takes a sample gene expression matrix (usually read-counts) and a cell annotation file, creates a [SingleCellExperiment](https://bioconductor.org/packages/release/bioc/html/SingleCellExperiment.html) object and runs scater's `calculateQCMetrics` function (using other supplied files such as ERCC's and mitochondrial gene features). Various filter scripts are provided, along with some plotting functions for QC. ## Typical workflow 1. Read in data with `scater-create-qcmetric-ready-sce`. -2. Visualise it.\ +2. Visualise it. Take a look at the distribution of library sizes, expressed features and mitochondrial genes with `scater-plot-dist-scatter`. - Then look at the distibution of genes across cells with `scater-plot-exprs-freq`. + 3. Guided by the plots, filter the data with `scater-filter`.\ You can either manually filter with user-defined parameters or use PCA to automatically removes outliers. 4. Visualise data again to see how the filtering performed using `scater-plot-dist-scatter`.\ Decide if you're happy with the data. If not, try increasing or decreasing the filtering parameters. -5. Normalise data with `scater-normalize`. + 6. Investigate other confounding factors.\ Plot the data (using PCA) and display various annotated properties of the cells using `scater-plot-pca`. @@ -51,10 +51,6 @@ --- -`scater-plot-exprs-freq.R` -Plots mean expression vs % of expressing cells and provides information as to the number of genes expressed in 50% and 25% of cells. - ---- `scater-pca-filter.R` Takes SingleCellExperiment object (from Loom file) and automatically removes outliers from data using PCA. Save the filtered SingleCellExperiment object in Loom format. @@ -74,18 +70,18 @@ --- -`scater-normalize.R` -Compute log-normalized expression values from count data in a SingleCellExperiment object, using the size factors stored in the object. Save the normalised SingleCellExperiment object in Loom format. +`scater-plot-pca.R` +PCA plot of a SingleCellExperiment object. The options `-c`, `-p`, and `-s` all refer to cell annotation features. These are the column headers of the `-c` option used in `scater-create-qcmetric-ready-sce.R`. ``` -./scater-normalize.R -i test-data/scater_manual_filtered.loom -o test-data/scater_man_filtered_normalised.loom +./scater-plot-pca.R -i test-data/scater_qcready.loom -c Treatment -p Mutation_Status -o test-data/scater_pca_plot.pdf ``` --- -`scater-plot-pca.R` -PCA plot of a normalised SingleCellExperiment object (produced with `scater-normalize.R`). The options `-c`, `-p`, and `-s` all refer to cell annotation features. These are the column headers of the `-c` option used in `scater-create-qcmetric-ready-sce.R`. +`scater-plot-tsne.R` +t-SNE plot of a SingleCellExperiment object. The options `-c`, `-p`, and `-s` all refer to cell annotation features. These are the column headers of the `-c` option used in `scater-create-qcmetric-ready-sce.R`. ``` -./scater-plot-pca.R -i test-data/scater_man_filtered_normalised.loom -c Treatment -p Mutation_Status -o test-data/scater_pca_plot.pdf +./scater-plot-tsne.R -i test-data/scater_qcready.loom -c Treatment -p Mutation_Status -o test-data/scater_tsne_plot.pdf ``` diff -r a30f4bfe8f01 -r 2b09ca1c5e41 environment.yml --- a/environment.yml Tue Sep 03 14:30:21 2019 -0400 +++ b/environment.yml Thu Sep 09 12:24:39 2021 +0000 @@ -4,11 +4,11 @@ - bioconda - defaults dependencies: - - bioconductor-loomexperiment=1.2.0 - - bioconductor-scater=1.12.2 - - r-ggpubr=0.2.2 - - r-mvoutlier=2.0.9 - - r-optparse=1.6.2 + - bioconductor-loomexperiment=1.10.1 + - bioconductor-scater=1.20.0 + - r-ggpubr=0.4.0 + - r-optparse=1.6.6 + - r-robustbase=0.93_8 - r-rtsne=0.15 - - r-scales=1.0.0 - - r-workflowscriptscommon=0.0.4 + - r-scales=1.1.1 + - r-workflowscriptscommon=0.0.7 diff -r a30f4bfe8f01 -r 2b09ca1c5e41 macros.xml --- a/macros.xml Tue Sep 03 14:30:21 2019 -0400 +++ b/macros.xml Thu Sep 09 12:24:39 2021 +0000 @@ -1,14 +1,20 @@ - 1.12.2 + 1.20.0 + 20.01 bioconductor-scater - r-optparse - r-workflowscriptscommon - bioconductor-loomexperiment + r-optparse + r-workflowscriptscommon + bioconductor-loomexperiment + + + scater + + 10.1093/bioinformatics/btw777 diff -r a30f4bfe8f01 -r 2b09ca1c5e41 scater-create-qcmetric-ready-sce.R --- a/scater-create-qcmetric-ready-sce.R Tue Sep 03 14:30:21 2019 -0400 +++ b/scater-create-qcmetric-ready-sce.R Thu Sep 09 12:24:39 2021 +0000 @@ -8,67 +8,60 @@ # parse options #SCE-specific options -option_list = list( +option_list <- list( make_option( c("-a", "--counts"), action = "store", default = NA, - type = 'character', + type = "character", help = "A tab-delimited expression matrix. The first column of all files is assumed to be feature names and the first row is assumed to be sample names." ), make_option( c("-r", "--row-data"), action = "store", default = NULL, - type = 'character', + type = "character", help = "Path to TSV (tab-delimited) format file describing the features. Row names from the expression matrix (-a), if present, become the row names of the SingleCellExperiment." ), make_option( c("-c", "--col-data"), action = "store", default = NULL, - type = 'character', + type = "character", help = "Path to TSV format file describing the samples (annotation). The number of rows (samples) must equal the number of columns in the expression matrix." ), #The scater-specific options make_option( - c("--assay-name"), - action = "store", - default = 'counts', - type = 'character', - help= "String specifying the name of the 'assay' of the 'object' that should be used to define expression." - ), - make_option( c("-f", "--mt-controls"), action = "store", default = NULL, - type = 'character', + type = "character", help = "Path to file containing a list of the mitochondrial control genes" ), make_option( c("-p", "--ercc-controls"), action = "store", default = NULL, - type = 'character', + type = "character", help = "Path to file containing a list of the ERCC controls" ), make_option( c("-l", "--cell-controls"), action = "store", default = NULL, - type = 'character', + type = "character", help = "Path to file (one cell per line) to be used to derive a vector of cell (sample) names used to identify cell controls (for example, blank wells or bulk controls)." ), make_option( c("-o", "--output-loom"), action = "store", default = NA, - type = 'character', + type = "character", help = "File name in which to store the SingleCellExperiment object in Loom format." ) ) -opt <- wsc_parse_args(option_list, mandatory = c('counts', 'output_loom')) +opt <- wsc_parse_args(option_list, mandatory = c("counts", "output_loom")) # Read the expression matrix @@ -79,64 +72,61 @@ rowdata <- opt$row_data -if ( ! is.null(opt$row_data) ){ +if (! is.null(opt$row_data)) { rowdata <- read.delim(opt$row_data) } coldata <- opt$col_data -if ( ! is.null(opt$col_data) ){ +if (! is.null(opt$col_data)) { coldata <- read.delim(opt$col_data) } # Now build the object -assays <- list(as.matrix(reads)) -names(assays) <- c(opt$assay_name) -scle <- SingleCellLoomExperiment(assays = assays, colData = coldata, rowData = rowdata) -# Define spikes (if supplied) - +sce <- SingleCellLoomExperiment(assays = list(counts = as.matrix(reads)), colData = coldata) #Scater options # Check feature_controls (only mitochondrial and ERCC used for now) -feature_controls_list = list() -if (! is.null(opt$mt_controls) && opt$mt_controls != 'NULL'){ - if (! file.exists(opt$mt_controls)){ - stop((paste('Supplied feature_controls file', opt$mt_controls, 'does not exist'))) + +if (! is.null(opt$mt_controls)) { + if (! file.exists(opt$mt_controls)) { + stop((paste("Supplied feature_controls file", opt$mt_controls, "does not exist"))) } else { - mt_controls <- readLines(opt$mt_controls) - feature_controls_list[["MT"]] <- mt_controls + mts <- readLines(opt$mt_controls) } +} else { + mts <- NULL } -if (! is.null(opt$ercc_controls) && opt$ercc_controls != 'NULL'){ - if (! file.exists(opt$ercc_controls)){ - stop((paste('Supplied feature_controls file', opt$ercc_controls, 'does not exist'))) +if (! is.null(opt$ercc_controls)) { + if (! file.exists(opt$ercc_controls)) { + stop((paste("Supplied feature_controls file", opt$ercc_controls, "does not exist"))) } else { ercc_controls <- readLines(opt$ercc_controls) - feature_controls_list[["ERCC"]] <- ercc_controls } } else { - ercc_controls <- character() + ercc_controls <- NULL } # Check cell_controls -cell_controls_list <- list() -if (! is.null(opt$cell_controls) && opt$cell_controls != 'NULL'){ - if (! file.exists(opt$cell_controls)){ - stop((paste('Supplied feature_controls file', opt$cell_controls, 'does not exist'))) + +if (! is.null(opt$cell_controls)) { + if (! file.exists(opt$cell_controls)) { + stop((paste("Supplied feature_controls file", opt$cell_controls, "does not exist"))) } else { cell_controls <- readLines(opt$cell_controls) - cell_controls_list[["empty"]] <- cell_controls } +} else { + cell_controls <- NULL } +# calculate QCMs -# calculate QCMs -scle <- calculateQCMetrics(scle, exprs_values = opt$assay_name, feature_controls = feature_controls_list, cell_controls = cell_controls_list) +sce <- addPerCellQC(sce, subsets = list(Mito = mts, ERCC = ercc_controls, cell_controls = cell_controls)) # Output to a Loom file if (file.exists(opt$output_loom)) { file.remove(opt$output_loom) } -export(scle, opt$output_loom, format='loom') +export(sce, opt$output_loom, format = "loom") diff -r a30f4bfe8f01 -r 2b09ca1c5e41 scater-manual-filter.R --- a/scater-manual-filter.R Tue Sep 03 14:30:21 2019 -0400 +++ b/scater-manual-filter.R Thu Sep 09 12:24:39 2021 +0000 @@ -8,92 +8,94 @@ library(scater) # parse options -option_list = list( +option_list <- list( make_option( c("-i", "--input-loom"), action = "store", default = NA, - type = 'character', - help = "A SingleCellExperiment object file in Loom format." + type = "character", + help = "A SingleCellExperiment object file in Loom format" + ), + make_option( + c("-l", "--library-size"), + action = "store", + default = 0, + type = "numeric", + help = "Minimum library size (mapped reads) to filter cells on" + ), + make_option( + c("-m", "--percent-counts-MT"), + action = "store", + default = 100, + type = "numeric", + help = "Maximum % of mitochondrial genes expressed per cell. Cells that exceed this value will be filtered out" + ), + make_option( + c("-f", "--expressed-features"), + action = "store", + default = 100, + type = "numeric", + help = "Remove cells that have less than the given number of expressed features" ), make_option( c("-d", "--detection-limit"), action = "store", default = 0, - type = 'numeric', - help = "Numeric scalar providing the value above which observations are deemed to be expressed" - ), - make_option( - c("-l", "--library-size"), - action = "store", - default = 0, - type = 'numeric', - help = "Minimum library size (mapped reads) to filter cells on" + type = "numeric", + help = "Number of reads mapped to a feature above which it to be deemed as expressed" ), make_option( - c("-e", "--expressed-genes"), + c("-e", "--min-cells-expressed"), action = "store", default = 0, - type = 'numeric', - help = "Minimum number of expressed genes to filter cells on" - ), - make_option( - c("-m", "--percent-counts-MT"), - action = "store", - default = 100, - type = 'numeric', - help = "Maximum % of mitochondrial genes expressed per cell. Cells that exceed this value will be filtered out." + type = "numeric", + help = "Remove features that occur in less than the given number of cells" ), make_option( c("-o", "--output-loom"), action = "store", default = NA, - type = 'character', - help = "File name in which to store the SingleCellExperiment object in Loom format." + type = "character", + help = "File name in which to store the SingleCellExperiment object in Loom format" ) ) -opt <- wsc_parse_args(option_list, mandatory = c('input_loom', 'output_loom')) +opt <- wsc_parse_args(option_list, mandatory = c("input_loom", "output_loom")) # Check parameter values -if ( ! file.exists(opt$input_loom)){ - stop((paste('File', opt$input_loom, 'does not exist'))) +if (! file.exists(opt$input_loom)) { + stop((paste("File", opt$input_loom, "does not exist"))) } # Filter out unexpressed features -scle <- import(opt$input_loom, format='loom', type='SingleCellLoomExperiment') -print(paste("Starting with", ncol(scle), "cells and", nrow(scle), "features.")) +sce <- import(opt$input_loom, format = "loom", type = "SingleCellLoomExperiment") +print(paste("Starting with", ncol(sce), "cells and", nrow(sce), "features.")) -# Create a logical vector of features that are expressed (above detection_limit) -feature_expressed <- nexprs(scle, detection_limit = opt$detection_limit, exprs_values = 1, byrow=TRUE) > 0 -scle <- scle[feature_expressed, ] - -print(paste("After filtering out unexpressed features: ", ncol(scle), "cells and", nrow(scle), "features.")) +# Filter out low quality cells # Filter low library sizes -to_keep <- scle$total_counts > opt$library_size -scle <- scle[, to_keep] - -print(paste("After filtering out low library counts: ", ncol(scle), "cells and", nrow(scle), "features.")) - - -# Filter low expressed genes -to_keep <- scle$total_features_by_counts > opt$expressed_genes -scle <- scle[, to_keep] - -print(paste("After filtering out low expressed: ", ncol(scle), "cells and", nrow(scle), "features.")) - +passing_total <- sce$total > opt$library_size +sce <- sce[, passing_total] +print(paste("After filtering out low library counts: ", ncol(sce), "cells and", nrow(sce), "features.")) # Filter out high MT counts -to_keep <- scle$pct_counts_MT < opt$percent_counts_MT -scle <- scle[, to_keep] +passing_mt_counts <- sce$subsets_Mito_percent < opt$percent_counts_MT +sce <- sce[, passing_mt_counts] +print(paste("After filtering out high MT gene counts: ", ncol(sce), "cells and", nrow(sce), "features.")) -print(paste("After filtering out high MT gene counts: ", ncol(scle), "cells and", nrow(scle), "features.")) +expr_features <- sce$detected > opt$expressed_features +sce <- sce[, expr_features] +print(paste("After filtering out cells with low feature counts: ", ncol(sce), "cells and", nrow(sce), "features.")) + +# Create a logical vector of features that are expressed (above detection_limit) +feature_expressed <- nexprs(sce, detection_limit = opt$detection_limit, byrow = TRUE) > opt$min_cells_expressed +sce <- sce[feature_expressed, ] +print(paste("After filtering out rare features: ", ncol(sce), "cells and", nrow(sce), "features.")) # Output to a Loom file if (file.exists(opt$output_loom)) { file.remove(opt$output_loom) } -export(scle, opt$output_loom, format='loom') +export(sce, opt$output_loom, format = "loom") diff -r a30f4bfe8f01 -r 2b09ca1c5e41 scater-normalize.R --- a/scater-normalize.R Tue Sep 03 14:30:21 2019 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,50 +0,0 @@ -#!/usr/bin/env Rscript -#Normalises a SingleCellExperiment object - -# Load optparse we need to check inputs -library(optparse) -library(workflowscriptscommon) -library(LoomExperiment) -library(scater) - -# parse options -option_list = list( - make_option( - c("-i", "--input-loom"), - action = "store", - default = NA, - type = 'character', - help = "A SingleCellExperiment object file in Loom format." - ), - make_option( - c("-o", "--output-loom"), - action = "store", - default = NA, - type = 'character', - help = "File name in which to store the SingleCellExperiment object in Loom format." - ) -) - -opt <- wsc_parse_args(option_list, mandatory = c('input_loom', 'output_loom')) - -# Check parameter values - -if ( ! file.exists(opt$input_loom)){ - stop((paste('File', opt$input_loom, 'does not exist'))) -} - -# Input from Loom format - -scle <- import(opt$input_loom, format='loom', type='SingleCellLoomExperiment') -print(paste("Normalising....")) - -#Normalise -scle <- normalize(scle, exprs_values = 1) - -print(paste("Finished normalising")) - -# Output to a Loom file -if (file.exists(opt$output_loom)) { - file.remove(opt$output_loom) -} -export(scle, opt$output_loom, format='loom') diff -r a30f4bfe8f01 -r 2b09ca1c5e41 scater-pca-filter.R --- a/scater-pca-filter.R Tue Sep 03 14:30:21 2019 -0400 +++ b/scater-pca-filter.R Thu Sep 09 12:24:39 2021 +0000 @@ -12,49 +12,48 @@ library(workflowscriptscommon) library(LoomExperiment) library(scater) -library(mvoutlier) +library(robustbase) # parse options -option_list = list( +option_list <- list( make_option( c("-i", "--input-loom"), action = "store", default = NA, - type = 'character', + type = "character", help = "A SingleCellExperiment object file in Loom format." ), make_option( c("-o", "--output-loom"), action = "store", default = NA, - type = 'character', + type = "character", help = "File name in which to store the SingleCellExperiment object in Loom format." ) ) -opt <- wsc_parse_args(option_list, mandatory = c('input_loom', 'output_loom')) +opt <- wsc_parse_args(option_list, mandatory = c("input_loom", "output_loom")) # Check parameter values -if ( ! file.exists(opt$input_loom)){ - stop((paste('File', opt$input_loom, 'does not exist'))) +if (! file.exists(opt$input_loom)) { + stop((paste("File", opt$input_loom, "does not exist"))) } -# Input from Loom format +# Filter out unexpressed features -scle <- import(opt$input_loom, format='loom', type='SingleCellLoomExperiment') -print(paste("Starting with", ncol(scle), "cells and", nrow(scle), "features.")) +sce <- import(opt$input_loom, format = "loom", type = "SingleCellLoomExperiment") + +print(paste("Starting with", ncol(sce), "cells")) -# Run PCA on data and detect outliers -scle <- runPCA(scle, use_coldata = TRUE, detect_outliers = TRUE) +sce <- runColDataPCA(sce, outliers = TRUE, variables = list("sum", "detected", "subsets_Mito_percent")) +sce$use <- !sce$outlier +sce <- sce[, colData(sce)$use] +print(paste("Ending with", ncol(sce), "cells")) -# Filter out outliers -scle <- scle[, !scle$outlier] - -print(paste("Ending with", ncol(scle), "cells and", nrow(scle), "features.")) # Output to a Loom file if (file.exists(opt$output_loom)) { file.remove(opt$output_loom) } -export(scle, opt$output_loom, format='loom') +export(sce, opt$output_loom, format = "loom") diff -r a30f4bfe8f01 -r 2b09ca1c5e41 scater-plot-dist-scatter.R --- a/scater-plot-dist-scatter.R Tue Sep 03 14:30:21 2019 -0400 +++ b/scater-plot-dist-scatter.R Thu Sep 09 12:24:39 2021 +0000 @@ -13,45 +13,45 @@ # parse options -option_list = list( +option_list <- list( make_option( c("-i", "--input-loom"), action = "store", default = NA, - type = 'character', + type = "character", help = "A SingleCellExperiment object file in Loom format." ), make_option( c("-o", "--output-plot-file"), action = "store", default = NA, - type = 'character', + type = "character", help = "Path of the PDF output file to save plot to." ), make_option( c("-l", "--log-scale"), - action="store_true", - default=FALSE, - type = 'logical', + action = "store_true", + default = FALSE, + type = "logical", help = "Plot on log scale (recommended for large datasets)." ) ) -opt <- wsc_parse_args(option_list, mandatory = c('input_loom', 'output_plot_file', 'log_scale')) +opt <- wsc_parse_args(option_list, mandatory = c("input_loom", "output_plot_file")) # Check parameter values -if ( ! file.exists(opt$input_loom)){ - stop((paste('File', opt$input_loom, 'does not exist'))) +if (! file.exists(opt$input_loom)) { + stop((paste("File", opt$input_loom, "does not exist"))) } -# Input from Loom format +# Filter out unexpressed features -scle <- import(opt$input_loom, format='loom', type='SingleCellLoomExperiment') +sce <- import(opt$input_loom, format = "loom", type = "SingleCellLoomExperiment") -#do the scatter plot of reads vs genes -total_counts <- scle$total_counts -total_features <- scle$total_features_by_counts +# Do the scatter plot of reads vs genes +total_counts <- sce$total +total_features <- sce$detected count_feats <- cbind(total_counts, total_features) cf_dm <- as.data.frame(count_feats) @@ -59,21 +59,20 @@ read_bins <- max(total_counts / 1e6) / 20 feat_bins <- max(total_features) / 20 -# Make the plots -plot <- ggplot(cf_dm, aes(x=total_counts / 1e6, y=total_features)) + geom_point(shape=1) + geom_smooth() + xlab("Read count (millions)") + - ylab("Feature count") + ggtitle("Scatterplot of reads vs features") -plot1 <- qplot(total_counts / 1e6, geom="histogram", binwidth = read_bins, ylab="Number of cells", xlab = "Read counts (millions)", fill=I("darkseagreen3")) + ggtitle("Read counts per cell") -plot2 <- qplot(total_features, geom="histogram", binwidth = feat_bins, ylab="Number of cells", xlab = "Feature counts", fill=I("darkseagreen3")) + ggtitle("Feature counts per cell") -plot3 <- plotColData(scle, y="pct_counts_MT", x="total_features_by_counts") + ggtitle("% MT genes") + geom_point(shape=1) + theme(text = element_text(size=15)) + theme(plot.title = element_text(size=15)) +plot1 <- qplot(total_counts / 1e6, geom = "histogram", binwidth = read_bins, ylab = "Number of cells", xlab = "Read counts (millions)", fill = I("darkseagreen3")) + ggtitle("Read counts per cell") +plot2 <- qplot(total_features, geom = "histogram", binwidth = feat_bins, ylab = "Number of cells", xlab = "Feature counts", fill = I("darkseagreen3")) + ggtitle("Feature counts per cell") +plot3 <- ggplot(cf_dm, aes(x = total_counts / 1e6, y = total_features)) + geom_point(shape = 1) + geom_smooth() + xlab("Read count (millions)") + + ylab("Feature count") + ggtitle("Scatterplot of reads vs features") +plot4 <- plotColData(sce, y = "subsets_Mito_percent", x = "detected") + ggtitle("% MT genes") + geom_point(shape = 1) + theme(text = element_text(size = 15)) + theme(plot.title = element_text(size = 15)) + xlab("Total features") + ylab("% MT") -if (! opt$log_scale){ - final_plot <- ggarrange(plot1, plot2, plot, plot3, ncol=2, nrow=2) - ggsave(opt$output_plot_file, final_plot, device="pdf") +if (! opt$log_scale) { + final_plot <- ggarrange(plot1, plot2, plot3, plot4, ncol = 2, nrow = 2) + ggsave(opt$output_plot_file, final_plot, device = "pdf") } else { - plot_log_both <- plot + scale_x_continuous(trans = 'log10') + scale_y_continuous(trans = 'log10') - plot1_log <- plot1 + scale_y_continuous(trans = 'log10') - plot2_log <- plot2 + scale_y_continuous(trans = 'log10') - plot3_log <- plot3 + scale_y_log10(labels=number) - final_plot_log <- ggarrange(plot1_log, plot2_log, plot_log_both, plot3_log, ncol=2, nrow=2) - ggsave(opt$output_plot_file, final_plot_log, device="pdf") + plot1_log <- plot1 + scale_x_continuous(trans = "log10") + scale_y_continuous(trans = "log10") + plot2_log <- plot2 + scale_y_continuous(trans = "log10") + plot3_log <- plot3 + scale_y_continuous(trans = "log10") + plot4_log <- plot4 + scale_y_log10(labels = number) + final_plot_log <- ggarrange(plot1_log, plot2_log, plot3_log, plot4_log, ncol = 2, nrow = 2) + ggsave(opt$output_plot_file, final_plot_log, device = "pdf") } diff -r a30f4bfe8f01 -r 2b09ca1c5e41 scater-plot-exprs-freq.R --- a/scater-plot-exprs-freq.R Tue Sep 03 14:30:21 2019 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,45 +0,0 @@ -#!/usr/bin/env Rscript - -#Plots mean expression vs % of expressing cells and provides information as to the number of genes expressed in 50% and 25% of cells. -# Load optparse we need to check inputs - -library(optparse) -library(workflowscriptscommon) -library(LoomExperiment) -library(scater) - -# parse options - -option_list = list( - make_option( - c("-i", "--input-loom"), - action = "store", - default = NA, - type = 'character', - help = "A SingleCellExperiment object file in Loom format." - ), - make_option( - c("-o", "--output-plot-file"), - action = "store", - default = NA, - type = 'character', - help = "Path of the PDF output file to save plot to." - ) -) - -opt <- wsc_parse_args(option_list, mandatory = c('input_loom', 'output_plot_file')) - -# Check parameter values - -if ( ! file.exists(opt$input_loom)){ - stop((paste('File', opt$input_loom, 'does not exist'))) -} - - -# Input from Loom format - -scle <- import(opt$input_loom, format='loom', type='SingleCellLoomExperiment') - -#produce and save the scatter plot of reads vs genes -plot <- plotExprsFreqVsMean(scle, controls = "is_feature_control_MT") -ggsave(opt$output_plot_file, plot, device="pdf") diff -r a30f4bfe8f01 -r 2b09ca1c5e41 scater-plot-pca.R --- a/scater-plot-pca.R Tue Sep 03 14:30:21 2019 -0400 +++ b/scater-plot-pca.R Thu Sep 09 12:24:39 2021 +0000 @@ -11,58 +11,58 @@ # parse options -option_list = list( +option_list <- list( make_option( c("-i", "--input-loom"), action = "store", default = NA, - type = 'character', + type = "character", help = "A SingleCellExperiment object file in Loom format." ), make_option( c("-c", "--colour-by"), action = "store", default = NULL, - type = 'character', + type = "character", help = "Feature (from annotation file) to colour PCA plot points by. The values represented in this options should be categorical" ), make_option( c("-s", "--size-by"), action = "store", default = NULL, - type = 'character', + type = "character", help = "Feature (from annotation file) to size PCA plot points by. The values represented in this options should be numerical and not categorical" ), make_option( c("-p", "--shape-by"), action = "store", default = NULL, - type = 'character', + type = "character", help = "Feature (from annotation file) to shape PCA plot points by. The values represented in this options should be categorical" ), make_option( c("-o", "--output-plot-file"), action = "store", default = NA, - type = 'character', + type = "character", help = "Path of the PDF output file to save plot to." ) ) -opt <- wsc_parse_args(option_list, mandatory = c('input_loom', 'output_plot_file')) +opt <- wsc_parse_args(option_list, mandatory = c("input_loom", "output_plot_file")) + # Check parameter values -if ( ! file.exists(opt$input_loom)){ - stop((paste('File', opt$input_loom, 'does not exist'))) +if (! file.exists(opt$input_loom)) { + stop((paste("File", opt$input_loom, "does not exist"))) } - -# Input from Loom format +# Filter out unexpressed features -scle <- import(opt$input_loom, format='loom', type='SingleCellLoomExperiment') -scle <- normalize(scle, exprs_values = 1) -scle <- runPCA(scle) -plot <- plotReducedDim(scle, "PCA", colour_by = opt$colour_by, size_by = opt$size_by, shape_by = opt$shape_by) -#do the scatter plot of reads vs genes +sce <- import(opt$input_loom, format = "loom", type = "SingleCellLoomExperiment") +sce <- logNormCounts(sce) +sce <- runPCA(sce) -ggsave(opt$output_plot_file, plot, device="pdf") +plot <- plotReducedDim(sce, dimred = "PCA", colour_by = opt$colour_by, size_by = opt$size_by, shape_by = opt$shape_by) + +ggsave(opt$output_plot_file, plot, device = "pdf") diff -r a30f4bfe8f01 -r 2b09ca1c5e41 scater-plot-tsne.R --- a/scater-plot-tsne.R Tue Sep 03 14:30:21 2019 -0400 +++ b/scater-plot-tsne.R Thu Sep 09 12:24:39 2021 +0000 @@ -12,58 +12,57 @@ # parse options -option_list = list( +option_list <- list( make_option( c("-i", "--input-loom"), action = "store", default = NA, - type = 'character', + type = "character", help = "A SingleCellExperiment object file in Loom format." ), make_option( c("-c", "--colour-by"), action = "store", default = NULL, - type = 'character', + type = "character", help = "Feature (from annotation file) to colour t-SNE plot points by. The values represented in this options should be categorical" ), make_option( c("-s", "--size-by"), action = "store", default = NULL, - type = 'character', + type = "character", help = "Feature (from annotation file) to size t-SNE plot points by. The values represented in this options should be numerical and not categorical" ), make_option( c("-p", "--shape-by"), action = "store", default = NULL, - type = 'character', + type = "character", help = "Feature (from annotation file) to shape t-SNE plot points by. The values represented in this options should be categorical" ), make_option( c("-o", "--output-plot-file"), action = "store", default = NA, - type = 'character', + type = "character", help = "Path of the PDF output file to save plot to." ) ) -opt <- wsc_parse_args(option_list, mandatory = c('input_loom', 'output_plot_file')) +opt <- wsc_parse_args(option_list, mandatory = c("input_loom", "output_plot_file")) + # Check parameter values -if ( ! file.exists(opt$input_loom)){ - stop((paste('File', opt$input_loom, 'does not exist'))) +if (! file.exists(opt$input_loom)) { + stop((paste("File", opt$input_loom, "does not exist"))) } - -# Input from Loom format +# Filter out unexpressed features -scle <- import(opt$input_loom, format='loom', type='SingleCellLoomExperiment') -scle <- normalize(scle, exprs_values = 1) -scle <- runTSNE(scle, perplexity=10) -plot <- plotTSNE(scle, colour_by = opt$colour_by, size_by = opt$size_by, shape_by = opt$shape_by) +sce <- import(opt$input_loom, format = "loom", type = "SingleCellLoomExperiment") +sce <- logNormCounts(sce) +sce <- runTSNE(sce, perplexity = 10) +plot <- plotTSNE(sce, colour_by = opt$colour_by, size_by = opt$size_by, shape_by = opt$shape_by) - -ggsave(opt$output_plot_file, plot, device="pdf") +ggsave(opt$output_plot_file, plot, device = "pdf") diff -r a30f4bfe8f01 -r 2b09ca1c5e41 scater-plot-tsne.xml --- a/scater-plot-tsne.xml Tue Sep 03 14:30:21 2019 -0400 +++ b/scater-plot-tsne.xml Thu Sep 09 12:24:39 2021 +0000 @@ -1,5 +1,6 @@ - - Produces a t-SNE plot of two components + + of two components + macros.xml diff -r a30f4bfe8f01 -r 2b09ca1c5e41 test-data/scater_exprs_freq.pdf Binary file test-data/scater_exprs_freq.pdf has changed diff -r a30f4bfe8f01 -r 2b09ca1c5e41 test-data/scater_filtered_normalised.loom Binary file test-data/scater_filtered_normalised.loom has changed diff -r a30f4bfe8f01 -r 2b09ca1c5e41 test-data/scater_manual_filtered.loom Binary file test-data/scater_manual_filtered.loom has changed diff -r a30f4bfe8f01 -r 2b09ca1c5e41 test-data/scater_pca_filtered.loom Binary file test-data/scater_pca_filtered.loom has changed diff -r a30f4bfe8f01 -r 2b09ca1c5e41 test-data/scater_pca_plot.pdf Binary file test-data/scater_pca_plot.pdf has changed diff -r a30f4bfe8f01 -r 2b09ca1c5e41 test-data/scater_qcready.loom Binary file test-data/scater_qcready.loom has changed diff -r a30f4bfe8f01 -r 2b09ca1c5e41 test-data/scater_reads_genes_dist.pdf Binary file test-data/scater_reads_genes_dist.pdf has changed diff -r a30f4bfe8f01 -r 2b09ca1c5e41 test-data/scater_reads_genes_dist_log.pdf Binary file test-data/scater_reads_genes_dist_log.pdf has changed diff -r a30f4bfe8f01 -r 2b09ca1c5e41 test-data/scater_tsne_plot.pdf Binary file test-data/scater_tsne_plot.pdf has changed