diff gss.Rmd @ 0:d0cbe6cc1f04 draft default tip

"planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/genomic_super_signature commit 1aadd5dce3b254e7714c2fdd39413029fd4b9b7a"
author iuc
date Wed, 12 Jan 2022 19:07:45 +0000
parents
children
line wrap: on
line diff
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/gss.Rmd	Wed Jan 12 19:07:45 2022 +0000
@@ -0,0 +1,122 @@
+---
+title: "Analysis by GenomicSuperSignature"
+date: "`r Sys.Date()`"
+output:
+  BiocStyle::html_document:
+    toc: true
+    toc_float: false
+    toc_depth: 3
+params: 
+    val_all: val_all
+    dat: dat
+    RAVmodel: RAVmodel
+    inputName: inputName
+    numOut: numOut
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = FALSE)
+```
+
+# RAVs best represents your dataset
+The *validation* provides a quantitative representation of the relevance
+between your dataset and RAVs. Below shows the top 6 validated RAVs and 
+the complete result is saved as `{input_name}_validate.csv`. 
+
+```{r}
+head(params$val_all)
+```
+
+## Heatmap Table
+`heatmapTable` takes validation results as its input and displays them into 
+a two panel table: the top panel shows the average silhouette width (avg.sw) 
+and the bottom panel displays the validation score.
+
+`heatmapTable` can display different subsets of the validation output. For 
+example, if you specify `scoreCutoff`, any validation result above that score 
+will be shown. If you specify the number (n) of top validation results through 
+`num.out`, the output will be a n-columned heatmap table. You can also use the 
+average silhouette width (`swCutoff`), the size of cluster (`clsizecutoff`), 
+one of the top 8 PCs from the dataset (`whichPC`).
+
+Here, we print out top `r params$numOut` validated RAVs with average silhouette 
+width above 0.
+
+```{r out.height="45%", out.width="45%", message=FALSE, warning=FALSE}
+heatmapTable(params$val_all, num.out = params$numOut, swCutoff = 0)
+```
+
+## Interactive Graph
+Under the default condition, `plotValidate` plots validation results of all non 
+single-element RAVs in one graph, where x-axis represents average silhouette 
+width of the RAVs (a quality control measure of RAVs) and y-axis represents 
+validation score. We recommend users to focus on RAVs with higher validation 
+score and use average silhouette width as a secondary criteria. 
+
+```{r out.height="80%", out.width="80%"}
+plotValidate(params$val_all, interactive = TRUE)
+```
+
+Note that `interactive = TRUE` will result in a zoomable, interactive plot that
+included tooltips, which is saved as `{input_name}_validate_plot.html` file. 
+
+You can hover each data point for more information:    
+
+- **sw** : the average silhouette width of the cluster   
+- **score** : the top validation score between 8 PCs of the dataset and RAVs   
+- **cl_size** : the size of RAVs, represented by the dot size   
+- **cl_num** : the RAV number. You need this index to find more information 
+about the RAV.      
+- **PC** : test dataset's PC number that validates the given RAV. Because we 
+used top 8 PCs of the test dataset for validation, there are 8 categories. 
+
+If you double-click the PC legend on the right, you will enter an 
+individual display mode where you can add an additional group of data 
+point by single-click. 
+
+
+# Prior information associated to your dataset
+```{r echo=FALSE}
+validated_ind <- validatedSignatures(params$val_all, num.out = params$numOut,
+                                     swCutoff = 0, indexOnly = TRUE)
+
+# In case, there are fewer validated_ind than the number of outputs user set
+n <- min(params$numOut, length(validated_ind), na.rm = TRUE)
+```
+
+## MeSH terms in wordcloud
+```{r out.height="60%", out.width="60%", fig.width=8, fig.height=8}
+for (i in seq_len(n)) {
+    set.seed(1)
+    print(paste0("MeSH terms related to RAV", validated_ind[i]))
+    drawWordcloud(params$RAVmodel, validated_ind[i])
+}
+```
+
+## GSEA
+The complete result is saved as `{input_name}_genesets_RAV*.csv`. 
+```{r}
+res_all <- vector(mode = "list", length = n)
+for (i in seq_len(n)) {
+    RAVnum <- validated_ind[i]
+    RAVname <- paste0("RAV", RAVnum)
+    res <- gsea(params$RAVmodel)[[RAVname]]
+    res_all[[i]] <- head(res)
+    names(res_all)[i] <- paste0("Enriched gene sets for RAV", validated_ind[i])
+}
+res_all
+```
+
+## Publication
+The complete result is saved as `{input_name}_literatures_RAV*.csv`. 
+```{r}
+res_all <- vector(mode = "list", length = n)
+for (i in seq_len(n)) {
+    RAVnum <- validated_ind[i]
+    res <- findStudiesInCluster(params$RAVmodel, RAVnum, studyTitle = TRUE)
+    res_all[[i]] <- head(res)
+    names(res_all)[i] <- paste0("Studies related to RAV", validated_ind[i])
+}
+res_all
+```
+