Chapter 19 Interactive Interfaces and Sharing
19.1 Motivation
Exploratory data analysis (EDA) and visualization are crucial for many aspects of data analysis, such as quality control, hypothesis generation, and contextual result interpretation. Modern high-throughput technologies have been used to generate biological datasets of increasing size and complexity, including single-cell genomics and multi-omics experiments. As a consequence, the need for flexible and interactive platforms to explore those data from various angles has contributed to the increasing popularity of interactive graphical user interfaces (GUIs).
In this chapter, we illustrate how the Bioconductor package iSEE
can be used to perform some common exploratory tasks during single-cell analysis workflows.
We note that these are examples only; in practice, EDA is often context-dependent and driven by distinct motivations and hypotheses for every new data set.
To this end, iSEE
provides a flexible framework immediately compatible with a wide range of genomics data modalities and offers tools for building custom interactive interfaces configured to draw special attention to key aspects of individual data sets.
19.2 Quick start
In this chapter, we use the 10X PBMC dataset for demonstration purposes.
#--- setup ---#
library(OSCAUtils)
chapterPreamble(use_cache = TRUE)
#--- loading ---#
library(BiocFileCache)
bfc <- BiocFileCache("raw_data", ask = FALSE)
raw.path <- bfcrpath(bfc, file.path("http://cf.10xgenomics.com/samples",
    "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"))
untar(raw.path, exdir=file.path(tempdir(), "pbmc4k"))
library(DropletUtils)
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names=TRUE)
#--- gene-annotation ---#
library(scater)
rownames(sce.pbmc) <- uniquifyFeatureNames(
    rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol)
library(EnsDb.Hsapiens.v86)
location <- mapIds(EnsDb.Hsapiens.v86, keys=rowData(sce.pbmc)$ID,
    column="SEQNAME", keytype="GENEID")
#--- cell-detection ---#
set.seed(100)
e.out <- emptyDrops(counts(sce.pbmc))
sce.pbmc <- sce.pbmc[,which(e.out$FDR <= 0.001)]
#--- quality-control ---#
stats <- perCellQCMetrics(sce.pbmc, subsets=list(Mito=which(location=="MT")))
high.mito <- isOutlier(stats$subsets_Mito_percent, type="higher")
sce.pbmc <- sce.pbmc[,!high.mito]
#--- normalization ---#
library(scran)
set.seed(1000)
clusters <- quickCluster(sce.pbmc)
sce.pbmc <- computeSumFactors(sce.pbmc, cluster=clusters)
sce.pbmc <- logNormCounts(sce.pbmc)
#--- variance-modelling ---#
set.seed(1001)
dec.pbmc <- modelGeneVarByPoisson(sce.pbmc)
top.pbmc <- getTopHVGs(dec.pbmc, prop=0.1)
#--- dimensionality-reduction ---#
set.seed(10000)
sce.pbmc <- denoisePCA(sce.pbmc, subset.row=top.pbmc, technical=dec.pbmc)
set.seed(100000)
sce.pbmc <- runTSNE(sce.pbmc, use_dimred="PCA")
set.seed(1000000)
sce.pbmc <- runUMAP(sce.pbmc, use_dimred="PCA")
Then, we load the iSEE
and shiny
packages for use in the rest of this chapter.
An instance of an interactive iSEE
application can be launched with any data set that is stored in an object of the SummarizedExperiment
class (or any class that extends it; e.g., SingleCellExperiment
, DESeqDataSet
, MethylSet
).
In its simplest form, this is done by calling iSEE(sce) with the sce data object as the sole argument; for the data set prepared above, that is iSEE(sce.pbmc).
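A minimal sketch of this quick start is given below. It assumes that the iSEE and shiny packages are installed and that sce.pbmc was created by the preprocessing code above. The guards (requireNamespace(), exists(), interactive()) are our additions, so that the snippet does nothing in a non-interactive or partially configured session.

```r
# Check that the required packages are available (assumption: both are
# installed from the usual Bioconductor and CRAN repositories).
have_pkgs <- requireNamespace("iSEE", quietly = TRUE) &&
    requireNamespace("shiny", quietly = TRUE)

if (have_pkgs && exists("sce.pbmc")) {
    # Build the Shiny app object with the default panels; nothing is shown yet.
    app <- iSEE::iSEE(sce.pbmc)

    # Only open the browser-based interface in an interactive session.
    if (interactive()) {
        shiny::runApp(app)
    }
}
```

Creating the app object and launching it are kept as two separate steps, so the same object can also be handed to a deployment tool instead of being run locally.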
The default interface contains up to eight standard panels, each featuring a particular aspect of the data set. The names of standard panels are available in the panelTypes
variable.
The shorter panel codes are useful for the configuration of tours, described in the section Dissemination of analysis results.
## redDimPlot colDataPlot featAssayPlot
## "Reduced dimension plot" "Column data plot" "Feature assay plot"
## rowStatTable rowDataPlot sampAssayPlot
## "Row statistics table" "Row data plot" "Sample assay plot"
## colStatTable customDataPlot customStatTable
## "Column statistics table" "Custom data plot" "Custom statistics table"
## heatMapPlot
## "Heat map"
The layout of panels in the interface may be altered interactively: panels can be added, removed, resized or repositioned using the “Organize panels” menu in the top right corner of the interface. The initial layout in which an application is launched can also be altered programmatically (see section Examples of usages for iSEE apps).
To familiarize themselves with the GUI, users can launch an interactive tour from the menu in the top right corner. In addition, custom tours can be written to substitute the default built-in tour. This feature is particularly useful to disseminate new data sets with accompanying bespoke explanations guiding users through the salient features of any given data set (see section Dissemination of analysis results).
It is also possible to deploy “empty” instances of iSEE
apps, where any SummarizedExperiment
object stored in an RDS file may be uploaded to the running application.
Once the file is uploaded, the application will import the sce
object and initialize the GUI panels with the contents of the object for interactive exploration.
This type of iSEE application is launched without specifying the sce argument, that is, by calling iSEE() with no arguments.
19.3 Examples of usages for iSEE apps
In the following subsections, we demonstrate some examples of use cases that can be addressed with interactive iSEE
applications to gain insights into a data set and inform decision-making for downstream analyses.
In each case, we demonstrate how iSEE
applications can be preconfigured to launch in a specific state and layout, to facilitate immediate exploration of the key aspects of the data set that are most relevant to each situation and its objectives.
We note that these are examples only; we encourage readers to consider them as templates for learning and developing applications adapted to their own specific needs and purposes.
19.3.1 Quality control
Having previously computed quality control metrics (see section Quick start above), it is possible to launch an app instance that immediately displays a set of panels preconfigured to focus on selected quality control metrics.
For example, a common view for initial inspection of single-cell RNA-sequencing data sets generated by microfluidics platforms plots the library size of each cell, in decreasing order. An elbow in this plot generally reveals the transition between good quality cells and low quality cells or empty droplets.
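To make the ranked library size view concrete, here is a small base R sketch on simulated data. The library sizes are invented for illustration (a mixture of cell-like and empty-droplet-like barcodes) and are not taken from the PBMC data set; the variable names are ours.

```r
# Simulated library sizes (total counts per barcode); these numbers are
# made up for illustration only.
set.seed(42)
lib_sizes <- pmax(1, round(c(
    rlnorm(200, meanlog = log(5000), sdlog = 0.3),  # cell-like barcodes
    rlnorm(800, meanlog = log(50), sdlog = 0.5)     # empty-droplet-like barcodes
)))

# Rank 1 goes to the largest library; tied libraries share an average rank.
lib_rank <- rank(-lib_sizes)
log10_lib <- log10(lib_sizes)

# Plot library size against rank; the elbow separates the two groups.
plot(lib_rank, log10_lib, pch = 16, cex = 0.5,
     xlab = "Rank (decreasing library size)",
     ylab = "log10(library size)")
```

In this simulation the elbow is expected near rank 200, where the cell-like barcodes give way to the much smaller libraries. Real data sets rarely show such a clean separation.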
Another common view is to overlay the library size (in log space) as a color gradient over the result of a dimensionality reduction method such as t-SNE, UMAP, or ivis (Van der Maaten and Hinton 2008; McInnes, Healy, and Melville 2018; Szubert et al. 2019). This view may reveal trajectories or clusters associated with library size. Depending on the experimental context, such observations may result from technical or biological phenomena. For instance, it may indicate that normalization was not completely effective at removing cell biases. Alternatively, it could also indicate the presence of multiple cell types or states that differ in total RNA content.
In the example below, we demonstrate that an iSEE
app can be preconfigured to immediately display the views described above, after computing the appropriate per-cell quality control metrics.
In addition, we also preconfigure the app to automatically highlight in the Reduced dimension plot panel any point selection made in the Column data plot panel. For instance, select the cells with either large or small library sizes, to inspect their distribution in reduced dimensions.
# Compute per-cell QC metrics
library(scater)
sce.pbmc <- addPerCellQC(sce.pbmc, exprs_values="counts")
# Compute the log10-transformed library size
colData(sce.pbmc)[["log10_total_counts"]] <- log10(sce.pbmc$total)
# Compute the rank of each cell by decreasing library size
colData(sce.pbmc)[["total_counts_rank"]] <- rank(-sce.pbmc$total)
# Configure a "Column data plot" panel
colDataArgs <- colDataPlotDefaults(sce.pbmc, 1)
colDataArgs$YAxis <- "log10_total_counts"
colDataArgs$XAxis <- "Column data"
colDataArgs$XAxisColData <- "total_counts_rank"
colDataArgs$DataBoxOpen <- TRUE
# Configure a "Reduced dimension plot" panel
redDimArgs <- redDimPlotDefaults(sce.pbmc, 1)
redDimArgs$DataBoxOpen <- TRUE
redDimArgs$Type <- "TSNE"
redDimArgs$VisualBoxOpen <- TRUE
redDimArgs$ColorBy <- "Column data"
redDimArgs$ColorByColData <- "log10_total_counts"
redDimArgs$SelectBoxOpen <- TRUE
redDimArgs$SelectByPlot <- "Column data plot 1"
# Configure the set of panels initially visible
initialPanels <- DataFrame(
    Name = c("Column data plot 1", "Reduced dimension plot 1"),
    Width = c(6L, 6L)
)
# Prepare the app
app <- iSEE(sce.pbmc, colDataArgs = colDataArgs, redDimArgs = redDimArgs,
    initialPanels = initialPanels)
The preconfigured Shiny app can then be launched with shiny::runApp(app).
Note that preconfigured apps remain fully interactive, meaning that users can interactively control the settings and layout of the panels. For instance, users may select cells in one panel and choose to highlight the selected cells in the other using the “Selection parameters” collapsible box, in an effort to determine an adequate threshold on the library size to filter cells to retain for downstream analyses.
Users may also change the information displayed in any panel. For instance, if the percentage of UMIs mapped to mitochondrial genes has been stored in the column metadata (e.g., as “subsets_Mito_percent”, the name produced by perCellQCMetrics() in the Quick start section), users may choose to color data points by it in the “Reduced dimension plot 1”. Using the transfer of point selection between panels, users could select cells with small library sizes (in the “Column data plot 1” panel) and highlight them in the “Reduced dimension plot 1” panel, to investigate a possible relation between library size, clustering, and proportion of reads mapped to mitochondrial genes.
19.3.2 Annotation and identification of cell populations
Recent efforts such as the Human Cell Atlas (HCA) and the NIH Human BioMolecular Atlas Program (HuBMAP) have initiated the creation of reference maps of cell types and states in the human body in health and disease.
A direct challenge of such efforts lies in the curation and annotation of known and novel cell types in those data sets, and the subsequent transfer learning of those annotations to new data sets.
This generally relies on unsupervised clustering of cells, followed by pairwise differential expression to identify gene markers for each cluster, and ultimately manual curation of gene signatures to assign a cell type identity - either known or novel - to each cluster.
In this example, we show how iSEE
can be used to interactively examine results from unsupervised clustering and pairwise differential expression to conveniently inspect and determine cell identities.
In this first code chunk, we identify clusters of cells and their respective positive markers; that is, genes expressed at higher levels in the cluster of interest relative to other cells.
The clusters are identified as densely connected subgraphs (communities) within a shared nearest-neighbor graph of cells, built from their expression profiles in PCA space.
Positive markers are identified as genes with a significantly higher mean expression in each cluster using pairwise Welch t-tests.
Finally, the log-transformed false discovery rate (FDR) of each marker for each cluster is stored in the rowData
component of the sce
object, which will then be accessible in the iSEE
application.
library(scran)
# Compute cell clusters
g <- buildSNNGraph(sce.pbmc, k=10, use.dimred = 'PCA')
clust <- igraph::cluster_walktrap(g)$membership
sce.pbmc$cluster <- factor(clust)
# Identify positive markers for each cluster
markers.pbmc.up <- findMarkers(
    sce.pbmc, sce.pbmc$cluster, direction="up", log.p=TRUE, sorted=FALSE)
# Collate the log-transformed FDR for each marker in a single table
x <- DataFrame(
    row.names = rownames(sce.pbmc),
    lapply(X = markers.pbmc.up, FUN = "[[", i="log.FDR")
)
colnames(x) <- as.character(seq_len(ncol(x)))
# Store the table of results as row metadata
rowData(sce.pbmc)[[(paste0("log.FDR", ".markers.up"))]] <- x
The next code chunk preconfigures an app that shows:
- A table of feature statistics (including the log-transformed FDR of cluster markers computed above),
- A plot showing the distribution of expression values for a chosen gene in each cluster,
- A plot showing the result of the UMAP dimensionality reduction method overlaid with the expression value of a chosen gene.
Moreover, the code chunk below preconfigures the second and third panels to use the gene (i.e., row) selected in the first panel. This functionality is particularly convenient once the table is sorted by decreasing significance of the markers for a cluster of interest.
For instance, once the application is launched, users can sort the table by ascending value of “log.FDR.markers.up:1” (where “:1” indicates cluster “1”). Then, users may select the first row in the “Row statistics table 1” and watch the second and third panels automatically update to display the most significant marker gene on the y-axis (“Feature assay plot 1” panel), and as a color scale overlaid on the data points (“Reduced dimension plot 1” panel).
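The reason for sorting in ascending order can be illustrated with a small base R sketch. The gene names and log-transformed FDR values below are hypothetical; the point is only that the most negative log(FDR) corresponds to the smallest FDR and therefore to the most significant marker.

```r
# Hypothetical log-transformed FDR values for five genes in one cluster;
# the gene names and numbers are invented for illustration.
marker_stats <- data.frame(
    row.names = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"),
    log.FDR = c(-2.3, -150.7, -0.1, -45.2, -12.9)
)

# Ascending order of log(FDR) puts the smallest FDR, i.e. the most
# significant marker, in the first row.
sorted_stats <- marker_stats[order(marker_stats$log.FDR), , drop = FALSE]
top_marker <- rownames(sorted_stats)[1]
top_marker
```

Working on the log scale also keeps very small FDR values distinguishable, whereas on the original scale they could all be rounded to zero when displayed in a table.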
# Configure a "Row statistics table" panel (NOTE: we leave all parameters to default)
rowStatArgs <- rowStatTableDefaults(sce.pbmc, 1)
# Configure a "Feature assay plot" panel
featAssayArgs <- featAssayPlotDefaults(sce.pbmc, 1)
featAssayArgs$YAxisRowTable <- "Row statistics table 1"
featAssayArgs$XAxis <- "Column data"
featAssayArgs$XAxisColData <- "cluster"
featAssayArgs$DataBoxOpen <- TRUE
# Configure a "Reduced dimension plot" panel
redDimArgs <- redDimPlotDefaults(sce.pbmc, 1)
redDimArgs$Type <- "UMAP"
redDimArgs$ColorBy <- "Feature name"
redDimArgs$ColorByRowTable <- "Row statistics table 1"
redDimArgs$VisualBoxOpen <- TRUE
initialPanels <- DataFrame(
    Name = c("Row statistics table 1", "Feature assay plot 1", "Reduced dimension plot 1"),
    Width = c(4L, 4L, 4L)
)
# Prepare the app
app <- iSEE(sce.pbmc, rowStatArgs = rowStatArgs, featAssayArgs = featAssayArgs,
    redDimArgs = redDimArgs, initialPanels = initialPanels)
The preconfigured Shiny app can then be launched with shiny::runApp(app).
In addition to selecting the top genes by significance, it is also possible to search the table for arbitrary gene names and select any known marker gene. Practically, investigating the expression levels of known marker genes is a common approach to assigning cell type labels to inferred clusters.
19.3.3 Collaborative analysis
Analysis and interpretation of complex data sets typically involves multiple analysts. Rather than sharing analysis results via multiple versions of static figures and reports, it is often helpful to share interactive plots, allowing the other involved parties to independently explore the data in further depth.
With iSEE
, this can be achieved at any time by clicking on the “Display panel settings” button under the “Diagnostics” tab in the top right corner, copying the displayed code into an R script, and sharing this chunk of code with collaborators.
However, note that they will also need to have access to a copy of the original sce
object used for the analysis.
In that setup, executing the code in the script and then launching iSEE with the original object and the panel settings defined by that code will open an instance mimicking what was shown on the screen when the “Display panel settings” button was clicked.
It is important to note that all iSEE
applications remain interactive irrespective of their initial configuration; only the initial layout and state of the application is modified from the default setup.
19.3.4 Dissemination of analysis results
Typically, results from single-cell data are disseminated as static figures in published papers and reports. However, those are unlikely to capture the full information content of a complex scRNA-seq data set, and do not allow further exploration by the reader. To complement this traditional mode of dissemination, it is becoming increasingly common to deploy web-based interactive data browsers, allowing interested readers to investigate their respective hypotheses using the published data.
With iSEE
, such interactive exploration is facilitated by small guided tours - step-by-step walkthroughs of the different panels with pointers to facilitate their interpretation, from the authors’ point of view.
At any time, the viewer is still free to leave the interactive tour and explore the data from their own perspective.
All that is needed to add a tour to an iSEE
instance is a data frame with two columns named “element” and “intro”; the first column declares the UI element to highlight in each step of the tour, and the second one contains the text to display at that step.
This data frame must then be provided to the iSEE()
function via the tour
argument.
In the code chunk below, we demonstrate the implementation of a simple tour that takes users through the two panels that compose the GUI and trains them interactively to use the collapsible boxes.
tour <- data.frame(
    element = c(
        "#Welcome",
        "#redDimPlot1",
        "#colDataPlot1",
        "#colDataPlot1_DataBoxOpen",
        "#Conclusion"),
    intro = c(
        "Welcome to this tour!",
        "This is a <i>Reduced dimension plot.</i>",
        "And this is a <i>Column data plot.</i>",
        "<b>Action:</b> Click on this collapsible box to open and close it.",
        "Thank you for taking this tour!"),
    stringsAsFactors = FALSE)
initialPanels <- DataFrame(
    Name = c("Reduced dimension plot 1", "Column data plot 1"),
    Width = c(6L, 6L)
)
The preconfigured Shiny app can then be loaded with the tour, by passing the data frame to iSEE() via the tour argument, and launched with shiny::runApp().
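Before handing a tour to iSEE(), it can be useful to check that the data frame has the expected shape. The sketch below uses a hypothetical helper, check_tour(), which is not part of iSEE, and a shortened two-step tour redefined here so that the snippet is self-contained. The final step only runs if iSEE is installed and the sce.pbmc and initialPanels objects from the code above exist in the session.

```r
# Hypothetical helper (not part of iSEE): check that a tour has the two
# columns described above and that both contain text.
check_tour <- function(tour) {
    stopifnot(is.data.frame(tour))
    required <- c("element", "intro")
    missing_cols <- setdiff(required, colnames(tour))
    if (length(missing_cols) > 0) {
        stop("tour is missing column(s): ", paste(missing_cols, collapse = ", "))
    }
    stopifnot(is.character(tour$element), is.character(tour$intro))
    invisible(TRUE)
}

# A two-step tour, redefined here so that the snippet is self-contained.
small_tour <- data.frame(
    element = c("#Welcome", "#redDimPlot1"),
    intro = c("Welcome to this tour!", "This is a <i>Reduced dimension plot.</i>"),
    stringsAsFactors = FALSE
)
tour_ok <- check_tour(small_tour)

# Build and launch the app only if iSEE is installed and the objects from
# the code above (`sce.pbmc`, `initialPanels`) exist in the session.
if (requireNamespace("iSEE", quietly = TRUE) &&
        exists("sce.pbmc") && exists("initialPanels")) {
    app <- iSEE::iSEE(sce.pbmc, initialPanels = initialPanels, tour = small_tour)
    if (interactive()) {
        shiny::runApp(app)
    }
}
```

Keeping stringsAsFactors = FALSE matters on the R version used in this chapter, where data.frame() would otherwise convert both columns to factors and the character check would fail.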
Examples of advanced tours showcasing a selection of published data sets can be found at https://github.com/LTLA/iSEE2018.
19.4 Reproducibility
Although this chapter focuses on interactive exploration of data, we note that it is important to retain reproducibility, and to have a record of how each figure was generated.
With iSEE
, this information is readily available via the “Extract the R code” button under the “Diagnostics” dropdown menu in the top-right corner of the GUI.
At any time, copying the code displayed in the modal window and executing it in the R session from which the iSEE
app was launched exactly reproduces all plots currently displayed in the GUI.
Similarly, the information on the panel setup is available by clicking on “Display panel settings” in the same dropdown menu.
Another modal window will show the R code needed to reproduce the exact panel settings; copying and pasting this code into an R script or an R Markdown report stores this information permanently.
19.5 Additional resources
For demonstration and inspiration, we refer readers to the following examples of deployed applications:
- Use cases accompanying the published article: https://marionilab.cruk.cam.ac.uk/ (source code: https://github.com/LTLA/iSEE2018)
- Examples of iSEE in production: http://www.teichlab.org/singlecell-treg
- Other examples as source code:
  - Gallery of example notebooks to reproduce analyses on public data: https://github.com/federicomarini/iSEE_instances
  - Gallery of example custom panels: https://github.com/kevinrue/iSEE_custom
Session Info
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS
Matrix products: default
BLAS: /home/ramezqui/Rbuild/danbuild/R-3.6.1/lib/libRblas.so
LAPACK: /home/ramezqui/Rbuild/danbuild/R-3.6.1/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=C LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] scran_1.14.3 scater_1.14.3 ggplot2_3.2.1
[4] shiny_1.4.0 iSEE_1.6.0 SingleCellExperiment_1.8.0
[7] SummarizedExperiment_1.16.0 DelayedArray_0.12.0 BiocParallel_1.20.0
[10] matrixStats_0.55.0 Biobase_2.46.0 GenomicRanges_1.38.0
[13] GenomeInfoDb_1.22.0 IRanges_2.20.0 S4Vectors_0.24.0
[16] BiocGenerics_0.32.0 Cairo_1.5-10 BiocStyle_2.14.0
[19] OSCAUtils_0.0.1
loaded via a namespace (and not attached):
[1] nlme_3.1-142 bitops_1.0-6 bit64_0.9-7
[4] tools_3.6.1 backports_1.1.5 irlba_2.3.3
[7] R6_2.4.1 DT_0.10 vipor_0.4.5
[10] DBI_1.0.0 lazyeval_0.2.2 mgcv_1.8-30
[13] colorspace_1.4-1 withr_2.1.2 gridExtra_2.3
[16] tidyselect_0.2.5 bit_1.1-14 compiler_3.6.1
[19] BiocNeighbors_1.4.0 shinyjs_1.0 colourpicker_1.0
[22] bookdown_0.15 scales_1.0.0 stringr_1.4.0
[25] digest_0.6.22 rmarkdown_1.17 rentrez_1.2.2
[28] XVector_0.26.0 pkgconfig_2.0.3 htmltools_0.4.0
[31] limma_3.42.0 fastmap_1.0.1 htmlwidgets_1.5.1
[34] rlang_0.4.1 RSQLite_2.1.2 DelayedMatrixStats_1.8.0
[37] jsonlite_1.6 dplyr_0.8.3 BiocSingular_1.2.0
[40] RCurl_1.95-4.12 magrittr_1.5 GenomeInfoDbData_1.2.2
[43] Matrix_1.2-17 ggbeeswarm_0.6.0 Rcpp_1.0.3
[46] munsell_0.5.0 viridis_0.5.1 edgeR_3.28.0
[49] stringi_1.4.3 yaml_2.2.0 rintrojs_0.2.2
[52] zlibbioc_1.32.0 plyr_1.8.4 grid_3.6.1
[55] blob_1.2.0 dqrng_0.2.1 promises_1.1.0
[58] shinydashboard_0.7.1 crayon_1.3.4 miniUI_0.1.1.1
[61] lattice_0.20-38 cowplot_1.0.0 splines_3.6.1
[64] locfit_1.5-9.1 zeallot_0.1.0 knitr_1.26
[67] pillar_1.4.2 igraph_1.2.4.1 reshape2_1.4.3
[70] XML_3.98-1.20 glue_1.3.1 evaluate_0.14
[73] BiocManager_1.30.9 vctrs_0.2.0 httpuv_1.5.2
[76] gtable_0.3.0 purrr_0.3.3 assertthat_0.2.1
[79] xfun_0.11 rsvd_1.0.2 mime_0.7
[82] xtable_1.8-4 later_1.0.0 viridisLite_0.3.0
[85] tibble_2.1.3 beeswarm_0.2.3 AnnotationDbi_1.48.0
[88] memoise_1.1.0 statmod_1.4.32 shinyAce_0.4.1
Bibliography
McInnes, Leland, John Healy, and James Melville. 2018. “UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction.” arXiv E-Prints, February, arXiv:1802.03426.
Szubert, B., J. E. Cole, C. Monaco, and I. Drozdov. 2019. “Structure-Preserving Visualisation of High Dimensional Single-Cell Datasets.” Sci Rep 9 (1):8914. https://doi.org/10.1038/s41598-019-45301-0.
Van der Maaten, L., and G. Hinton. 2008. “Visualizing Data Using t-SNE.” J. Mach. Learn. Res. 9:2579-2605.