Chapter 19 Interactive Interfaces and Sharing

19.1 Motivation

Exploratory data analysis (EDA) and visualization are crucial for many aspects of data analysis, such as quality control, hypothesis generation, and contextual result interpretation. Modern high-throughput technologies have been used to generate biological datasets of increasing size and complexity, including single-cell genomics and multi-omics experiments. As a consequence, the need for flexible and interactive platforms to explore those data from various angles has contributed to the increasing popularity of interactive graphical user interfaces (GUIs).

In this chapter, we illustrate how the Bioconductor package iSEE can be used to perform some common exploratory tasks during single-cell analysis workflows. We note that these are examples only; in practice, EDA is often context-dependent and driven by distinct motivations and hypotheses for every new data set. To this end, iSEE provides a flexible framework immediately compatible with a wide range of genomics data modalities and offers tools for building custom interactive interfaces configured to draw special attention to key aspects of individual data sets.

19.2 Quick start

In this chapter, we use the 10X PBMC dataset for demonstration purposes.

#--- setup ---#
library(OSCAUtils)
chapterPreamble(use_cache = TRUE)

#--- loading ---#
library(BiocFileCache)
bfc <- BiocFileCache("raw_data", ask = FALSE)
raw.path <- bfcrpath(bfc, file.path("http://cf.10xgenomics.com/samples",
    "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"))
untar(raw.path, exdir=file.path(tempdir(), "pbmc4k"))

library(DropletUtils)
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names=TRUE)

#--- gene-annotation ---#
library(scater)
rownames(sce.pbmc) <- uniquifyFeatureNames(
    rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol)

library(EnsDb.Hsapiens.v86)
location <- mapIds(EnsDb.Hsapiens.v86, keys=rowData(sce.pbmc)$ID, 
    column="SEQNAME", keytype="GENEID")

#--- cell-detection ---#
set.seed(100)
e.out <- emptyDrops(counts(sce.pbmc))
sce.pbmc <- sce.pbmc[,which(e.out$FDR <= 0.001)]

#--- quality-control ---#
stats <- perCellQCMetrics(sce.pbmc, subsets=list(Mito=which(location=="MT")))
high.mito <- isOutlier(stats$subsets_Mito_percent, type="higher")
sce.pbmc <- sce.pbmc[,!high.mito]

#--- normalization ---#
library(scran)
set.seed(1000)
clusters <- quickCluster(sce.pbmc)
sce.pbmc <- computeSumFactors(sce.pbmc, cluster=clusters)
sce.pbmc <- logNormCounts(sce.pbmc)

#--- variance-modelling ---#
set.seed(1001)
dec.pbmc <- modelGeneVarByPoisson(sce.pbmc)
top.pbmc <- getTopHVGs(dec.pbmc, prop=0.1)

#--- dimensionality-reduction ---#
set.seed(10000)
sce.pbmc <- denoisePCA(sce.pbmc, subset.row=top.pbmc, technical=dec.pbmc)

set.seed(100000)
sce.pbmc <- runTSNE(sce.pbmc, use_dimred="PCA")

set.seed(1000000)
sce.pbmc <- runUMAP(sce.pbmc, use_dimred="PCA")

Then, we load the iSEE and shiny packages for use in the rest of this chapter.

An instance of an interactive iSEE application can be launched with any data set that is stored in an object of the SummarizedExperiment class (or any class that extends it; e.g., SingleCellExperiment, DESeqDataSet, MethylSet). In its simplest form, this is done by calling iSEE(sce) with the sce data object as the sole argument, as illustrated here with the sce.pbmc data set.
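A minimal sketch of the two steps just described, assuming the sce.pbmc object built in the code above and the iSEE 1.6.0 release listed in the Session Info. The iSEE() call only constructs a Shiny app object; the variable name app is our choice, and the final runApp() call is left commented out because it blocks the R session until the app is closed.

```r
library(iSEE)
library(shiny)

# Build the app object around the SingleCellExperiment; nothing is displayed yet.
app <- iSEE(sce.pbmc)

# Launch the app (uncomment in an interactive session).
# runApp(app)
```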

The default interface contains up to eight standard panels, each featuring a particular aspect of the data set. The names of standard panels are available in the panelTypes variable. The shorter panel codes are useful for the configuration of tours, described in the section Dissemination of analysis results.
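As a brief sketch, assuming iSEE 1.6.0 (the release listed in the Session Info) is attached, printing the variable gives the named character vector listed next; the names are the panel codes and the values are the full panel names. Other releases may not export this variable.

```r
library(iSEE)

# Named character vector: names are panel codes, values are full panel names.
panelTypes
```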

##                redDimPlot               colDataPlot             featAssayPlot 
##  "Reduced dimension plot"        "Column data plot"      "Feature assay plot" 
##              rowStatTable               rowDataPlot             sampAssayPlot 
##    "Row statistics table"           "Row data plot"       "Sample assay plot" 
##              colStatTable            customDataPlot           customStatTable 
## "Column statistics table"        "Custom data plot" "Custom statistics table" 
##               heatMapPlot 
##                "Heat map"

The layout of panels in the interface may be altered interactively: panels can be added, removed, resized or repositioned using the “Organize panels” menu in the top right corner of the interface. The initial layout in which an application is launched can also be altered programmatically (see section Examples of usages for iSEE apps).

To familiarize themselves with the GUI, users can launch an interactive tour from the menu in the top right corner. In addition, custom tours can be written to substitute the default built-in tour. This feature is particularly useful to disseminate new data sets with accompanying bespoke explanations guiding users through the salient features of any given data set (see section Dissemination of analysis results).

It is also possible to deploy “empty” instances of iSEE apps, where any SummarizedExperiment object stored in an RDS file may be uploaded to the running application. Once the file is uploaded, the application will import the sce object and initialize the GUI panels with the contents of the object for interactive exploration. This type of iSEE application is launched without specifying the sce argument, as shown below.
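A minimal sketch of this setup, assuming iSEE 1.6.0 and the sce.pbmc object from the code above. The saveRDS() step and the file name are our own illustrative additions, included only to show how a compatible file could be prepared for upload.

```r
library(iSEE)

# Save the object to an RDS file that can later be uploaded through the app.
# The file name is a hypothetical choice.
rds.path <- file.path(tempdir(), "sce.pbmc.rds")
saveRDS(sce.pbmc, rds.path)

# Omitting the data object gives an "empty" app that waits for such a file.
app <- iSEE()

# Launch the app (uncomment in an interactive session).
# shiny::runApp(app)
```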

19.3 Examples of usages for iSEE apps

In the following subsections, we demonstrate some examples of use cases that can be addressed with interactive iSEE applications to gain insights into a data set and inform decision-making for downstream analyses. In each case, we demonstrate how iSEE applications can be preconfigured to launch in a specific state and layout, to facilitate immediate exploration of key aspects of the data set in the manner most directly relevant to each situation and its objectives.

We note that these are examples only; we encourage readers to consider them as templates for learning and developing applications adapted to their own specific needs and purposes.

19.3.1 Quality control

Having previously computed quality control metrics (see section Quick start above), it is possible to launch an app instance that immediately displays a set of panels preconfigured to focus on selected quality control metrics.

For example, a common view for initial inspection of single-cell RNA-sequencing data sets generated by microfluidics platforms plots the library size of each cell, in decreasing order. An elbow in this plot generally reveals the transition between good quality cells and low quality cells or empty droplets.

Another common view is to overlay the library size (in log space) as a color gradient over the result of a dimensionality reduction method such as t-SNE, UMAP, or ivis (Van der Maaten and Hinton 2008; McInnes, Healy, and Melville 2018; Szubert et al. 2019). This view may reveal trajectories or clusters associated with library size. Depending on the experimental context, such observations may result from technical or biological phenomena. For instance, it may indicate that normalization was not completely effective at removing cell biases. Alternatively, it could also indicate the presence of multiple cell types or states that differ in total RNA content.

In the example below, we demonstrate that an iSEE app can be preconfigured to immediately display the views described above, after computing the appropriate per-cell quality control metrics.

In addition, we also preconfigure the app to automatically highlight in the Reduced dimension plot panel any point selection made in the Column data plot panel. For instance, select the cells with either large or small library sizes, to inspect their distribution in reduced dimensions.

The preconfigured Shiny app can then be launched as shown below.
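The following is a sketch of one possible configuration rather than a definitive implementation. It assumes the sce.pbmc and location objects from the Quick start section and the panel-defaults interface of iSEE 1.6.0 (colDataPlotDefaults(), redDimPlotDefaults() and the initialPanels argument); later iSEE releases configure panels differently. The colData column names log10_total_counts and total_counts_rank are our own illustrative choices.

```r
library(scater)
library(iSEE)

# Per-cell QC metrics, computed as in the quick start;
# `location` holds the chromosome of each gene.
stats <- perCellQCMetrics(sce.pbmc, subsets=list(Mito=which(location=="MT")))

# Store the metrics in colData so that the panels can display them.
sce.pbmc$log10_total_counts <- log10(stats$sum)
sce.pbmc$total_counts_rank <- rank(-stats$sum, ties.method="first")
sce.pbmc$pct_counts_Mito <- stats$subsets_Mito_percent

# Column data plot 1: library size (y-axis) against its decreasing rank (x-axis).
colDataArgs <- colDataPlotDefaults(sce.pbmc, 1)
colDataArgs$YAxis <- "log10_total_counts"
colDataArgs$XAxis <- "Column data"
colDataArgs$XAxisColData <- "total_counts_rank"
colDataArgs$DataBoxOpen <- TRUE

# Reduced dimension plot 1: t-SNE colored by library size, receiving
# any point selection made in Column data plot 1.
redDimArgs <- redDimPlotDefaults(sce.pbmc, 1)
redDimArgs$Type <- match("TSNE", reducedDimNames(sce.pbmc))
redDimArgs$ColorBy <- "Column data"
redDimArgs$ColorByColData <- "log10_total_counts"
redDimArgs$SelectByPlot <- "Column data plot 1"
redDimArgs$SelectBoxOpen <- TRUE

# Show only these two panels, side by side.
initialPanels <- S4Vectors::DataFrame(
    Name=c("Column data plot 1", "Reduced dimension plot 1"),
    Width=c(6L, 6L))

app <- iSEE(sce.pbmc, colDataArgs=colDataArgs, redDimArgs=redDimArgs,
    initialPanels=initialPanels)

# Launch the app (uncomment in an interactive session).
# shiny::runApp(app)
```

The rank is computed on the negated library size so that rank 1 corresponds to the largest library, which gives the decreasing curve described above.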

Note that preconfigured apps remain fully interactive, meaning that users can interactively control the settings and layout of the panels. For instance, users may select cells in one panel and choose to highlight the selected cells in the other using the “Selection parameters” collapsible box, in an effort to determine an adequate threshold on the library size to filter cells to retain for downstream analyses.

Users may also change the information displayed in any panel. For instance, users may choose to color data points by the percentage of UMIs mapped to mitochondrial genes (“pct_counts_Mito”) in the “Reduced dimension plot 1”. Using the transfer of point selection between panels, users could select cells with small library sizes (in the “Column data plot 1” panel) and highlight them in the “Reduced dimension plot 1” panel, to investigate a possible relation between library size, clustering, and the percentage of UMIs mapped to mitochondrial genes.

19.3.2 Annotation and identification of cell populations

Recent efforts such as the Human Cell Atlas (HCA) and the NIH Human BioMolecular Atlas Program (HuBMAP) have initiated the creation of reference maps of cell types and states in the human body in health and disease. A direct challenge of such efforts lies in the curation and annotation of known and novel cell types in those data sets, and the subsequent transfer learning of those annotations to new data sets. This generally relies on unsupervised clustering of cells, followed by pairwise differential expression to identify gene markers for each cluster, and ultimately manual curation of gene signatures to assign a cell type identity - either known or novel - to each cluster. In this example, we show how iSEE can be used to interactively examine results from unsupervised clustering and pairwise differential expression to conveniently inspect and determine cell identities.

In this first code chunk, we identify clusters of cells and their respective positive markers; that is, genes expressed at higher levels in the cluster of interest relative to other cells. The clusters are identified as densely connected subgraphs within a k-nearest-neighbors graph of cells based on their expression profiles. Positive markers are identified as genes with a significantly higher mean expression in each cluster using pairwise Welch t-tests. Finally, the log-transformed false discovery rate (FDR) of each marker for each cluster is stored in the rowData component of the sce object, which will then be accessible in the iSEE application.
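A sketch of how these steps might be carried out with scran, assuming the sce.pbmc object (with its "PCA" entry) from the Quick start section. The use of a shared nearest-neighbor graph, the walktrap community detection method, and the colData column name Cluster are our assumptions; the text only specifies a graph built from nearest neighbors and pairwise Welch t-tests.

```r
library(scran)

# Cluster cells as communities of a shared nearest-neighbor graph built on the PCs.
snn.gr <- buildSNNGraph(sce.pbmc, use.dimred="PCA")
sce.pbmc$Cluster <- factor(igraph::cluster_walktrap(snn.gr)$membership)

# Pairwise Welch t-tests (the default test) for genes up-regulated in each cluster.
# log.p=TRUE reports log-transformed p-values and FDRs; sorted=FALSE keeps
# the genes in the same order as the rows of sce.pbmc.
markers.up <- findMarkers(sce.pbmc, sce.pbmc$Cluster,
    direction="up", log.p=TRUE, sorted=FALSE)

# Store one log-FDR column per cluster in rowData, e.g. "log.FDR.markers.up:1".
for (i in names(markers.up)) {
    rowData(sce.pbmc)[[paste0("log.FDR.markers.up:", i)]] <- markers.up[[i]]$log.FDR
}
```

Keeping the marker tables unsorted is what allows each log-FDR vector to be copied directly into rowData without matching gene names.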

The next code chunk preconfigures an app that shows:

- A table of feature statistics (including the log-transformed FDR of cluster markers computed above),
- A plot showing the distribution of expression values for a chosen gene in each cluster,
- A plot showing the result of the UMAP dimensionality reduction method overlaid with the expression value of a chosen gene.

Moreover, the code chunk below preconfigures the second and third panels to use the gene (i.e., row) selected in the first panel. This functionality is particularly convenient once the table is sorted by decreasing significance of the markers for a cluster of interest.

For instance, once the application is launched, users can sort the table by ascending value of “log.FDR.markers.up:1” (where “:1” indicates cluster “1”). Then, users may select the first row in the “Row statistics table 1” and watch the second and third panels automatically update to display the most significant marker gene on the y-axis (“Feature assay plot 1” panel), and as a color scale overlaid on the data points (“Reduced dimension plot 1” panel).

The preconfigured Shiny app can then be launched as shown below.
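A sketch of one way to set up the three linked panels, assuming the panel-defaults interface of iSEE 1.6.0 and the field names YAxisRowTable and ColorByRowTable as we understand them for that release. It also assumes that sce.pbmc carries cluster labels in a colData column named Cluster (a hypothetical name) and the per-cluster log-FDR columns in rowData.

```r
library(iSEE)

# Row statistics table 1: feature statistics, including the log-FDR columns.
rowStatArgs <- rowStatTableDefaults(sce.pbmc, 1)

# Feature assay plot 1: expression of the gene selected in the table (y-axis),
# split by cluster (x-axis).
featAssayArgs <- featAssayPlotDefaults(sce.pbmc, 1)
featAssayArgs$YAxisRowTable <- "Row statistics table 1"
featAssayArgs$XAxis <- "Column data"
featAssayArgs$XAxisColData <- "Cluster"
featAssayArgs$DataBoxOpen <- TRUE

# Reduced dimension plot 1: UMAP colored by the gene selected in the table.
redDimArgs <- redDimPlotDefaults(sce.pbmc, 1)
redDimArgs$Type <- match("UMAP", reducedDimNames(sce.pbmc))
redDimArgs$ColorBy <- "Feature name"
redDimArgs$ColorByRowTable <- "Row statistics table 1"
redDimArgs$VisualBoxOpen <- TRUE

# Show the three panels on a single row.
initialPanels <- S4Vectors::DataFrame(
    Name=c("Row statistics table 1", "Feature assay plot 1",
        "Reduced dimension plot 1"),
    Width=c(4L, 4L, 4L))

app <- iSEE(sce.pbmc, rowStatArgs=rowStatArgs, featAssayArgs=featAssayArgs,
    redDimArgs=redDimArgs, initialPanels=initialPanels)

# Launch the app (uncomment in an interactive session).
# shiny::runApp(app)
```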

In addition to selecting the top genes by significance, it is also possible to search the table for arbitrary gene names and select any known marker gene. Practically, investigating the expression levels of known marker genes is a common approach to assigning cell type labels to inferred clusters.

19.3.3 Collaborative analysis

Analysis and interpretation of complex data sets typically involve multiple analysts. Rather than sharing analysis results via multiple versions of static figures and reports, it is often helpful to share interactive plots, allowing the other involved parties to independently explore the data in further depth.

With iSEE, this can be achieved at any time by clicking on the “Display panel settings” button under the “Diagnostics” tab in the top right corner, copying the displayed code into an R script, and sharing this chunk of code with collaborators. However, note that they will also need to have access to a copy of the original sce object used for the analysis. In that setup, executing the code in the script and launching iSEE with the following command will open an instance mimicking what was shown on the screen when the “Display panel settings” button was clicked.
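A sketch of what such a command could look like under the iSEE 1.6.0 interface. The object names redDimPlotArgs, colDataPlotArgs and initialPanels are hypothetical stand-ins: in practice, they would be defined by the code copied from the “Display panel settings” window, and any other panel settings objects defined there would be passed through the corresponding arguments in the same way. Here they are created from the defaults only so that the example is self-contained.

```r
library(iSEE)

# Stand-ins for the objects that the copied code would define (hypothetical names).
redDimPlotArgs <- redDimPlotDefaults(sce.pbmc, 1)
colDataPlotArgs <- colDataPlotDefaults(sce.pbmc, 1)
initialPanels <- S4Vectors::DataFrame(
    Name=c("Reduced dimension plot 1", "Column data plot 1"),
    Width=c(6L, 6L))

# Relaunch the app in the recorded state; collaborators need their own
# copy of sce.pbmc for this call to work.
app <- iSEE(sce.pbmc, redDimArgs=redDimPlotArgs, colDataArgs=colDataPlotArgs,
    initialPanels=initialPanels)

# Launch the app (uncomment in an interactive session).
# shiny::runApp(app)
```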

It is important to note that all iSEE applications remain interactive irrespective of their initial configuration; only the initial layout and state of the application are modified from the default setup.

19.3.4 Dissemination of analysis results

Typically, results from single-cell data are disseminated as static figures in published papers and reports. However, those are unlikely to capture the full information content of a complex scRNA-seq data set, and do not allow further exploration by the reader. To complement this traditional mode of dissemination, it is becoming increasingly common to deploy web-based interactive data browsers, allowing interested readers to investigate their respective hypotheses using the published data.

With iSEE, such interactive exploration is facilitated by small guided tours - step-by-step walkthroughs of the different panels with pointers to facilitate their interpretation, from the authors’ point of view. At any time, the viewer is still free to leave the interactive tour and explore the data from their own perspective. All that is needed to add a tour to an iSEE instance is a data frame with two columns named “element” and “intro”; the first column declares the UI element to highlight in each step of the tour, and the second one contains the text to display at that step. This data frame must then be provided to the iSEE() function via the tour argument.

In the code chunk below, we demonstrate the implementation of a simple tour that takes users through the two panels that compose a GUI and interactively trains them to use the collapsible boxes.

The preconfigured Shiny app can then be loaded with the tour and launched as shown below.
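A minimal sketch of such a tour, assuming iSEE 1.6.0 and the sce.pbmc object from the Quick start section. The element identifiers are assumptions: we take each panel to be addressable by its panel code followed by its number (e.g., "#colDataPlot1"), and the collapsible box by the suffix "_DataBoxOpen". The intro texts are purely illustrative.

```r
library(iSEE)

# One row per step: the UI element to highlight and the text to display.
tour <- data.frame(
    element=c(
        "#colDataPlot1",
        "#redDimPlot1",
        "#colDataPlot1_DataBoxOpen"),
    intro=c(
        "This panel displays per-cell metadata stored in colData.",
        "This panel displays the cells in a low-dimensional embedding.",
        "Click on this box to reveal the data parameters of the first panel."),
    stringsAsFactors=FALSE)

# Two panels side by side, matching the elements used in the tour.
initialPanels <- S4Vectors::DataFrame(
    Name=c("Column data plot 1", "Reduced dimension plot 1"),
    Width=c(6L, 6L))

app <- iSEE(sce.pbmc, initialPanels=initialPanels, tour=tour)

# Launch the app (uncomment in an interactive session).
# shiny::runApp(app)
```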

Examples of advanced tours showcasing a selection of published data sets can be found at https://github.com/LTLA/iSEE2018.

19.4 Reproducibility

Although this chapter focuses on interactive exploration of data, we note that it is important to retain reproducibility, and to have a record of how each figure was generated. With iSEE, this information is readily available via the “Extract the R code” button under the “Diagnostics” dropdown menu in the top-right corner of the GUI. At any time, copying the code displayed in the modal window and executing it in the R session from which the iSEE app was launched exactly reproduces all plots currently displayed in the GUI. Similarly, the information on the panel setup is available by clicking on “Display panel settings” in the same dropdown menu. Another modal window will show the R code needed to reproduce the exact panel settings; copying and pasting this into an R script or an R Markdown report will store this information in a permanent way.

19.5 Additional resources

For demonstration and inspiration, we refer readers to the following examples of deployed applications:

Session Info

R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS:   /home/ramezqui/Rbuild/danbuild/R-3.6.1/lib/libRblas.so
LAPACK: /home/ramezqui/Rbuild/danbuild/R-3.6.1/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=C               LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] scran_1.14.3                scater_1.14.3               ggplot2_3.2.1              
 [4] shiny_1.4.0                 iSEE_1.6.0                  SingleCellExperiment_1.8.0 
 [7] SummarizedExperiment_1.16.0 DelayedArray_0.12.0         BiocParallel_1.20.0        
[10] matrixStats_0.55.0          Biobase_2.46.0              GenomicRanges_1.38.0       
[13] GenomeInfoDb_1.22.0         IRanges_2.20.0              S4Vectors_0.24.0           
[16] BiocGenerics_0.32.0         Cairo_1.5-10                BiocStyle_2.14.0           
[19] OSCAUtils_0.0.1            

loaded via a namespace (and not attached):
 [1] nlme_3.1-142             bitops_1.0-6             bit64_0.9-7             
 [4] tools_3.6.1              backports_1.1.5          irlba_2.3.3             
 [7] R6_2.4.1                 DT_0.10                  vipor_0.4.5             
[10] DBI_1.0.0                lazyeval_0.2.2           mgcv_1.8-30             
[13] colorspace_1.4-1         withr_2.1.2              gridExtra_2.3           
[16] tidyselect_0.2.5         bit_1.1-14               compiler_3.6.1          
[19] BiocNeighbors_1.4.0      shinyjs_1.0              colourpicker_1.0        
[22] bookdown_0.15            scales_1.0.0             stringr_1.4.0           
[25] digest_0.6.22            rmarkdown_1.17           rentrez_1.2.2           
[28] XVector_0.26.0           pkgconfig_2.0.3          htmltools_0.4.0         
[31] limma_3.42.0             fastmap_1.0.1            htmlwidgets_1.5.1       
[34] rlang_0.4.1              RSQLite_2.1.2            DelayedMatrixStats_1.8.0
[37] jsonlite_1.6             dplyr_0.8.3              BiocSingular_1.2.0      
[40] RCurl_1.95-4.12          magrittr_1.5             GenomeInfoDbData_1.2.2  
[43] Matrix_1.2-17            ggbeeswarm_0.6.0         Rcpp_1.0.3              
[46] munsell_0.5.0            viridis_0.5.1            edgeR_3.28.0            
[49] stringi_1.4.3            yaml_2.2.0               rintrojs_0.2.2          
[52] zlibbioc_1.32.0          plyr_1.8.4               grid_3.6.1              
[55] blob_1.2.0               dqrng_0.2.1              promises_1.1.0          
[58] shinydashboard_0.7.1     crayon_1.3.4             miniUI_0.1.1.1          
[61] lattice_0.20-38          cowplot_1.0.0            splines_3.6.1           
[64] locfit_1.5-9.1           zeallot_0.1.0            knitr_1.26              
[67] pillar_1.4.2             igraph_1.2.4.1           reshape2_1.4.3          
[70] XML_3.98-1.20            glue_1.3.1               evaluate_0.14           
[73] BiocManager_1.30.9       vctrs_0.2.0              httpuv_1.5.2            
[76] gtable_0.3.0             purrr_0.3.3              assertthat_0.2.1        
[79] xfun_0.11                rsvd_1.0.2               mime_0.7                
[82] xtable_1.8-4             later_1.0.0              viridisLite_0.3.0       
[85] tibble_2.1.3             beeswarm_0.2.3           AnnotationDbi_1.48.0    
[88] memoise_1.1.0            statmod_1.4.32           shinyAce_0.4.1          

Bibliography

McInnes, Leland, John Healy, and James Melville. 2018. “UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction.” arXiv E-Prints, February, arXiv:1802.03426.

Szubert, B., J. E. Cole, C. Monaco, and I. Drozdov. 2019. “Structure-Preserving Visualisation of High Dimensional Single-Cell Datasets.” Sci Rep 9 (1):8914. https://doi.org/10.1038/s41598-019-45301-0.

Van der Maaten, L., and G. Hinton. 2008. “Visualizing Data Using t-SNE.” J. Mach. Learn. Res. 9:2579-2605.