Chapter 28 Grun human pancreas (CEL-seq2)

28.1 Introduction

This workflow performs an analysis of the Grun et al. (2016) CEL-seq2 dataset consisting of human pancreas cells from various donors.

28.3 Quality control

This dataset lacks mitochondrial genes, so quality control relies on the remaining metrics: library size, number of detected features, and the proportion of counts assigned to the ERCC spike-ins. We compute the median and MAD of each metric while blocking on the donor. For donors where the assumption of a majority of high-quality cells seems to be violated (Figure 28.1), we instead derive the threshold from the other donors, which are specified via the subset= argument.
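The blocked thresholding described above can be sketched in base R. This is a minimal illustration on simulated data, not the chapter's own code: the donor labels, the simulated log-library sizes and the helper `lower_threshold()` are all hypothetical, and pooling the cells of the trusted donors to obtain the fallback threshold is an assumption about how such a fallback might be computed.

```r
# Simulated log10 library sizes for four hypothetical donors.
# "D_bad" stands in for a donor where most cells are of low quality.
set.seed(42)
donor <- rep(c("D_a", "D_b", "D_c", "D_bad"), each = 100)
log_lib <- c(rnorm(300, mean = 4, sd = 0.2),  # three well-behaved donors
             rnorm(100, mean = 2, sd = 0.2))  # mostly failed donor

# Donors trusted to contain a majority of high-quality cells
# (this plays the role of the subset= argument).
trusted <- donor %in% c("D_a", "D_b", "D_c")

# Lower threshold: median minus 3 MADs (hypothetical helper).
lower_threshold <- function(x, nmads = 3) median(x) - nmads * mad(x)

# One threshold per trusted donor, i.e. blocking on the donor.
per_donor <- vapply(split(log_lib[trusted], donor[trusted]),
                    lower_threshold, numeric(1))

# Assumption: untrusted donors borrow a threshold computed from the
# pooled cells of the trusted donors.
shared <- lower_threshold(log_lib[trusted])
thresholds <- unname(per_donor[donor])  # NA for donors without a threshold
thresholds[is.na(thresholds)] <- shared

discard <- log_lib < thresholds
table(donor, discard)
```

In the Bioconductor ecosystem, `isOutlier()` from scater accepts `batch=` and `subset=` arguments for this purpose. Its documentation describes exactly how batches with no cells in the subset are handled, which may differ from the pooling used in this sketch.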

Figure 28.1: Distribution of each QC metric across cells from each donor of the Grun pancreas dataset. Each point represents a cell and is colored according to whether that cell was discarded.

##              low_lib_size            low_n_features high_altexps_ERCC_percent 
##                       450                       511                       606 
##                   discard 
##                       665

28.4 Normalization

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.093   0.505   0.795   1.000   1.222  12.088

Figure 28.2: Relationship between the library size factors and the deconvolution size factors in the Grun pancreas dataset.
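The size factor summary above has a mean of exactly 1, which reflects the usual convention of centring size factors at unity. As a rough sketch of the simpler of the two quantities compared in Figure 28.2, library size factors can be computed in base R as below. The count matrix is simulated and every variable name is hypothetical. The deconvolution size factors are not reproduced here, as they require a pooling-based method such as the one provided by scran.

```r
set.seed(1)
# Hypothetical count matrix: 50 genes (rows) by 20 cells (columns).
counts <- matrix(rpois(50 * 20, lambda = 5), nrow = 50, ncol = 20)

# Library size factor: total count per cell, scaled to have a mean of 1.
lib_sizes <- colSums(counts)
lib_sf <- lib_sizes / mean(lib_sizes)
summary(lib_sf)

# Log-normalized expression: divide each cell (column) by its size factor,
# add a pseudo-count of 1 and take log2.
lognorm <- log2(t(t(counts) / lib_sf) + 1)
dim(lognorm)
```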

28.5 Variance modelling

We block on a combined plate and donor factor.
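A combined factor of this kind can be built by pasting the two annotations together. The sketch below uses a handful of made-up cells whose labels mimic the plate and donor names in this dataset. The variable names `plate` and `donor` are assumptions and need not match the column names of the actual object.

```r
# Hypothetical per-cell annotations.
plate <- c("live sorted cells, library 1", "live sorted cells, library 1",
           "live sorted cells, library 2", "CD13+ sorted cells",
           "live sorted cells, library 1")
donor <- c("D10", "D10", "D10", "D17", "D17")

# Combine plate and donor into a single blocking factor, so that each
# plate-donor combination becomes its own level.
block <- paste0(plate, "_", donor)

# Count the cells in each level.
table(block)
```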

We examine the number of cells in each level of the blocking factor.

## block
##                  CD13+ sorted cells_D17       CD24+ CD44+ live sorted cells_D17 
##                                      86                                      87 
##                  CD63+ sorted cells_D10                TGFBR3+ sorted cells_D17 
##                                      41                                      90 
## exocrine fraction, live sorted cells_D2 exocrine fraction, live sorted cells_D3 
##                                      82                                       7 
##        live sorted cells, library 1_D10        live sorted cells, library 1_D17 
##                                      33                                      88 
##         live sorted cells, library 1_D3         live sorted cells, library 1_D7 
##                                      24                                      85 
##        live sorted cells, library 2_D10        live sorted cells, library 2_D17 
##                                      35                                      83 
##         live sorted cells, library 2_D3         live sorted cells, library 2_D7 
##                                      27                                      84 
##         live sorted cells, library 3_D3         live sorted cells, library 3_D7 
##                                      16                                      83 
##         live sorted cells, library 4_D3         live sorted cells, library 4_D7 
##                                      29                                      83

Figure 28.3: Per-gene variance as a function of the mean for the log-expression values in the Grun pancreas dataset. Each point represents a gene (black) with the mean-variance trend (blue) fitted to the spike-in transcripts (red) separately for each donor.

28.6 Data integration

##           D10      D17       D2      D3      D7
## [1,] 0.030636 0.030848 0.000000 0.00000 0.00000
## [2,] 0.007173 0.011192 0.036693 0.00000 0.00000
## [3,] 0.003501 0.004942 0.006731 0.05007 0.00000
## [4,] 0.012104 0.014786 0.013833 0.01252 0.05253

28.8 Clustering

##        Donor
## Cluster D10 D17  D2  D3  D7
##       1  17  73   3   2  77
##       2   6  11   5   7   6
##       3  12 128   0   0  62
##       4  28 108  43  13 118
##       5  32  70  31  80  28
##       6   5  13   0   0  10
##       7   5  18   0   1  33
##       8   4  13   0   0   1

Figure 28.4: Obligatory \(t\)-SNE plots of the Grun pancreas dataset. Each point represents a cell that is colored by cluster (left) or batch (right).
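The cluster-by-donor table above is easier to read as per-cluster proportions. The sketch below re-enters the printed counts by hand (the object name `tab` is hypothetical) and flags the clusters that lack cells from at least one donor. Whether such clusters reflect incomplete integration or genuine differences in cell type composition between donors cannot be decided from the table alone.

```r
# Counts copied from the cluster-by-donor table above.
tab <- matrix(c(17,  73,  3,  2,  77,
                 6,  11,  5,  7,   6,
                12, 128,  0,  0,  62,
                28, 108, 43, 13, 118,
                32,  70, 31, 80,  28,
                 5,  13,  0,  0,  10,
                 5,  18,  0,  1,  33,
                 4,  13,  0,  0,   1),
              ncol = 5, byrow = TRUE,
              dimnames = list(Cluster = as.character(1:8),
                              Donor = c("D10", "D17", "D2", "D3", "D7")))

# Proportion of each cluster contributed by each donor (rows sum to 1).
prop <- prop.table(tab, margin = 1)
round(prop, 2)

# Clusters with no cells from at least one donor.
missing_donor <- rownames(tab)[rowSums(tab == 0) > 0]
missing_donor
```

Here clusters 3, 6, 7 and 8 contain no cells from donor D2, and clusters 3, 6 and 8 also contain none from D3.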

Session Info

R Under development (unstable) (2019-12-29 r77627)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.6 LTS

Matrix products: default
BLAS/LAPACK: /app/easybuild/software/OpenBLAS/0.2.18-GCC-5.4.0-2.26-LAPACK-3.6.1/lib/libopenblas_prescottp-r0.2.18.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=C               LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] batchelor_1.3.8             scran_1.15.14               scater_1.15.12             
 [4] ggplot2_3.2.1               org.Hs.eg.db_3.10.0         AnnotationDbi_1.49.0       
 [7] scRNAseq_2.1.5              SingleCellExperiment_1.9.1  SummarizedExperiment_1.17.1
[10] DelayedArray_0.13.2         BiocParallel_1.21.2         matrixStats_0.55.0         
[13] Biobase_2.47.2              GenomicRanges_1.39.1        GenomeInfoDb_1.23.1        
[16] IRanges_2.21.2              S4Vectors_0.25.8            BiocGenerics_0.33.0        
[19] Cairo_1.5-10                BiocStyle_2.15.3            OSCAUtils_0.0.1            

loaded via a namespace (and not attached):
 [1] bitops_1.0-6                  bit64_0.9-7                   httr_1.4.1                   
 [4] tools_4.0.0                   backports_1.1.5               R6_2.4.1                     
 [7] irlba_2.3.3                   vipor_0.4.5                   DBI_1.1.0                    
[10] lazyeval_0.2.2                colorspace_1.4-1              withr_2.1.2                  
[13] gridExtra_2.3                 tidyselect_0.2.5              processx_3.4.1               
[16] bit_1.1-14                    curl_4.3                      compiler_4.0.0               
[19] BiocNeighbors_1.5.1           labeling_0.3                  bookdown_0.16                
[22] scales_1.1.0                  callr_3.4.0                   rappdirs_0.3.1               
[25] stringr_1.4.0                 digest_0.6.23                 rmarkdown_2.0                
[28] XVector_0.27.0                pkgconfig_2.0.3               htmltools_0.4.0              
[31] highr_0.8                     limma_3.43.0                  dbplyr_1.4.2                 
[34] fastmap_1.0.1                 rlang_0.4.2                   RSQLite_2.2.0                
[37] shiny_1.4.0                   DelayedMatrixStats_1.9.0      farver_2.0.1                 
[40] dplyr_0.8.3                   RCurl_1.95-4.12               magrittr_1.5                 
[43] BiocSingular_1.3.1            GenomeInfoDbData_1.2.2        Matrix_1.2-18                
[46] Rcpp_1.0.3                    ggbeeswarm_0.6.0              munsell_0.5.0                
[49] viridis_0.5.1                 lifecycle_0.1.0               edgeR_3.29.0                 
[52] stringi_1.4.3                 yaml_2.2.0                    zlibbioc_1.33.0              
[55] Rtsne_0.15                    BiocFileCache_1.11.4          AnnotationHub_2.19.3         
[58] grid_4.0.0                    blob_1.2.0                    dqrng_0.2.1                  
[61] promises_1.1.0                ExperimentHub_1.13.5          crayon_1.3.4                 
[64] lattice_0.20-38               cowplot_1.0.0                 locfit_1.5-9.1               
[67] zeallot_0.1.0                 knitr_1.26                    ps_1.3.0                     
[70] pillar_1.4.3                  igraph_1.2.4.2                codetools_0.2-16             
[73] glue_1.3.1                    BiocVersion_3.11.1            evaluate_0.14                
[76] BiocManager_1.30.10           vctrs_0.2.1                   httpuv_1.5.2                 
[79] gtable_0.3.0                  purrr_0.3.3                   assertthat_0.2.1             
[82] xfun_0.11                     rsvd_1.0.2                    mime_0.8                     
[85] xtable_1.8-4                  later_1.0.0                   viridisLite_0.3.0            
[88] tibble_2.1.3                  beeswarm_0.2.3                memoise_1.1.0                
[91] statmod_1.4.32                interactiveDisplayBase_1.25.0

Bibliography

Grun, D., M. J. Muraro, J. C. Boisset, K. Wiebrands, A. Lyubimova, G. Dharmadhikari, M. van den Born, et al. 2016. “De Novo Prediction of Stem Cell Identity using Single-Cell Transcriptome Data.” Cell Stem Cell 19 (2):266–77.