Chapter 31 Segerstolpe human pancreas (Smart-seq2)

31.1 Introduction

This chapter analyzes the Segerstolpe et al. (2016) dataset, which consists of human pancreas cells from various donors.

31.3 Quality control

We remove low-quality cells that were marked as such by the authors. We then perform additional quality control, as some of the remaining cells still have very low counts and few detected features. Some batches appear to consist mostly of low-quality cells (Figure 31.1); for these, we use the other batches to define an appropriate threshold via subset=.

Figure 31.1: Distribution of each QC metric across cells from each donor of the Segerstolpe pancreas dataset. Each point represents a cell and is colored according to whether that cell was discarded.

##              low_lib_size            low_n_features high_altexps_ERCC_percent 
##                       788                      1056                      1031 
##                   discard 
##                      1246
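
The subset= strategy described above can be illustrated with a minimal base R sketch. It does not reproduce the chapter's actual call, which is not shown in the text; subset= is presumably an argument of the outlier detection functions in scater. The data are simulated, the donor names "A", "B" and "C" are hypothetical, and low_threshold() is a helper written for this example only. The idea is that a threshold estimated within a mostly failed batch adapts to that batch's poor cells, whereas a threshold estimated from the trusted batches and applied to all cells does not.

```r
# Toy data (simulated, NOT the Segerstolpe counts): library sizes for
# three hypothetical donors, where donor "C" consists of failed cells.
set.seed(100)
donor <- rep(c("A", "B", "C"), each = 100)
lib_size <- c(
    rlnorm(100, meanlog = 12, sdlog = 0.3),  # donor A: good quality
    rlnorm(100, meanlog = 12, sdlog = 0.3),  # donor B: good quality
    rlnorm(100, meanlog = 8,  sdlog = 0.3)   # donor C: low quality
)

# Hypothetical helper: lower threshold at 'nmads' MADs below the median
# of the log-values, estimated only from the cells flagged in 'use'.
low_threshold <- function(x, use = rep(TRUE, length(x)), nmads = 3) {
    lx <- log(x[use])
    exp(median(lx) - nmads * mad(lx))
}

# Per-donor thresholds: donor C's threshold adapts to its own poor
# cells, so very few of them are flagged.
own <- vapply(split(lib_size, donor), low_threshold, numeric(1))
discard_own <- lib_size < own[donor]

# Threshold estimated from the trusted donors only, applied to all cells.
trusted <- donor %in% c("A", "B")
shared <- low_threshold(lib_size, use = trusted)
discard_shared <- lib_size < shared

table(donor, discard_own)
table(donor, discard_shared)
```

With the shared threshold, every cell from the simulated donor C is discarded, while only a small fraction of cells from donors A and B are removed. This sketch uses a single shared threshold for simplicity; the real analysis also accounts for donor-specific thresholds among the trusted batches.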

31.3.1 Normalization

We do not normalize the spike-in counts, as some cells have no spike-in counts at all and would therefore have a spike-in size factor of zero.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.014   0.390   0.708   1.000   1.332  11.182

Figure 31.2: Relationship between the library size factors and the deconvolution size factors in the Segerstolpe pancreas dataset.
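
To see why cells without spike-in counts prevent normalization of the spike-ins, consider the following toy example in base R. The counts are made up, and the spike-in and cell names are hypothetical. It computes library size factors (column sums scaled to a mean of 1) and shows that a cell with no spike-in counts gets a size factor of zero, so its normalized values are undefined.

```r
# Toy counts (made up): rows are spike-in transcripts, columns are cells.
# Cell "c3" has no spike-in counts at all.
spikes <- matrix(
    c(10, 5, 0,
      20, 8, 0,
      30, 7, 0),
    nrow = 3, byrow = TRUE,
    dimnames = list(paste0("ERCC-", 1:3), c("c1", "c2", "c3"))
)

# Library size factors: column sums scaled to have a mean of 1.
lib_sf <- colSums(spikes) / mean(colSums(spikes))
lib_sf

# Divide each cell's counts by its size factor. For cell c3 this is
# 0/0, which yields NaN, so no log-normalized values can be computed.
norm <- t(t(spikes) / lib_sf)
log_norm <- log2(norm + 1)
log_norm[, "c3"]
```

This is only a sketch of the arithmetic. The endogenous genes are unaffected, as their size factors are computed separately from the gene counts.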

31.3.2 Variance modelling

We exclude cells with no spike-in counts from variance modelling. Donor AZ also has very low spike-in counts and is therefore excluded as well.

Figure 31.3: Per-gene variance as a function of the mean for the log-expression values in the Segerstolpe pancreas dataset. Each point represents a gene (black) with the mean-variance trend (blue) fitted to the spike-in transcripts (red) separately for each donor.
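
The cell filtering applied before variance modelling can be sketched as follows in base R. The metadata are made up: only the donor label "AZ" comes from the text, while "H1", "H2" and the spike-in totals are hypothetical. In the chapter itself, the filtered cells would presumably be passed to a scran function that fits the trend to the spike-ins with donor as a blocking factor; that call is not shown here and is an assumption.

```r
# Toy per-cell metadata (made up; only the label "AZ" comes from the text).
cells <- data.frame(
    donor = c("AZ", "AZ", "H1", "H1", "H2", "H2"),
    spike_total = c(3, 0, 5200, 0, 4100, 6300),
    stringsAsFactors = FALSE
)

# Keep cells that have spike-in counts and do not belong to donor AZ.
keep <- cells$spike_total > 0 & cells$donor != "AZ"
for_hvg <- cells[keep, ]

# Remaining cells per donor; a separate trend would be fitted in each.
table(for_hvg$donor)
```

Filtering on both conditions at once ensures that every remaining donor has usable spike-in information for its own trend fit.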

31.3.4 Clustering

We see a strong donor effect, which suggests that we should have called fastMNN() at some point. (But hey, we already did that for the Muraro and Grun analyses, so where’s the fun in doing that again?)

Figure 31.4: Heatmap of the frequency of cells from each donor in each cluster.
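
A donor effect of this kind can be quantified from a simple cross-tabulation of cluster and donor labels. The sketch below uses made-up labels for 12 cells (the donor names "D1" and "D2" are hypothetical) and computes, for each cluster, the fraction of cells contributed by its most common donor. This purity measure is our own illustrative summary and is not used in the chapter.

```r
# Toy labels (made up): cluster assignments and donors for 12 cells.
cluster <- factor(c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3))
donor <- factor(c("D1", "D1", "D1", "D1",
                  "D2", "D2", "D2", "D1",
                  "D1", "D2", "D1", "D2"))

# Cross-tabulate: rows are clusters, columns are donors.
tab <- table(Cluster = cluster, Donor = donor)
tab

# Fraction of each cluster contributed by its most common donor.
# Values near 1 indicate clusters driven by a single donor.
purity <- apply(tab, 1, max) / rowSums(tab)
purity
```

Here cluster 1 consists entirely of cells from one donor (purity of 1), whereas cluster 3 is evenly mixed (purity of 0.5). A table like tab is also the natural input for a heatmap such as Figure 31.4.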

Figure 31.5: Heatmap of the frequency of cells from each cell type label in each cluster.

Figure 31.6: Obligatory \(t\)-SNE plots of the Segerstolpe pancreas dataset. Each point represents a cell that is colored by cluster (left) or batch (right).

Session Info

R Under development (unstable) (2019-12-29 r77627)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.6 LTS

Matrix products: default
BLAS/LAPACK: /app/easybuild/software/OpenBLAS/0.2.18-GCC-5.4.0-2.26-LAPACK-3.6.1/lib/libopenblas_prescottp-r0.2.18.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=C               LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pheatmap_1.0.12             BiocSingular_1.3.1          scran_1.15.14              
 [4] scater_1.15.12              ggplot2_3.2.1               ensembldb_2.11.2           
 [7] AnnotationFilter_1.11.0     GenomicFeatures_1.39.2      AnnotationDbi_1.49.0       
[10] AnnotationHub_2.19.3        BiocFileCache_1.11.4        dbplyr_1.4.2               
[13] scRNAseq_2.1.5              SingleCellExperiment_1.9.1  SummarizedExperiment_1.17.1
[16] DelayedArray_0.13.2         BiocParallel_1.21.2         matrixStats_0.55.0         
[19] Biobase_2.47.2              GenomicRanges_1.39.1        GenomeInfoDb_1.23.1        
[22] IRanges_2.21.2              S4Vectors_0.25.8            BiocGenerics_0.33.0        
[25] Cairo_1.5-10                BiocStyle_2.15.3            OSCAUtils_0.0.1            

loaded via a namespace (and not attached):
  [1] Rtsne_0.15                    ggbeeswarm_0.6.0              colorspace_1.4-1             
  [4] XVector_0.27.0                BiocNeighbors_1.5.1           farver_2.0.1                 
  [7] bit64_0.9-7                   interactiveDisplayBase_1.25.0 codetools_0.2-16             
 [10] knitr_1.26                    zeallot_0.1.0                 Rsamtools_2.3.2              
 [13] shiny_1.4.0                   BiocManager_1.30.10           compiler_4.0.0               
 [16] httr_1.4.1                    dqrng_0.2.1                   backports_1.1.5              
 [19] assertthat_0.2.1              Matrix_1.2-18                 fastmap_1.0.1                
 [22] lazyeval_0.2.2                limma_3.43.0                  later_1.0.0                  
 [25] htmltools_0.4.0               prettyunits_1.0.2             tools_4.0.0                  
 [28] igraph_1.2.4.2                rsvd_1.0.2                    gtable_0.3.0                 
 [31] glue_1.3.1                    GenomeInfoDbData_1.2.2        dplyr_0.8.3                  
 [34] rappdirs_0.3.1                Rcpp_1.0.3                    vctrs_0.2.1                  
 [37] Biostrings_2.55.4             ExperimentHub_1.13.5          rtracklayer_1.47.0           
 [40] DelayedMatrixStats_1.9.0      xfun_0.11                     stringr_1.4.0                
 [43] ps_1.3.0                      mime_0.8                      lifecycle_0.1.0              
 [46] irlba_2.3.3                   statmod_1.4.32                XML_3.98-1.20                
 [49] edgeR_3.29.0                  zlibbioc_1.33.0               scales_1.1.0                 
 [52] hms_0.5.2                     promises_1.1.0                ProtGenerics_1.19.3          
 [55] RColorBrewer_1.1-2            yaml_2.2.0                    curl_4.3                     
 [58] memoise_1.1.0                 gridExtra_2.3                 biomaRt_2.43.0               
 [61] stringi_1.4.3                 RSQLite_2.2.0                 highr_0.8                    
 [64] BiocVersion_3.11.1            rlang_0.4.2                   pkgconfig_2.0.3              
 [67] bitops_1.0-6                  evaluate_0.14                 lattice_0.20-38              
 [70] purrr_0.3.3                   labeling_0.3                  GenomicAlignments_1.23.1     
 [73] cowplot_1.0.0                 bit_1.1-14                    processx_3.4.1               
 [76] tidyselect_0.2.5              magrittr_1.5                  bookdown_0.16                
 [79] R6_2.4.1                      DBI_1.1.0                     pillar_1.4.3                 
 [82] withr_2.1.2                   RCurl_1.95-4.12               tibble_2.1.3                 
 [85] crayon_1.3.4                  rmarkdown_2.0                 viridis_0.5.1                
 [88] progress_1.2.2                locfit_1.5-9.1                grid_4.0.0                   
 [91] blob_1.2.0                    callr_3.4.0                   digest_0.6.23                
 [94] xtable_1.8-4                  httpuv_1.5.2                  openssl_1.4.1                
 [97] munsell_0.5.0                 beeswarm_0.2.3                viridisLite_0.3.0            
[100] vipor_0.4.5                   askpass_1.1                  

Bibliography

Segerstolpe, A., A. Palasantza, P. Eliasson, E. M. Andersson, A. C. Andreasson, X. Sun, S. Picelli, et al. 2016. “Single-Cell Transcriptome Profiling of Human Pancreatic Islets in Health and Type 2 Diabetes.” Cell Metab. 24 (4):593–607.