Chapter 34 Human Cell Atlas bone marrow dataset

34.1 Introduction

To make it as easy as possible to get started, this chapter provides a script that walks through a typical, basic scRNA-seq analysis. Explanatory prose is given as code comments (#), and all visualization is held until the end of the script.

Here, we use an example dataset from the Human Cell Atlas immune cell profiling project on bone marrow. This dataset is loaded via the HCAData package, which provides a ready-to-use SingleCellExperiment object.
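As a minimal sketch of the loading step, the object can be retrieved as shown here. This assumes that the HCAData package is installed and that the bone marrow data are registered under the identifier "ica_bone_marrow"; that name is our assumption and is worth checking against the installed version of the package. The first call downloads the data through ExperimentHub and caches it locally, so an internet connection is needed.

```r
# Sketch only: assumes HCAData and SingleCellExperiment are installed from
# Bioconductor and that the data can be downloaded on first use.
library(HCAData)
library(SingleCellExperiment)

# "ica_bone_marrow" is assumed to be the identifier for the bone marrow data.
sce <- HCAData("ica_bone_marrow")

# Inspect the object: genes are rows, cells are columns.
sce
dim(sce)
assayNames(sce)
```

The full dataset is large, so for a quick trial it may be convenient to work on a random subset of columns, for example `sce[, sample(ncol(sce), 10000)]`, before running the remaining steps.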

Note that the HCAData bone marrow dataset comprises cells from 8 donors, so we have added an integration step to ameliorate batch effects between donors. For use cases where integration is not necessary (e.g., no expected batch effects), we note in the code which steps to skip and which arguments to replace.
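To illustrate what such an integration step can look like, here is a small sketch that uses fastMNN() from the batchelor package on simulated counts. The simulated matrix, the three donor labels and the reducedDim name "MNN" are all hypothetical and chosen only for illustration; this is not the script from this chapter.

```r
# Sketch only: simulated data stand in for the real bone marrow dataset.
library(SingleCellExperiment)
library(scater)     # logNormCounts()
library(batchelor)  # fastMNN()

set.seed(100)
n_genes <- 200
n_cells <- 300
counts <- matrix(
    rpois(n_genes * n_cells, lambda = 5), nrow = n_genes,
    dimnames = list(paste0("gene", seq_len(n_genes)),
                    paste0("cell", seq_len(n_cells)))
)
sce <- SingleCellExperiment(assays = list(counts = counts))

# Hypothetical donor labels playing the role of the batch variable.
sce$Donor <- rep(c("donorA", "donorB", "donorC"), each = n_cells / 3)

# fastMNN() works on log-expression values by default.
sce <- logNormCounts(sce)

# Correct for donor effects; d is the number of dimensions to retain.
mnn <- fastMNN(sce, batch = sce$Donor, d = 10, k = 20)

# Store the corrected low-dimensional coordinates alongside the original data.
reducedDim(sce, "MNN") <- reducedDim(mnn, "corrected")
dim(reducedDim(sce, "MNN"))
```

When no integration is needed, one plausible replacement is to compute an ordinary PCA (e.g., with runPCA()) and to point downstream functions at "PCA" instead of "MNN" in their use.dimred or dimred arguments.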

Lastly, note that some arguments are included only to reduce computational runtime and can be modified or removed. These include parallelization via BPPARAM, and alternative algorithms for SVD and nearest-neighbor search via BSPARAM and BNPARAM, respectively. See the “Adaptations for Large-scale Data” chapter for more information on these arguments.
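The sketch below shows one way these three arguments might be set up and passed along. It uses a small simulated dataset so that it runs on its own. The particular choices (SerialParam, IrlbaParam, AnnoyParam) are illustrative assumptions, not the settings from this chapter, and the set of functions accepting each argument can differ between package versions.

```r
# Sketch only: simulated data; parameter choices are illustrative.
library(SingleCellExperiment)
library(scater)         # logNormCounts(), runPCA()
library(scran)          # buildSNNGraph()
library(BiocParallel)   # BPPARAM objects
library(BiocSingular)   # BSPARAM objects
library(BiocNeighbors)  # BNPARAM objects

# Parallelization: SerialParam() runs on one core everywhere.
# On Linux or macOS, MulticoreParam(workers = 4) could be used instead.
bpp <- SerialParam()

# SVD algorithm: IRLBA is an approximate, faster alternative to an exact SVD.
bsp <- IrlbaParam()

# Nearest-neighbor search: Annoy gives approximate neighbors quickly.
bnp <- AnnoyParam()

set.seed(100)
counts <- matrix(rpois(200 * 300, lambda = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("cell", 1:300)))
sce <- SingleCellExperiment(assays = list(counts = counts))
sce <- logNormCounts(sce)

# BSPARAM selects the SVD algorithm used for the PCA.
sce <- runPCA(sce, ncomponents = 10, BSPARAM = bsp, BPPARAM = bpp)

# BNPARAM selects the nearest-neighbor algorithm used to build the graph.
g <- buildSNNGraph(sce, use.dimred = "PCA", k = 10, BNPARAM = bnp)
clusters <- igraph::cluster_louvain(g)$membership
table(clusters)
```

Removing BSPARAM, BNPARAM and BPPARAM from these calls falls back to each function's defaults, which is usually fine for small datasets.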

34.2 Code

34.3 Visualizations

Session Info

R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS:   /home/ramezqui/Rbuild/danbuild/R-3.6.1/lib/libRblas.so
LAPACK: /home/ramezqui/Rbuild/danbuild/R-3.6.1/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] RColorBrewer_1.1-2          slingshot_1.4.0            
 [3] princurve_2.1.4             fgsea_1.12.0               
 [5] Rcpp_1.0.2                  dplyr_0.8.3                
 [7] msigdf_5.2                  org.Hs.eg.db_3.10.0        
 [9] AnnotationDbi_1.48.0        igraph_1.2.4.1             
[11] batchelor_1.2.1             scran_1.14.0               
[13] scater_1.14.0               ggplot2_3.2.1              
[15] rhdf5_2.30.0                HCAData_1.1.1              
[17] SingleCellExperiment_1.8.0  SummarizedExperiment_1.16.0
[19] DelayedArray_0.12.0         BiocParallel_1.20.0        
[21] matrixStats_0.55.0          Biobase_2.46.0             
[23] GenomicRanges_1.38.0        GenomeInfoDb_1.22.0        
[25] IRanges_2.20.0              S4Vectors_0.24.0           
[27] BiocGenerics_0.32.0         Cairo_1.5-10               
[29] BiocStyle_2.14.0            OSCAUtils_0.0.1            

loaded via a namespace (and not attached):
 [1] ggbeeswarm_0.6.0              colorspace_1.4-1             
 [3] XVector_0.26.0                BiocNeighbors_1.4.0          
 [5] bit64_0.9-7                   interactiveDisplayBase_1.24.0
 [7] RSpectra_0.15-0               codetools_0.2-16             
 [9] knitr_1.25                    zeallot_0.1.0                
[11] dbplyr_1.4.2                  pheatmap_1.0.12              
[13] uwot_0.1.4                    shiny_1.4.0                  
[15] HDF5Array_1.14.0              BiocManager_1.30.9           
[17] compiler_3.6.1                httr_1.4.1                   
[19] dqrng_0.2.1                   backports_1.1.5              
[21] assertthat_0.2.1              Matrix_1.2-17                
[23] fastmap_1.0.1                 lazyeval_0.2.2               
[25] limma_3.42.0                  later_1.0.0                  
[27] BiocSingular_1.2.0            htmltools_0.4.0              
[29] tools_3.6.1                   rsvd_1.0.2                   
[31] gtable_0.3.0                  glue_1.3.1                   
[33] GenomeInfoDbData_1.2.2        rappdirs_0.3.1               
[35] fastmatch_1.1-0               vctrs_0.2.0                  
[37] nlme_3.1-141                  ape_5.3                      
[39] ExperimentHub_1.12.0          DelayedMatrixStats_1.8.0     
[41] xfun_0.10                     stringr_1.4.0                
[43] mime_0.7                      irlba_2.3.3                  
[45] statmod_1.4.32                AnnotationHub_2.18.0         
[47] edgeR_3.28.0                  zlibbioc_1.32.0              
[49] scales_1.0.0                  promises_1.1.0               
[51] yaml_2.2.0                    curl_4.2                     
[53] memoise_1.1.0                 gridExtra_2.3                
[55] stringi_1.4.3                 RSQLite_2.1.2                
[57] BiocVersion_3.10.1            rlang_0.4.1                  
[59] pkgconfig_2.0.3               bitops_1.0-6                 
[61] evaluate_0.14                 lattice_0.20-38              
[63] purrr_0.3.3                   Rhdf5lib_1.8.0               
[65] patchwork_0.0.1               labeling_0.3                 
[67] cowplot_1.0.0                 bit_1.1-14                   
[69] tidyselect_0.2.5              RcppAnnoy_0.0.13             
[71] magrittr_1.5                  bookdown_0.14                
[73] R6_2.4.0                      DBI_1.0.0                    
[75] pillar_1.4.2                  withr_2.1.2                  
[77] RCurl_1.95-4.12               tibble_2.1.3                 
[79] crayon_1.3.4                  BiocFileCache_1.10.0         
[81] rmarkdown_1.16                viridis_0.5.1                
[83] locfit_1.5-9.1                grid_3.6.1                   
[85] data.table_1.12.6             blob_1.2.0                   
[87] digest_0.6.22                 xtable_1.8-4                 
[89] httpuv_1.5.2                  RcppParallel_4.4.4           
[91] munsell_0.5.0                 beeswarm_0.2.3               
[93] viridisLite_0.3.0             vipor_0.4.5