Multi-scale analysis using SIGNAL
Yang Zhou
2024-06-06
Source: vignettes/Multiscale_analysis.Rmd
Introduction
SIGNAL can perform multi-scale analysis of single-cell data to identify cell subtypes that are specific to tissues, conditions, and developmental stages. In this vignette, we use a recently published atlas of developing human immune cells to show how SIGNAL integrates data while preserving the differences between tissues and developmental stages.
Load data matrix and metadata
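We perform data integration on the normalized scRNA-seq gene expression matrix, restricted to the highly variable genes (HVGs) provided by the authors. The matrix X has 3,765 genes in rows and 884,583 cells in columns; meta records the batch, developmental stage, tissue, and cell type of each cell.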
X = readRDS("/home/server/zy/group_scripts/datasets_preparation/Developing_immune/X.rds")
meta = readRDS("/home/server/zy/group_scripts/datasets_preparation/Developing_immune/meta.rds")
str(X)
## Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
## ..@ i : int [1:201165610] 1 9 10 23 37 44 49 50 52 54 ...
## ..@ p : int [1:884584] 0 491 904 1321 1864 2260 2709 3148 3657 4110 ...
## ..@ Dim : int [1:2] 3765 884583
## ..@ Dimnames:List of 2
## .. ..$ : chr [1:3765] "TNMD" "FGR" "CFH" "CFTR" ...
## .. ..$ : chr [1:884583] "FCAImmP7579224-ATTACTCTCGATGAGG" "FCAImmP7579224-CAGCCGAGTACATCCA" "FCAImmP7579224-TGCTACCTCATGTAGC" "FCAImmP7579224-ACGGCCACAAGCTGAG" ...
## ..@ x : num [1:201165610] 2.211 0.965 0.965 2.015 1.447 ...
## ..@ factors : list()
str(meta)
## 'data.frame': 884583 obs. of 4 variables:
## $ Batch : chr "F45" "F45" "F45" "F45" ...
## $ Stage : Factor w/ 11 levels "4 PCW","7 PCW",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ Tissue : Factor w/ 9 levels "Bone Marrow",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ CellType: Factor w/ 13 levels "B cells","Endothelium",..: 8 4 8 8 1 8 8 8 8 8 ...
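The later steps bind meta to the embeddings row by row with cbind.data.frame, so the columns of X and the rows of meta need to describe the same cells in the same order. A minimal sketch of such a check on toy data follows. The toy objects and the helper check_alignment are hypothetical, and the name comparison assumes that the row names of meta hold cell barcodes, which the output above does not confirm for the real data.

```r
library(Matrix)

# Toy stand-ins for X (genes x cells) and meta (one row per cell); hypothetical data
set.seed(1)
toy_X = rsparsematrix(nrow = 5, ncol = 8, density = 0.4)
rownames(toy_X) = paste0("gene", 1:5)
colnames(toy_X) = paste0("cell", 1:8)
toy_meta = data.frame(
  Batch = rep(c("B1", "B2"), each = 4),
  CellType = factor(rep(c("T", "B"), times = 4)),
  row.names = colnames(toy_X)
)

# Hypothetical helper: compares the number of cells and, if that matches,
# the order of cell names between the matrix and the metadata
check_alignment = function(X, meta) {
  same_n = ncol(X) == nrow(meta)
  same_names = same_n && identical(colnames(X), rownames(meta))
  c(same_n = same_n, same_names = same_names)
}

res = check_alignment(toy_X, toy_meta)
print(res)
```

If same_n is TRUE but same_names is FALSE, the metadata should be reordered to match colnames(X) before any embedding is combined with it.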
Visualization of raw data
# Packages used in this vignette (listed under "other attached packages" in Session Info)
library(SIGNAL)
library(Matrix)
library(irlba)
library(uwot)
library(ggplot2)
library(ggpubr)
library(randomcoloR)
library(cowplot)
# One palette of 13 colors (the number of cell types) is reused for all three colorings
Colors = distinctColorPalette(13)
# Truncated SVD of the cells-by-genes matrix; rows of u %*% diag(d) are 50-dimensional cell embeddings
pca_res = irlba(t(X), nv = 50)
raw_emb = as.matrix(pca_res$u %*% diag(pca_res$d))
raw_umap = as.data.frame(umap(raw_emb))
colnames(raw_umap) = c("UMAP1", "UMAP2")
raw_umap = cbind.data.frame(meta, raw_umap)
p1 = ggscatter(raw_umap, x = "UMAP1", y = "UMAP2", size = 0.1, color = "CellType", palette = Colors, legend = "right") +
  guides(colour = guide_legend(override.aes = list(size = 2)))
p2 = ggscatter(raw_umap, x = "UMAP1", y = "UMAP2", size = 0.1, color = "Tissue", palette = Colors, legend = "right") +
  guides(colour = guide_legend(override.aes = list(size = 2)))
p3 = ggscatter(raw_umap, x = "UMAP1", y = "UMAP2", size = 0.1, color = "Stage", palette = Colors, legend = "right") +
  guides(colour = guide_legend(override.aes = list(size = 2)))
plot_grid(p1, p2, p3, align = 'h', axis = "b", nrow = 1)
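The raw embedding above is built as pca_res$u %*% diag(pca_res$d). For a singular value decomposition t(X) = U D V', this product equals t(X) %*% V, the projection of each cell onto the leading right singular vectors. The sketch below checks that identity on a small random matrix. It uses base R's svd in place of irlba so that it runs without extra packages, and the toy matrix and variable names are hypothetical.

```r
# Toy genes x cells matrix (hypothetical data)
set.seed(42)
toy = matrix(rnorm(6 * 20), nrow = 6, ncol = 20)

# Keep the top k singular vectors of the cells x genes matrix
k = 3
s = svd(t(toy), nu = k, nv = k)

# Embedding as left singular vectors scaled by singular values.
# svd() returns all singular values, so only the first k are used here.
emb_scaled = s$u %*% diag(s$d[1:k])

# The same embedding as a projection onto the right singular vectors
emb_proj = t(toy) %*% s$v

max_diff = max(abs(emb_scaled - emb_proj))
print(dim(emb_scaled))  # one row per cell, k columns
print(max_diff)         # difference at the level of floating-point error
```

Because the data are not centered before the decomposition, this is a truncated SVD embedding rather than a PCA in the strict sense.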
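SIGNAL integration
The gcPCA step of SIGNAL produces the embedding signal_emb. It stores cells in columns, so it is transposed before being passed to umap.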
## Run gcPCA!
## gcPCA done!
signal_umap = as.data.frame(umap(t(signal_emb)))
colnames(signal_umap) = c("UMAP1", "UMAP2")
signal_umap = cbind.data.frame(meta, signal_umap)
q1 = ggscatter(signal_umap, x = "UMAP1", y = "UMAP2", size = 0.1, color = "CellType", palette = Colors, legend = "right") +
  guides(colour = guide_legend(override.aes = list(size = 2)))
q2 = ggscatter(signal_umap, x = "UMAP1", y = "UMAP2", size = 0.1, color = "Tissue", palette = Colors, legend = "right") +
  guides(colour = guide_legend(override.aes = list(size = 2)))
q3 = ggscatter(signal_umap, x = "UMAP1", y = "UMAP2", size = 0.1, color = "Stage", palette = Colors, legend = "right") +
  guides(colour = guide_legend(override.aes = list(size = 2)))
plot_grid(q1, q2, q3, align = 'h', axis = "b", nrow = 1)
Session Info
## R version 4.2.3 (2023-03-15)
## Platform: x86_64-conda-linux-gnu (64-bit)
## Running under: Ubuntu 22.10
##
## Matrix products: default
## BLAS/LAPACK: /home/server/anaconda3/envs/zy/lib/libopenblasp-r0.3.21.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] cowplot_1.1.1 randomcoloR_1.1.0.1 ggpubr_0.6.0
## [4] ggplot2_3.4.4 uwot_0.2.2 irlba_2.3.5.1
## [7] Matrix_1.5-4.1 SIGNAL_1.0.0
##
## loaded via a namespace (and not attached):
## [1] matrixStats_1.0.0 fs_1.6.4 flock_0.7
## [4] RcppAnnoy_0.0.22 doParallel_1.0.17 tools_4.2.3
## [7] backports_1.4.1 bslib_0.7.0 utf8_1.2.4
## [10] R6_2.5.1 BiocGenerics_0.44.0 colorspace_2.1-0
## [13] withr_3.0.0 tidyselect_1.2.1 bit_4.0.5
## [16] curl_5.2.1 compiler_4.2.3 bigparallelr_0.3.2
## [19] textshaping_0.3.7 cli_3.6.2 BiocNeighbors_1.16.0
## [22] desc_1.4.3 labeling_0.4.3 sass_0.4.9
## [25] scales_1.3.0 pkgdown_2.0.7 systemfonts_1.0.6
## [28] stringr_1.5.1 digest_0.6.35 rmarkdown_2.26
## [31] pkgconfig_2.0.3 htmltools_0.5.8.1 sparseMatrixStats_1.10.0
## [34] MatrixGenerics_1.10.0 fastmap_1.1.1 highr_0.10
## [37] rlang_1.1.3 rstudioapi_0.15.0 jquerylib_0.1.4
## [40] generics_0.1.3 farver_2.1.1 jsonlite_1.8.8
## [43] mclust_6.0.0 BiocParallel_1.32.6 dplyr_1.1.4
## [46] car_3.1-2 magrittr_2.0.3 Rcpp_1.0.12
## [49] munsell_0.5.1 S4Vectors_0.36.2 fansi_1.0.6
## [52] abind_1.4-5 lifecycle_1.0.4 stringi_1.8.3
## [55] yaml_2.3.8 carData_3.0-5 Rtsne_0.17
## [58] grid_4.2.3 parallel_4.2.3 lattice_0.21-8
## [61] knitr_1.46 ps_1.7.6 pillar_1.9.0
## [64] ggsignif_0.6.4 bigstatsr_1.5.12 codetools_0.2-19
## [67] stats4_4.2.3 bigassertr_0.1.6 glue_1.7.0
## [70] evaluate_0.23 V8_4.4.2 vctrs_0.6.5
## [73] foreach_1.5.2 gtable_0.3.5 purrr_1.0.2
## [76] tidyr_1.3.1 cachem_1.0.8 xfun_0.43
## [79] broom_1.0.5 RcppEigen_0.3.4.0.0 ff_4.0.12
## [82] RSpectra_0.16-1 rstatix_0.7.2 ragg_1.2.7
## [85] tibble_3.2.1 iterators_1.0.14 memoise_2.0.1
## [88] cluster_2.1.4 rmio_0.4.0