
Introduction

SIGNAL supports multi-scale analysis of single-cell data, which helps identify cell subtypes specific to tissues, conditions, and developmental stages. In this vignette, we use a recently published atlas of developing human immune cells to show how SIGNAL integrates data while preserving differences between tissues and developmental stages.

Load data matrix and metadata

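We perform data integration on the normalized scRNA-seq gene expression matrix, restricted to the highly variable genes (HVGs) provided by the atlas authors. The matrix X has 3,765 genes in rows and 884,583 cells in columns, and meta holds the batch, stage, tissue, and cell type of each cell.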

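# Packages used in this vignette (see Session Info for versions)
library(SIGNAL)
library(Matrix)
library(irlba)
library(uwot)
library(ggplot2)
library(ggpubr)
library(randomcoloR)
library(cowplot)

# Normalized expression matrix (genes x cells) and per-cell metadata
X = readRDS("/home/server/zy/group_scripts/datasets_preparation/Developing_immune/X.rds")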
meta = readRDS("/home/server/zy/group_scripts/datasets_preparation/Developing_immune/meta.rds")
str(X)
## Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   ..@ i       : int [1:201165610] 1 9 10 23 37 44 49 50 52 54 ...
##   ..@ p       : int [1:884584] 0 491 904 1321 1864 2260 2709 3148 3657 4110 ...
##   ..@ Dim     : int [1:2] 3765 884583
##   ..@ Dimnames:List of 2
##   .. ..$ : chr [1:3765] "TNMD" "FGR" "CFH" "CFTR" ...
##   .. ..$ : chr [1:884583] "FCAImmP7579224-ATTACTCTCGATGAGG" "FCAImmP7579224-CAGCCGAGTACATCCA" "FCAImmP7579224-TGCTACCTCATGTAGC" "FCAImmP7579224-ACGGCCACAAGCTGAG" ...
##   ..@ x       : num [1:201165610] 2.211 0.965 0.965 2.015 1.447 ...
##   ..@ factors : list()
str(meta)
## 'data.frame':    884583 obs. of  4 variables:
##  $ Batch   : chr  "F45" "F45" "F45" "F45" ...
##  $ Stage   : Factor w/ 11 levels "4 PCW","7 PCW",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ Tissue  : Factor w/ 9 levels "Bone Marrow",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ CellType: Factor w/ 13 levels "B cells","Endothelium",..: 8 4 8 8 1 8 8 8 8 8 ...
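
Since X and meta are passed together to the integration step, the columns of X and the rows of meta need to describe the same cells in the same order. The following is a minimal sketch of that check on toy data. The objects X_toy and meta_toy are hypothetical, and the assumption that the row names of the metadata hold the cell barcodes is ours; the str() output above does not show them. The sketch also cross-tabulates a group factor against the batch factor to show how many cells each batch contributes to each group.

```r
library(Matrix)

set.seed(1)
genes <- paste0("gene", 1:5)
cells <- paste0("cell", 1:8)

# Toy sparse matrix with genes in rows and cells in columns (hypothetical data)
X_toy <- Matrix(rpois(40, 1), nrow = 5, ncol = 8, sparse = TRUE,
                dimnames = list(genes, cells))

# Toy metadata with one row per cell; row names are assumed to be barcodes
meta_toy <- data.frame(
  Batch  = rep(c("b1", "b2"), each = 4),
  Tissue = factor(rep(c("Liver", "Spleen"), times = 4)),
  row.names = cells
)

# One metadata row per column of the matrix, in the same order
same_size  <- ncol(X_toy) == nrow(meta_toy)
same_order <- identical(colnames(X_toy), rownames(meta_toy))

# Number of cells per Tissue-by-Batch combination
tab <- table(meta_toy$Tissue, meta_toy$Batch)
print(tab)
```

On the objects loaded above, the analogous checks would be ncol(X) == nrow(meta) and, for the group factors, table(meta$Tissue, meta$Stage).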

Visualization of raw data

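We first embed the unintegrated data with a truncated SVD followed by UMAP, then color the cells by cell type, tissue, and stage.

# 13 colors cover the factor with the most levels (CellType)
Colors = distinctColorPalette(13)
# Truncated SVD of the cells-by-genes matrix, keeping the top 50 components
pca_res = irlba(t(X), nv = 50)
# Cell scores: left singular vectors scaled by the singular values
raw_emb = as.matrix(pca_res$u %*% diag(pca_res$d))
# Two-dimensional UMAP of the 50-dimensional embedding
raw_umap = as.data.frame(umap(raw_emb))
colnames(raw_umap) = c("UMAP1", "UMAP2")
raw_umap = cbind.data.frame(meta, raw_umap)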
p1 = ggscatter(raw_umap, x = "UMAP1", y = "UMAP2", size = 0.1, color = "CellType", palette = Colors, legend = "right") + 
  guides(colour = guide_legend(override.aes = list(size = 2)))
p2 = ggscatter(raw_umap, x = "UMAP1", y = "UMAP2", size = 0.1, color = "Tissue", palette = Colors, legend = "right") + 
  guides(colour = guide_legend(override.aes = list(size = 2)))
p3 = ggscatter(raw_umap, x = "UMAP1", y = "UMAP2", size = 0.1, color = "Stage", palette = Colors, legend = "right") + 
  guides(colour = guide_legend(override.aes = list(size = 2)))
plot_grid(p1, p2, p3, align = 'h', axis = "b", nrow = 1)
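
The raw embedding above is built as u %*% diag(d) from a truncated SVD. As a small worked example of why this gives the projection of each cell onto the top components, the sketch below uses base R's svd() on a toy dense matrix (not irlba, and not the atlas data) and checks that U D equals A V for the leading k components. The matrix A and the choice k = 3 are illustrative assumptions.

```r
set.seed(42)

# Toy cells-by-genes matrix: 20 cells, 10 genes (hypothetical data)
A <- matrix(rnorm(200), nrow = 20, ncol = 10)
k <- 3

# SVD A = U D V^T, keeping the first k left and right singular vectors
s <- svd(A, nu = k, nv = k)

# Scores as left singular vectors scaled by the singular values
scores_ud <- s$u %*% diag(s$d[1:k])

# Scores as the projection of the cells onto the right singular vectors
scores_av <- A %*% s$v

# The two constructions agree up to floating-point error
max_diff <- max(abs(scores_ud - scores_av))
print(max_diff)
```

As in the vignette code, this SVD is computed without centering the genes first, so the components are singular vectors of the uncentered matrix.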

SIGNAL integration

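We pass Tissue and Stage as the group factors (g_factor), whose differences should be preserved, and Batch as the batch factor (b_factor).

# Integrate while keeping Tissue and Stage structure
signal_emb = Run.gcPCA(X, meta, g_factor = c("Tissue", "Stage"), b_factor = "Batch")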
## Run gcPCA!
## gcPCA done!
# Transpose the embedding so that cells are in rows, as umap() expects
signal_umap = as.data.frame(umap(t(signal_emb)))
colnames(signal_umap) = c("UMAP1", "UMAP2")
signal_umap = cbind.data.frame(meta, signal_umap)
q1 = ggscatter(signal_umap, x = "UMAP1", y = "UMAP2", size = 0.1, color = "CellType", palette = Colors, legend = "right") + 
  guides(colour = guide_legend(override.aes = list(size = 2)))
q2 = ggscatter(signal_umap, x = "UMAP1", y = "UMAP2", size = 0.1, color = "Tissue", palette = Colors, legend = "right") + 
  guides(colour = guide_legend(override.aes = list(size = 2)))
q3 = ggscatter(signal_umap, x = "UMAP1", y = "UMAP2", size = 0.1, color = "Stage", palette = Colors, legend = "right") + 
  guides(colour = guide_legend(override.aes = list(size = 2)))
plot_grid(q1, q2, q3, align = 'h', axis = "b", nrow = 1)
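
The UMAP panels give a visual impression of whether tissue and stage differences survive integration. One possible way to put a number on this, sketched here under our own assumptions and not part of SIGNAL, is the average fraction of each cell's k nearest neighbors that share its label. The function knn_purity, the toy embedding emb, and the labels are all hypothetical. The sketch uses a full distance matrix, which is only feasible for small inputs; a dataset of this size would need subsampling or an approximate nearest-neighbor search.

```r
set.seed(1)

# Toy embedding: two groups of 20 cells in 2 dimensions (hypothetical data)
emb <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 6), ncol = 2))
labels <- rep(c("A", "B"), each = 20)

# Mean fraction of the k nearest neighbors sharing each cell's label
knn_purity <- function(emb, labels, k = 5) {
  d <- as.matrix(dist(emb))  # pairwise Euclidean distances between cells
  diag(d) <- Inf             # a cell is not its own neighbor
  per_cell <- vapply(seq_len(nrow(d)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]  # indices of the k closest cells
    mean(labels[nn] == labels[i])
  }, numeric(1))
  mean(per_cell)
}

purity <- knn_purity(emb, labels, k = 5)
print(purity)
```

Values near 1 indicate that cells with the same label stay close together in the embedding, while values near the label frequencies indicate mixing.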

Session Info
## R version 4.2.3 (2023-03-15)
## Platform: x86_64-conda-linux-gnu (64-bit)
## Running under: Ubuntu 22.10
## 
## Matrix products: default
## BLAS/LAPACK: /home/server/anaconda3/envs/zy/lib/libopenblasp-r0.3.21.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] cowplot_1.1.1       randomcoloR_1.1.0.1 ggpubr_0.6.0       
## [4] ggplot2_3.4.4       uwot_0.2.2          irlba_2.3.5.1      
## [7] Matrix_1.5-4.1      SIGNAL_1.0.0       
## 
## loaded via a namespace (and not attached):
##  [1] matrixStats_1.0.0        fs_1.6.4                 flock_0.7               
##  [4] RcppAnnoy_0.0.22         doParallel_1.0.17        tools_4.2.3             
##  [7] backports_1.4.1          bslib_0.7.0              utf8_1.2.4              
## [10] R6_2.5.1                 BiocGenerics_0.44.0      colorspace_2.1-0        
## [13] withr_3.0.0              tidyselect_1.2.1         bit_4.0.5               
## [16] curl_5.2.1               compiler_4.2.3           bigparallelr_0.3.2      
## [19] textshaping_0.3.7        cli_3.6.2                BiocNeighbors_1.16.0    
## [22] desc_1.4.3               labeling_0.4.3           sass_0.4.9              
## [25] scales_1.3.0             pkgdown_2.0.7            systemfonts_1.0.6       
## [28] stringr_1.5.1            digest_0.6.35            rmarkdown_2.26          
## [31] pkgconfig_2.0.3          htmltools_0.5.8.1        sparseMatrixStats_1.10.0
## [34] MatrixGenerics_1.10.0    fastmap_1.1.1            highr_0.10              
## [37] rlang_1.1.3              rstudioapi_0.15.0        jquerylib_0.1.4         
## [40] generics_0.1.3           farver_2.1.1             jsonlite_1.8.8          
## [43] mclust_6.0.0             BiocParallel_1.32.6      dplyr_1.1.4             
## [46] car_3.1-2                magrittr_2.0.3           Rcpp_1.0.12             
## [49] munsell_0.5.1            S4Vectors_0.36.2         fansi_1.0.6             
## [52] abind_1.4-5              lifecycle_1.0.4          stringi_1.8.3           
## [55] yaml_2.3.8               carData_3.0-5            Rtsne_0.17              
## [58] grid_4.2.3               parallel_4.2.3           lattice_0.21-8          
## [61] knitr_1.46               ps_1.7.6                 pillar_1.9.0            
## [64] ggsignif_0.6.4           bigstatsr_1.5.12         codetools_0.2-19        
## [67] stats4_4.2.3             bigassertr_0.1.6         glue_1.7.0              
## [70] evaluate_0.23            V8_4.4.2                 vctrs_0.6.5             
## [73] foreach_1.5.2            gtable_0.3.5             purrr_1.0.2             
## [76] tidyr_1.3.1              cachem_1.0.8             xfun_0.43               
## [79] broom_1.0.5              RcppEigen_0.3.4.0.0      ff_4.0.12               
## [82] RSpectra_0.16-1          rstatix_0.7.2            ragg_1.2.7              
## [85] tibble_3.2.1             iterators_1.0.14         memoise_2.0.1           
## [88] cluster_2.1.4            rmio_0.4.0