SIGNAL supports multi-scale analysis of single-cell data, which helps identify cell subtypes specific to tissues, conditions, and developmental stages. In this vignette, we use a recently published atlas of developing human immune cells to show how SIGNAL integrates batches while preserving the differences between tissues and developmental stages.

Load data matrix and metadata

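We integrate the normalized scRNA-seq gene expression matrix, restricted to the highly variable genes (HVGs) provided by the atlas authors. The matrix is stored genes by cells (3,765 genes, 884,583 cells), and the metadata has one row per cell.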

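library(Matrix)       # dgCMatrix class of the sparse expression matrix
library(irlba)        # irlba: truncated SVD
library(randomcoloR)  # distinctColorPalette
library(ggpubr)       # ggscatter
library(cowplot)      # plot_grid
library(SIGNAL)       # Run.gcPCA (package name assumed to match the method name)
X = readRDS("/home/server/zy/group_scripts/datasets_preparation/Developing_immune/X.rds")
meta = readRDS("/home/server/zy/group_scripts/datasets_preparation/Developing_immune/meta.rds")
str(X)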
## Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   ..@ i       : int [1:201165610] 1 9 10 23 37 44 49 50 52 54 ...
##   ..@ p       : int [1:884584] 0 491 904 1321 1864 2260 2709 3148 3657 4110 ...
##   ..@ Dim     : int [1:2] 3765 884583
##   ..@ Dimnames:List of 2
##   .. ..$ : chr [1:3765] "TNMD" "FGR" "CFH" "CFTR" ...
##   .. ..$ : chr [1:884583] "FCAImmP7579224-ATTACTCTCGATGAGG" "FCAImmP7579224-CAGCCGAGTACATCCA" "FCAImmP7579224-TGCTACCTCATGTAGC" "FCAImmP7579224-ACGGCCACAAGCTGAG" ...
##   ..@ x       : num [1:201165610] 2.211 0.965 0.965 2.015 1.447 ...
##   ..@ factors : list()
str(meta)
## 'data.frame':    884583 obs. of  4 variables:
##  $ Batch   : chr  "F45" "F45" "F45" "F45" ...
##  $ Stage   : Factor w/ 11 levels "4 PCW","7 PCW",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ Tissue  : Factor w/ 9 levels "Bone Marrow",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ CellType: Factor w/ 13 levels "B cells","Endothelium",..: 8 4 8 8 1 8 8 8 8 8 ...
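
Before integrating, it can help to confirm that the rows of the metadata line up with the columns of the expression matrix, since the matrix is stored genes by cells. The sketch below uses a small toy matrix and metadata frame (`toy_X` and `toy_meta`, hypothetical stand-ins rather than the atlas data) and assumes the metadata row names carry the cell barcodes, which the output above does not confirm.

```r
# Toy stand-ins for X (genes x cells) and meta (one row per cell).
set.seed(1)
toy_X = matrix(rpois(12, 2), nrow = 3,
               dimnames = list(c("g1", "g2", "g3"), paste0("cell", 1:4)))
toy_meta = data.frame(Batch  = c("A", "A", "B", "B"),
                      Tissue = factor(c("Liver", "Liver", "Spleen", "Spleen")),
                      row.names = paste0("cell", 1:4))

# One metadata row per column of the expression matrix, in the same order.
aligned = ncol(toy_X) == nrow(toy_meta) &&
  identical(colnames(toy_X), rownames(toy_meta))
stopifnot(aligned)

# Cells per batch and tissue: a quick view of how the design is laid out.
design = table(toy_meta$Batch, toy_meta$Tissue)
print(design)
```

If the check fails on real data, reordering the metadata by the matrix column names is usually safer than reordering the matrix.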

Visualization of raw data

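# 13 colors cover the largest factor (13 cell types)
Colors = distinctColorPalette(13)
# Truncated SVD of the cells x genes matrix; u %*% diag(d) gives the cell scores
pca_res = irlba(t(X), nv = 50)
raw_emb = as.matrix(pca_res$u %*% diag(pca_res$d))
# The UMAP call was lost from the source; uwot::umap is an assumption
raw_umap = as.data.frame(uwot::umap(raw_emb))
colnames(raw_umap) = c("UMAP1", "UMAP2")
# Attach the metadata so the plots can be colored by CellType, Tissue, and Stage
raw_umap = cbind(meta, raw_umap)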
p1 = ggscatter(raw_umap, x = "UMAP1", y = "UMAP2", size = 0.1, color = "CellType", palette = Colors, legend = "right") + 
  guides(colour = guide_legend(override.aes = list(size = 2)))
p2 = ggscatter(raw_umap, x = "UMAP1", y = "UMAP2", size = 0.1, color = "Tissue", palette = Colors, legend = "right") + 
  guides(colour = guide_legend(override.aes = list(size = 2)))
p3 = ggscatter(raw_umap, x = "UMAP1", y = "UMAP2", size = 0.1, color = "Stage", palette = Colors, legend = "right") + 
  guides(colour = guide_legend(override.aes = list(size = 2)))
plot_grid(p1, p2, p3, align = 'h', axis = "b", nrow = 1)
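
The raw embedding above multiplies the left singular vectors by the singular values, which gives the coordinates of each cell along the leading components. A minimal sketch on a toy matrix, using base `svd` in place of `irlba` so that it runs without extra packages, shows that `u %*% diag(d)` matches the projection of the data onto the right singular vectors. As in the code above, no centering is applied; the toy matrix and the choice of `k = 3` are illustrative only.

```r
set.seed(42)
toy = matrix(rnorm(60), nrow = 10, ncol = 6)  # 10 cells x 6 genes
k = 3                                         # number of components kept

# Truncated SVD: keep the first k left and right singular vectors.
s = svd(toy, nu = k, nv = k)

# Scores computed two ways: U_k D_k and X V_k.
scores_ud = s$u %*% diag(s$d[1:k])
scores_xv = toy %*% s$v

# The two agree up to floating-point error.
max_diff = max(abs(scores_ud - scores_xv))
print(max_diff)
```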

SIGNAL integration

signal_emb = Run.gcPCA(X, meta, g_factor = c("Tissue", "Stage"), b_factor = "Batch")
## Run gcPCA!
## gcPCA done!
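# The UMAP call was lost from the source; uwot::umap and the transpose
# (which assumes Run.gcPCA returns components x cells) are assumptions
signal_umap = as.data.frame(uwot::umap(t(signal_emb)))
colnames(signal_umap) = c("UMAP1", "UMAP2")
# Attach the metadata so the plots can be colored by CellType, Tissue, and Stage
signal_umap = cbind(meta, signal_umap)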
q1 = ggscatter(signal_umap, x = "UMAP1", y = "UMAP2", size = 0.1, color = "CellType", palette = Colors, legend = "right") + 
  guides(colour = guide_legend(override.aes = list(size = 2)))
q2 = ggscatter(signal_umap, x = "UMAP1", y = "UMAP2", size = 0.1, color = "Tissue", palette = Colors, legend = "right") + 
  guides(colour = guide_legend(override.aes = list(size = 2)))
q3 = ggscatter(signal_umap, x = "UMAP1", y = "UMAP2", size = 0.1, color = "Stage", palette = Colors, legend = "right") + 
  guides(colour = guide_legend(override.aes = list(size = 2)))
plot_grid(q1, q2, q3, align = 'h', axis = "b", nrow = 1)
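
The call to `Run.gcPCA` above passes two group variables, `Tissue` and `Stage`. To see which tissue-by-stage combinations are actually present and how many cells each holds, one option is to cross the two factors with base R's `interaction`. This is only an inspection aid on hypothetical toy metadata (`toy_meta`); it makes no claim about how SIGNAL combines the two factors internally.

```r
# Toy metadata with the same two group columns as the atlas metadata.
toy_meta = data.frame(
  Tissue = factor(c("Liver", "Liver", "Spleen", "Spleen", "Liver")),
  Stage  = factor(c("7 PCW", "7 PCW", "7 PCW", "12 PCW", "12 PCW"))
)

# Cross the two factors and drop combinations that never occur.
combo = interaction(toy_meta$Tissue, toy_meta$Stage, sep = " | ", drop = TRUE)

# Number of cells in each observed tissue-by-stage combination.
combo_counts = table(combo)
print(combo_counts)
```

Combinations with very few cells are worth noting before interpreting tissue- or stage-specific structure in the integrated embedding.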

