Mouse embryo scNMT-seq data

Here we use a multi-omics scNMT-seq dataset to demonstrate the simple usage of MultiSpace. The scNMT-seq data was derived from the study of early mouse embryo (Argelaguet et al., Nature, 2019). From E4.5 to E7.5 are used in this example. The spatial data was collected from E8.5 mouse embryo (Lohoff et al., Nature, 2022).

Using MultiSpace to analysis

Take E8.5 scNMT-seq data for example. Users should better separate data in absolute folder if they have more than one stage.

Step 1 Run MultiSpace Pipelineinit to initialize snakemake

The first step of running MultiSpace pipeline is to initialize pipeline config file (also initiation sample in same file)and a working directory. All these steps are implemented by :bash:`MultiSpace Pipelineinit` function.

MultiSpace Pipelineinit --species mm10 \
--samplesheet metasheet.csv \
--directory ~/Project/scNMT_WolfReik/ \
--fasta ~/Reference/mm10.fa \
--fasta_fai ~/Reference/mm10.fa.fai \
--lambda_fasta ~/Reference/mm10_lambda.fa \
--star_annotation ~/Reference/UCSC_mm10/mm10.refGene.gtf \
--star_index ~/Reference/UCSC_mm10/

The results of :bash:`MultiSpace Pipelineinit` are shown as below.

File

Description

config.yaml

Initialized config.yaml generated in user input directory.

Snakefile

Snakefile in directory used for running snakemake.

modules/

Snakemake rules stored in subdir modules of directory.

Step 2 Run snakemake to preprocess raw data

After initialize config file and working directory, :bash:`snakemake -j 5` could be used to run snakemake in working directory. Users can customize cores number ,but be careful not to make the number too large in case limited memory.

cd ~/Project/scNMT_WolfReik/
snakemake -j 5

The results of :bash:`snakemake -j 5` are shown as below. DNA methylation(WCG, W=A or T). Chromatin accessibility(GCH, H=A, C or T)

File

Description

GCH.bin_peak.h5

GCH bin by cell marix stored in H5 format.

WCG.bin_peak.h5

WCG bin by cell marix stored in H5 format.

GCH.bin.merge.peak

GCH bin features. Rowname of GCH.bin_peak.h5

WCG.bin.merge.peak

WCG bin features. Rowname of WCG.bin_peak.h5

GCH.chr1_chr6.site_peak.h5 GCH.chr7_chr12.site_peak.h5 GCH.chr13_chrY.site_peak.h5

GCH site by cell matrix separated by chromatin stored in H5 format.

WCG.site_peak.h5

WCG site by cell matrix stored in H5 format.

WCG.uniq.peak

WCG site features. Rowname of WCG.site_peak.h5

GCH.uniq.peak

GCH site features. Rowname of GCH.site_peak.h5(merge chromtain together)

QCtable.txt

Snakemake rules stored in subdir modules of directory.

usecell.txt

Snakemake rules stored in subdir modules of directory.

QCtable.txt

Snakemake rules stored in subdir modules of directory.

Step 3 Run MultiSpace Scorematrix to calculate WCG/GCH score matrix

MultiSpace Scorematrix can calculate genebody or gene promoter methylation ratio using snakemake output file.

MultiSpace Scorematrix --species mm10 --cell_barcode 04.WCG.GCH/usecell.txt \
--file_path 04.WCG.GCH/ --outdir . --matrixtype WCG --region promoter --distance 2000

The results of :bash:`MultiSpace Scorematrix` are gene by cell matrix stored in TXT format.

MultiSpace Scorematrix can calculate gene activity score using RP(regulatory potential) model.

MultiSpace Scorematrix --species mm10 --cell_barcode 04.WCG.GCH/usecell.txt \
--file_path 04.WCG.GCH/ --outdir . --matrixtype GCH --distance 10000

The results of :bash:`MultiSpace Scorematrix` are gene by cell matrix stored in TXT format.

Step 4 Run MultiSpace Mappingcell to map single cell to spatial

MultiSpace Mappingcell can map single cell to spatial location, and get spatially epigenetic signal. Users can take :bash:`snakemake` output single cell gene expression matrix, bin by cell matrix and bin features as input. Additionally, users should offer a spatial gene count matrix and cell type file. The count matrix could be tab-separated plain-text file with genes as rows and spots as columns. The celltype file should be a tab-separated plain-text file without header. The first column should be the cell name, and the second column should be the corresponding celltype labels.

MultiSpace Mappingcell --sc_count_file 05.Spatial/RNA_normalized.txt --sc_celltype_file celltype.txt \
--st_count_file Spatial/seqFISH_scRNA/RNA_st_normalized.txt --spatial_location Spatial/seFISH_scRNA/loc_EM1.txt \
--epi_binfile WCG.bin_peak.h5 --epi_feature WCG.bin.merge.peak --out_dir . --out_prefix WCG

Users can use :bash:`MultiSpace Mappingcell --help` to see help message. The results are showed below.

File

Description

WCG.signal_mat.npz

DNA methylation signal in spatila location. Bin feature by spot matrix stored in .npz format.

WCG.signal_mat_rowname.txt

Rownames of bin feature by spot matrix after filtering. Colnames of bin feature by spot matrix is colnames of st_count_file.

Validate mapping accuracy:

../_images/validate.png

Mapping E7.5 scNMT-seq data to E8.5 spatial location:

../_images/expr_spat.png

MultiSpace output file downstream analysis

Users can use :bash:`snakemake` output file to do downstream analysis.

Single omic clustering

Using Seurat to cluster RNA gene count matrix by stage and celltype.

Mouse embryo gene count matrix cluster by stage(from E4.5 to E7.5)

library(Seurat)
library(ggplot2)
library(patchwork)
library(dplyr)
library(data.table)
library(stringr)
samplemeta = read.table("allsamplemeta.txt",sep = " ", header = T)
RNA_mat <- as.data.frame(read.table("RNA_normalized.txt",header = T,row.names = 1, check.names=FALSE))

scseurat <- CreateSeuratObject(
 counts = RNA_mat,
 project = "RNA",
 assay = "RNA",
 min.cells = 5
)
scseurat@meta.data$type <- "rna"
scseurat@meta.data$sample <- rownames(scseurat@meta.data)
scseurat@meta.data = merge(samplemeta,scseurat@meta.data,on = "sample")
rownames(scseurat@meta.data) = scseurat@meta.data$sample

scseurat <- NormalizeData(scseurat) %>% ScaleData()
scseurat <- SCTransform(scseurat, assay = "RNA",  verbose = FALSE)
scseurat <- RunPCA(scseurat, dims = 1:30)
scseurat <- RunUMAP(scseurat, dims = 1:30)
scseurat <- FindNeighbors(scseurat, dims = 1:30)
scseurat <- FindClusters(scseurat, resolution = 0.5, verbose = FALSE)

DimPlot(scseurat,reduction = "umap",group.by = "stage")
../_images/clusterbystage.png

Mouse embryo gene count matrix cluster by celltype in E7.5.

e75samplemeta = samplemeta[which(samplemeta$stage == "E7.5"),]
e75RNA_mat = RNA_mat[,which(colnames(RNA_mat) %in% e75samplemeta$sample)]

e75scseurat <- CreateSeuratObject(
 counts = e75RNA_mat,
 project = "RNA",
 assay = "RNA",
 min.cells = 3
)
e75scseurat@meta.data$type <- "rna"
e75scseurat@meta.data$orig.ident <- "E7.5"
e75scseurat@meta.data$sample <- rownames(e75scseurat@meta.data)
e75scseurat@meta.data = merge(samplemeta,e75scseurat@meta.data,on = "sample")
rownames(e75scseurat@meta.data) = e75scseurat@meta.data$sample

e75scseurat <- SCTransform(e75scseurat, assay = "RNA",  verbose = FALSE)
e75scseurat <- RunPCA(e75scseurat, dims = 1:30)
e75scseurat <- RunUMAP(e75scseurat, dims = 1:30)
e75scseurat <- FindNeighbors(e75scseurat, dims = 1:30)
e75scseurat <- FindClusters(e75scseurat, resolution = 0.5, verbose = FALSE)

DimPlot(e75scseurat,reduction = "umap",group.by = "celltype")
../_images/clusterbycelltype.png

Using Signac to cluster WCG/GCH bin count matrix by stage(from E4.5 to E7.5). Take WCG bin matrix for example.

library(Signac)
library(Seurat)
library(ggplot2)
library(stringr)
library(reticulate)
library(Matrix)
np <- import("numpy")
scipy <-import("scipy")

feature = read.csv("WCG.bin.merge.peak",header = F)
usecell <- read.table("usecells.txt")
mydata <- h5read("WCG.bin_peak.h5", "Mcsc")
WCG_mat = sparseMatrix(x = as.numeric(mydata$data),j = as.numeric(mydata$indices),p = as.numeric(mydata$indptr),dims = c(4114260,985),index1 = FALSE)
colnames(WCG_mat) = usecell$V1
rownames(WCG_mat) = feature$V1
meta <- read.table("sample_metadata.txt",sep = "\t", header = T)
samplemeta = merge(meta,usecell, by.x = "sample",by.y = "V1")[,c('sample','stage','lineage10x')]
rownames(samplemeta) = samplemeta$sample
samplemeta$celltype = samplemeta$lineage10x

WCG <- CreateSeuratObject(
 counts = WCG_mat,
 assay = "peaks",
 min.cells = 5,
 meta = samplemeta
)
WCG@meta.data$celltype = samplemeta$celltype

WCG <- RunTFIDF(WCG)
WCG <- FindTopFeatures(WCG, min.cutoff = "q90")
WCG <- RunSVD(WCG)
WCG <- RunUMAP(
  object = WCG,
  reduction = 'lsi',
  dims = 2:20
)

DimPlot(object = WCG, label = TRUE, reduction = "umap", group.by = "stage")
../_images/wcgclusterbystage.png ../_images/gchclusterbystage.png

Using Signac to cluster WCG/GCH bin count matrix by celltype in E7.5.

e75samplemeta = samplemeta[which(samplemeta$stage == "E7.5"),]
WCG_mat = WCG_mat[,e75samplemeta$sample]
e75WCG <- CreateSeuratObject(
 counts = WCG_mat,
 assay = "peaks",
 min.cells = 3,
 meta = e75samplemeta
)
e75WCG@meta.data$celltype = e75samplemeta$celltype

e75WCG <- RunTFIDF(e75WCG)
e75WCG <- FindTopFeatures(e75WCG, min.cutoff = "q90")
e75WCG <- RunSVD(e75WCG)
e75WCG <- RunUMAP(
  object = e75WCG,
  reduction = 'lsi',
  dims = 2:20
)

DimPlot(object = e75WCG, label = TRUE, reduction = "umap", group.by = "celltype")
../_images/wcgclusterbycelltype.png ../_images/gchclusterbycelltype.png

Multi omics clustering

library(reticulate)
library(Seurat)
library(Signac)
library(dplyr)
library(ggplot2)
library(rhdf5)
library(Matrix)
np <- import("numpy")
scipy <-import("scipy")
h5py <- import("h5py")

meta <- read.table("sample_metadata.txt",sep = "\t", header = T)
usecell <- read.table("usecells.txt")
samplemeta = merge(meta,usecell, by.x = "sample",by.y = "V1")[,c('sample','stage','lineage10x')]
rownames(samplemeta) = samplemeta$sample
samplemeta$celltype = samplemeta$lineage10x

RNA_mat <- as.data.frame(read.table("RNA_normalized.txt",header = T,row.names = 1, check.names=FALSE))

wcgfeature = read.csv("WCG.bin.merge.peak",header = F)
mydata <- h5read("WCG.bin_peak.h5", "Mcsc")
WCG_mat = sparseMatrix(x = as.numeric(mydata$data),j = as.numeric(mydata$indices),p = as.numeric(mydata$indptr),dims = c(4114260,985),index1 = FALSE)
colnames(WCG_mat) = usecell$V1
rownames(WCG_mat) = wcgfeature$V1

gchfeature = read.csv("GCH.bin.merge.peak",header = F)
mydata <- h5read("GCH.bin_peak.h5", "Mcsc")
GCH_mat = sparseMatrix(x = as.numeric(mydata$data),j = as.numeric(mydata$indices),p = as.numeric(mydata$indptr),dims = c(2645763,985),index1 = FALSE)
colnames(GCH_mat) = usecell$V1
rownames(GCH_mat) = gchfeature$V1

seurat <- CreateSeuratObject(
 counts = RNA_mat,
 project = "RNA",
 assay = "RNA",
 metadata = samplemeta,
 min.cells = 5
)
seurat[['WCG']] <- CreateAssayObject(counts = WCG_mat,min.cells = 10)
seurat[['GCH']] <- CreateAssayObject(counts = GCH_mat,min.cells = 10)
seurat@meta.data <- cbind(seurat@meta.data,samplemeta)

# RNA analysis
DefaultAssay(seurat) <- "RNA"
seurat <- SCTransform(seurat, verbose = FALSE) %>% RunPCA() %>% RunUMAP(dims = 1:50, reduction.name = 'umap.rna', reduction.key = 'rnaUMAP_')

# WCG analysis
# We exclude the first dimension as this is typically correlated with sequencing depth
DefaultAssay(seurat) <- "WCG"
seurat <- RunTFIDF(seurat)
seurat <- FindTopFeatures(seurat, min.cutoff = 'q90')
seurat <- RunSVD(seurat)
seurat <- RunUMAP(seurat, reduction = 'lsi', dims = 2:50, reduction.name = "umap.wcg", reduction.key = "wcgUMAP_")

# GCH analysis
# We exclude the first dimension as this is typically correlated with sequencing depth
DefaultAssay(seurat) <- "GCH"
seurat <- RunTFIDF(seurat)
seurat <- FindTopFeatures(seurat, min.cutoff = 'q90')
seurat <- RunSVD(seurat)
seurat <- RunUMAP(seurat, reduction = 'lsi', dims = 2:50, reduction.name = "umap.gch", reduction.key = "gchUMAP_")


DefaultAssay(seurat) <- "RNA"
seurat <- FindMultiModalNeighbors(seurat, reduction.list = list("pca", "lsi","lsi"), dims.list = list(1:50, 2:50, 2:50))
seurat <- RunUMAP(seurat, nn.name = "weighted.nn", reduction.name = "wnn.umap", reduction.key = "wnnUMAP_")
seurat <- FindClusters(seurat, graph.name = "wsnn", algorithm = 3, verbose = FALSE)

p1 <- DimPlot(seurat, reduction = "umap.rna", group.by = "stage", label = TRUE, label.size = 2.5, pt.size = 2, repel = TRUE) + ggtitle("RNA")
p2 <- DimPlot(seurat, reduction = "umap.wcg", group.by = "stage", label = TRUE, label.size = 2.5, pt.size = 2, repel = TRUE) + ggtitle("WCG")
p3 <- DimPlot(seurat, reduction = "umap.gch", group.by = "stage", label = TRUE, label.size = 2.5, pt.size = 2, repel = TRUE) + ggtitle("GCH")
p4 <- DimPlot(seurat, reduction = "wnn.umap", group.by = "stage", label = TRUE, label.size = 2.5, pt.size = 2, repel = TRUE) + ggtitle("WNN")
p1 + p2 + p3 +p4 & NoLegend() & theme(plot.title = element_text(hjust = 0.5))
../_images/multiclusterbystage.png
E75meta <- samplemeta[samplemeta$stage == "E7.5" & samplemeta$celltype %in% c("Endoderm","Ectoderm","Mesoderm"),]
RNA_mat <- RNA_mat[,E75meta$sample]
seurat <- CreateSeuratObject(
 counts = RNA_mat,
 project = "RNA",
 assay = "RNA",
 metadata = E75meta,
 min.cells = 3
)
seurat@meta.data <- cbind(seurat@meta.data,E75meta)
GCH_mat <- GCH_mat[,E75meta$sample]
seurat[['GCH']] <- CreateAssayObject(counts = GCH_mat,min.cells = 5)
WCG_mat <- WCG_mat[,E75meta$sample]
seurat[['WCG']] <- CreateAssayObject(counts = WCG_mat,min.cells = 5)

# RNA analysis
DefaultAssay(seurat) <- "RNA"
seurat <- SCTransform(seurat, verbose = FALSE) %>% RunPCA() %>% RunUMAP(dims = 1:50, reduction.name = 'umap.rna', reduction.key = 'rnaUMAP_')
#Note that this single command replaces NormalizeData(), ScaleData(), and FindVariableFeatures()
# GCH analysis
# We exclude the first dimension as this is typically correlated with sequencing depth
DefaultAssay(seurat) <- "GCH"
seurat <- RunTFIDF(seurat)
seurat <- FindTopFeatures(seurat, min.cutoff = 'q90')
seurat <- RunSVD(seurat)
seurat <- RunUMAP(seurat, reduction = 'lsi', dims = 2:50, reduction.name = "umap.gch", reduction.key = "gchUMAP_")

# RNA+GCH
DefaultAssay(seurat) <- "RNA"
seurat <- FindMultiModalNeighbors(seurat, reduction.list = list("pca", "lsi"), dims.list = list(1:50, 2:50),knn.range = 170,sd.scale = 1,k.nn = 30)
seurat <- RunUMAP(seurat, nn.name = "weighted.nn", reduction.name = "wnn.umap", reduction.key = "wnnUMAP_")
seurat <- FindClusters(seurat, graph.name = "wsnn", algorithm = 3, verbose = FALSE)

pdf("GCH+RNA.pdf",width = 10,height = 5)
p1 <- DimPlot(seurat, reduction = "umap.rna", group.by = "celltype", label = TRUE, label.size = 2.5, pt.size = 2, repel = TRUE) + ggtitle("RNA")
p2 <- DimPlot(seurat, reduction = "umap.gch", group.by = "celltype", label = TRUE, label.size = 2.5, pt.size = 2, repel = TRUE) + ggtitle("GCH")
p3 <- DimPlot(seurat, reduction = "wnn.umap", group.by = "celltype", label = TRUE, label.size = 2.5, pt.size = 2, repel = TRUE) + ggtitle("WNN")
p1 + p2 +p3  & theme(plot.title = element_text(hjust = 0.5))
dev.off()

# RNA analysis
DefaultAssay(seurat) <- "RNA"
seurat <- SCTransform(seurat, verbose = FALSE) %>% RunPCA() %>% RunUMAP(dims = 1:50, reduction.name = 'umap.rna', reduction.key = 'rnaUMAP_')
# WCG analysis
# We exclude the first dimension as this is typically correlated with sequencing depth
DefaultAssay(seurat) <- "WCG"
seurat <- RunTFIDF(seurat)
seurat <- FindTopFeatures(seurat, min.cutoff = 'q90')
seurat <- RunSVD(seurat)
seurat <- RunUMAP(seurat, reduction = 'lsi', dims = 2:50, reduction.name = "umap.wcg", reduction.key = "wcgUMAP_")

# RNA+WCG
DefaultAssay(seurat) <- "RNA"
seurat <- FindMultiModalNeighbors(seurat, reduction.list = list("pca", "lsi"), dims.list = list(1:50, 2:50),knn.range = 170,sd.scale = 1,k.nn = 30)
seurat <- RunUMAP(seurat, nn.name = "weighted.nn", reduction.name = "wnn.umap", reduction.key = "wnnUMAP_")
seurat <- FindClusters(seurat, graph.name = "wsnn", algorithm = 3, verbose = FALSE)

pdf("WCG+RNA.pdf",width = 10,height = 5)
p1 <- DimPlot(seurat, reduction = "umap.rna", group.by = "celltype", label = TRUE, label.size = 2.5, pt.size = 2, repel = TRUE) + ggtitle("RNA")
p2 <- DimPlot(seurat, reduction = "umap.wcg", group.by = "celltype", label = TRUE, label.size = 2.5, pt.size = 2, repel = TRUE) + ggtitle("WCG")
p3 <- DimPlot(seurat, reduction = "wnn.umap", group.by = "celltype", label = TRUE, label.size = 2.5, pt.size = 2, repel = TRUE) + ggtitle("WNN")
p1 + p2 +p3  & theme(plot.title = element_text(hjust = 0.5))
dev.off()
../_images/multiclusterbycelltype.png

Spatial multi-omics analysis

library(reticulate)
library(data.table)
library(dplyr)
library(ggplot2)
library(Seurat)
library(Matrix)
library(Giotto)
library(Signac)
np <- import("numpy")
scipy <-import("scipy")

temp_dir = getwd()
temp_dir = '~/Temp/'
myinstructions = createGiottoInstructions(save_dir = temp_dir,
                                          save_plot = TRUE,
                                          show_plot = FALSE)

# initialize spatial expression matrix
expr =  scipy$sparse$load_npz('../STRIDE/WCG_methy_mat.npz')
row = read.csv("../STRIDE/WCG_methy_mat_row.txt",header = FALSE)
spa_loc = read.table("Spa_location_Emb1z5.txt",sep = "\t")
rownames(expr) = row$V1
colnames(expr) = rownames(spa_loc)

signac <- CreateAssayObject(counts = expr)
signac <- RunTFIDF(signac)
signac <- FindTopFeatures(signac, min.cutoff = "q90")
rownames(expr) <- rownames(signac@data)
# giotto object
seqfish_mini <- createGiottoObject(raw_exprs = expr,
                                   norm_expr = signac@data,
                                   spatial_locs = spa_loc,
                                   instructions = myinstructions,cores = 3)

## spatial grid
seqfish_mini <- createSpatialGrid(gobject = seqfish_mini,
                             sdimx_stepsize = 0.5,
                             sdimy_stepsize = 0.5,minimum_padding = 0.1)

spatPlot(gobject = seqfish_mini, show_grid = T, point_size = 1.5,
         save_param = c(save_name = '8_a_grid'))


## delaunay network: stats + creation
plotStatDelaunayNetwork(gobject = seqfish_mini, maximum_distance = 400, save_plot = F)
seqfish_mini = createSpatialNetwork(gobject = seqfish_mini, minimum_k = 10, maximum_distance_delaunay = 400)


km_spatialgenes = binSpect(seqfish_mini, bin_method = 'rank', cores = 3)
spatGenePlot(seqfish_mini, expression_values = 'normalized', genes = km_spatialgenes[c(1:4),]$genes,
             point_shape = 'border', point_border_stroke = 0.1,
             show_network = F, network_color = 'lightgrey', point_size = 0.8,
             cow_n_col = 2,
             save_param = list(save_name = '10_a_spatialgenes_km'))


# 1. calculate spatial correlation scores
ext_spatial_genes = km_spatialgenes[1:2500]$genes
spat_cor_netw_DT = detectSpatialCorGenes(seqfish_mini,
                                         method = 'network',
                                         spatial_network_name = 'Delaunay_network',
                                         subset_genes = ext_spatial_genes)

# 2. cluster correlation scores
spat_cor_netw_DT = clusterSpatialCorGenes(spat_cor_netw_DT,
                                          name = 'spat_netw_clus', k = 2)
heatmSpatialCorGenes(seqfish_mini, spatCorObject = spat_cor_netw_DT,
                     use_clus_name = 'spat_netw_clus')
../_images/binspectclustercorrelation.png
netw_ranks = rankSpatialCorGroups(seqfish_mini,
                                  spatCorObject = spat_cor_netw_DT,
                                  use_clus_name = 'spat_netw_clus')

cluster_genes_DT = showSpatialCorGenes(spat_cor_netw_DT,
                                       use_clus_name = 'spat_netw_clus',
                                       show_top_genes = 1)
cluster_genes = cluster_genes_DT$clus; names(cluster_genes) = cluster_genes_DT$gene_ID

seqfish_mini = createMetagenes(seqfish_mini, gene_clusters = cluster_genes, name = 'cluster_metagene')
spatCellPlot(seqfish_mini,
             spat_enr_names = 'cluster_metagene',
             cell_annotation_values = netw_ranks$clusters,
             point_size = 0.7, cow_n_col = 3)
../_images/epi_cluster.png