SRTsim was developed specifically for simulating spatial transcriptomics data. In addition to the gene expression profile, users must also provide the spatial coordinates of each cell (spot). The reference data used below can be downloaded from Zenodo (the link is given in the code comment).
Before simulating datasets, it is important to estimate the essential parameters from a real dataset so that the simulated data resemble real data as closely as possible.
library(simmethods)
# Load data (downloaded from https://zenodo.org/record/8251596/files/data118_spatial_OV.rds?download=1)
data <- readRDS("../../../../preprocessed_data/data118_spatial_OV.rds")
ref_data <- t(as.matrix(data$data$counts))
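The t() call assumes that data$data$counts stores spots in rows and genes in columns, so that ref_data ends up as a genes x spots matrix (consistent with the 1056 x 3492 dimensions reported for the simulated data later on). A minimal sanity check of that orientation is sketched below on a small hypothetical toy matrix rather than the real dataset; the names toy_counts and ref_toy are illustrative only.

```r
# Hypothetical toy counts standing in for data$data$counts (spots in rows, genes in columns)
set.seed(10)
toy_counts <- matrix(rpois(8 * 3, lambda = 4), nrow = 8, ncol = 3,
                     dimnames = list(paste0("spot", 1:8), paste0("gene", 1:3)))

# Transpose so that genes are rows and spots are columns, as in the tutorial
ref_toy <- t(as.matrix(toy_counts))

# Basic checks: genes x spots orientation and non-negative integer counts
stopifnot(identical(dim(ref_toy), c(3L, 8L)))
stopifnot(all(ref_toy >= 0), all(ref_toy == round(ref_toy)))
print(dim(ref_toy))
```

With the real data, the same checks can be run on ref_data itself before estimation.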
In addition, supply the spatial coordinates of the cells (spots) through the spatial.x and spatial.y parameters of other_prior:
other_prior <- list(spatial.x = data$data_info$spatial_coordinate$x,
                    spatial.y = data$data_info$spatial_coordinate$y)
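Each coordinate presumably has to belong to the spot in the matching column of ref_data, so spatial.x and spatial.y should have one value per spot in the same order. If the coordinates are kept in a data frame whose row order may differ from the count matrix, one way to align them by spot barcode is sketched below. All object names and values here are hypothetical toy inputs, not part of the tutorial data.

```r
# Hypothetical toy inputs: a genes x spots count matrix and a coordinate table
set.seed(1)
spots <- c("AAAC-1", "AAAG-1", "AAAT-1", "AACA-1")
ref_toy <- matrix(rpois(3 * 4, lambda = 5), nrow = 3,
                  dimnames = list(paste0("gene", 1:3), spots))
coords <- data.frame(x = c(14, 27, 110, 5), y = c(43, 38, 29, 34),
                     row.names = c("AACA-1", "AAAC-1", "AAAG-1", "AAAT-1"))

# Reorder the coordinate rows to follow the column order of the count matrix
idx <- match(colnames(ref_toy), rownames(coords))
stopifnot(!anyNA(idx))  # every spot must have a coordinate
coords_aligned <- coords[idx, ]

other_prior_toy <- list(spatial.x = coords_aligned$x,
                        spatial.y = coords_aligned$y)
print(other_prior_toy$spatial.x)
```

Matching by name rather than relying on row order guards against silently pairing counts with the wrong positions.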
Run the parameter estimation:
estimate_result <- simmethods::SRTsim_estimation(
  ref_data = ref_data,
  other_prior = other_prior,
  verbose = TRUE,
  seed = 10
)
# Estimating parameters using SRTsim
Users can also supply the group labels of cells (spots) through the group.condition parameter:
other_prior <- list(spatial.x = data$data_info$spatial_coordinate$x,
                    spatial.y = data$data_info$spatial_coordinate$y,
                    group.condition = data$data_info$group_condition)
estimate_result <- simmethods::SRTsim_estimation(
  ref_data = ref_data,
  other_prior = other_prior,
  verbose = TRUE,
  seed = 10
)
# Estimating parameters using SRTsim
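Before passing labels through group.condition, it can help to confirm that there is exactly one non-missing label per spot and to count the distinct groups, since that number determines how many groups SRTsim can later simulate. A minimal sketch on hypothetical toy labels (group_toy and n_spots_toy are illustrative stand-ins for data$data_info$group_condition and ncol(ref_data)):

```r
# Hypothetical labels standing in for data$data_info$group_condition
group_toy <- c("B", "B", "B", "A", "B", "B", "A", "A")
n_spots_toy <- 8  # would be ncol(ref_data) in the tutorial

# One label per spot, none missing
stopifnot(length(group_toy) == n_spots_toy, !anyNA(group_toy))

# Number of distinct groups and their sizes
n_groups <- length(unique(group_toy))
print(n_groups)
print(table(group_toy))
```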
Simulate a dataset from the estimated parameters and return it as a SingleCellExperiment object:
simulate_result <- simmethods::SRTsim_simulation(
  parameters = estimate_result$estimate_result,
  other_prior = NULL,
  return_format = "SCE",
  seed = 111
)
# nSpots: 3492
# nGenes: 1056
# nGroups: 2
SCE_result <- simulate_result[["simulate_result"]]
dim(SCE_result)
# [1] 1056 3492
library(SingleCellExperiment)  # attaches SummarizedExperiment, which provides colData()
head(colData(SCE_result))
# DataFrame with 6 rows and 4 columns
# x y group cell_name
# <numeric> <numeric> <character> <character>
# AAACAAGTATCTCCCA-1 27 38 B AAACAAGTATCTCCCA-1
# AAACACCAATAACTGC-1 110 29 B AAACACCAATAACTGC-1
# AAACAGGGTCTATATT-1 116 41 B AAACAGGGTCTATATT-1
# AAACATTTCCCGGATT-1 32 27 A AAACATTTCCCGGATT-1
# AAACCCGAACGAAATC-1 14 43 B AAACCCGAACGAAATC-1
# AAACCGGAAATGTTAA-1 5 34 B AAACCGGAAATGTTAA-1
SRTsim imposes a strict rule on simulating cell groups:
Cell groups can be simulated only if cell group labels were used in parameter estimation;
The number of simulated cell groups must equal the number of real groups used in parameter estimation.
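The two conditions can be expressed as a small pre-flight check. The helper check_srtsim_groups() below is hypothetical (it is not part of simmethods or SRTsim) and only encodes the rule as stated: group simulation requires labels at estimation time, and the requested number of groups must match the number of groups in those labels.

```r
# Hypothetical helper, not a simmethods function: encodes the SRTsim group rule
check_srtsim_groups <- function(estimation_groups, n_groups_requested) {
  # Condition 1: labels must have been used for estimation
  if (is.null(estimation_groups)) {
    stop("Cell groups can only be simulated if group labels were used for estimation.")
  }
  # Condition 2: requested group count must equal the real group count
  n_real <- length(unique(estimation_groups))
  if (n_groups_requested != n_real) {
    stop(sprintf("Requested %d groups, but estimation used %d.",
                 n_groups_requested, n_real))
  }
  invisible(TRUE)
}

labels <- c("A", "B", "B", "A", "B")
check_srtsim_groups(labels, 2)       # passes silently
try(check_srtsim_groups(labels, 3))  # error: 3 groups requested, 2 used
try(check_srtsim_groups(NULL, 2))    # error: no labels used in estimation
```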
Because the cell group labels were used in parameter estimation above, we can simulate data with cell groups. Here the result is returned as a list:
simulate_result <- simmethods::SRTsim_simulation(
parameters = estimate_result$estimate_result,
other_prior = NULL,
return_format = "list",
seed = 111
)
# nSpots: 3492
# nGenes: 1056
# nGroups: 2
cell_meta <- simulate_result$simulate_result$col_meta
head(cell_meta)
# x y group cell_name
# AAACAAGTATCTCCCA-1 27 38 B AAACAAGTATCTCCCA-1
# AAACACCAATAACTGC-1 110 29 B AAACACCAATAACTGC-1
# AAACAGGGTCTATATT-1 116 41 B AAACAGGGTCTATATT-1
# AAACATTTCCCGGATT-1 32 27 A AAACATTTCCCGGATT-1
# AAACCCGAACGAAATC-1 14 43 B AAACCCGAACGAAATC-1
# AAACCGGAAATGTTAA-1 5 34 B AAACCGGAAATGTTAA-1
The x and y columns represent the spatial positions of cells (spots), and the group column denotes their group labels.
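Since col_meta appears to be an ordinary data frame with one row per spot, standard R tools apply to it. For example, the sketch below computes the centroid (mean x and y position) of each group. As toy input it uses only the six rows printed above; with the full object, cell_meta would be passed instead of the hypothetical meta_toy.

```r
# Toy input: the six rows printed by head(cell_meta) above
meta_toy <- data.frame(
  x = c(27, 110, 116, 32, 14, 5),
  y = c(38, 29, 41, 27, 43, 34),
  group = c("B", "B", "B", "A", "B", "B")
)

# Mean x and y position (centroid) of the spots in each group
centroids <- aggregate(cbind(x, y) ~ group, data = meta_toy, FUN = mean)
print(centroids)
```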
Check the group labels of cells:
table(cell_meta$group)
#
# A B
# 1051 2441
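The group sizes can also be expressed as proportions, which makes it easier to compare the group composition of the simulated data with that of the reference labels. A small sketch using the counts printed above (with the tutorial objects, prop.table(table(cell_meta$group)) gives the same values):

```r
# Group sizes copied from the table() output above
group_counts <- c(A = 1051, B = 2441)

# Fraction of spots in each group
group_props <- prop.table(group_counts)
print(round(group_props, 3))  # A: 0.301, B: 0.699
```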
Visualize the spatial spots:
library(ggplot2)
location <- simulate_result$simulate_result$col_meta
# Plot each simulated spot at its (x, y) position, colored by group label
p <- ggplot(location, aes(x = x, y = y)) +
  geom_point(aes(color = group)) +
  theme(panel.grid = element_blank(),
        axis.title = element_blank(),
        axis.text = element_blank(),
        legend.position = "bottom")
p