Here scDesign3 method will be demonstrated clearly and hope that this document can help you.
Before simulating datasets, it is important to estimate some essential parameters from a real dataset in order to make the simulated data more real. If you do not have a single-cell transcriptomics count matrix now, you can use the data collected in simmethods package by simmethods:data
command.
library(simmethods)
library(SingleCellExperiment)
# Load data
ref_data <- simmethods::data
estimate_result <- simmethods::scDesign3_estimation(
ref_data = ref_data,
verbose = TRUE,
seed = 111
)
# Estimating parameters using scDesign3
# Convert Residuals to Multivariate Gaussian
# Converting End
# Copula group A starts
If the information of cell groups is available, you can use another way to estimate the parameters.
## cell groups
group_condition <- as.numeric(simmethods::group_condition)
estimate_result <- simmethods::scDesign3_estimation(
ref_data = ref_data,
other_prior = list(group.condition = group_condition),
verbose = TRUE,
seed = 111
)
# Estimating parameters using scDesign3
# Convert Residuals to Multivariate Gaussian
# Converting End
# Copula group Group1 starts
# Copula group Group2 starts
After estimating parameter from a real dataset, we will simulate a dataset based on the learned parameters with different scenarios.
The reference data contains 160 cells and 4000 genes, if we simulate datasets with default parameters and then we will obtain a new data which has the same size as the reference data.
simulate_result <- simmethods::scDesign3_simulation(
parameters = estimate_result[["estimate_result"]],
return_format = "SCE",
seed = 111
)
# nCells: 160
# nGenes: 4000
# nGroups: 2
# nBatches: 1
# Use Copula to sample a multivariate quantile matrix
# Sample Copula group Group1 starts
# Sample Copula group Group2 starts
We will get two or groups if information of cell groups is used in estimation step.
SCE_result <- simulate_result[["simulate_result"]]
dim(SCE_result)
# [1] 4000 160
table(colData(SCE_result)$group)
#
# Group1 Group2
# 80 80
We can not set the cell or gene number in scDesign3 unless a data frame which contains covariates of targeted simulated data from construct_data function is provided through new_covariate parameter.
In scDesign3, we can not set nGroups
directly and should specify the group labels of cells in the estimation step in scDesign3_estimation
function.
We randomly assign four group labels to cells.
set.seed(666)
estimate_result <- simmethods::scDesign3_estimation(
ref_data = ref_data,
other_prior = list(group.condition = sample(1:4, ncol(ref_data), replace = TRUE)),
verbose = TRUE,
seed = 111
)
# Estimating parameters using scDesign3
# Convert Residuals to Multivariate Gaussian
# Converting End
# Copula group Group2 starts
# Copula group Group4 starts
# Copula group Group3 starts
# Copula group Group1 starts
simulate_result <- simmethods::scDesign3_simulation(
parameters = estimate_result[["estimate_result"]],
return_format = "SCE",
seed = 111
)
# nCells: 160
# nGenes: 4000
# nGroups: 4
# nBatches: 1
# Use Copula to sample a multivariate quantile matrix
# Sample Copula group Group2 starts
# Sample Copula group Group4 starts
# Sample Copula group Group3 starts
# Sample Copula group Group1 starts
Check cell groups
SCE_result <- simulate_result[["simulate_result"]]
table(colData(SCE_result)$group)
#
# Group1 Group2 Group3 Group4
# 29 43 43 45
In scDesign3, we can not set nBatches
directly and should the batch labels of cells in the estimation step in scDesign3_estimation
function.
If you custom the simulated cell number which is not equal to that of the real data, the batch information for simulated cells is not returned.
Before modeling from the reference data, we should construct dynwrap::expression
data using dynwrap::wrap_expression
function and perform trajectory inference using Slingshot using tislingshot::ti_slingshot
or other methods.
if(!requireNamespace("tislingshot", quietly = TRUE)){
devtools::install_github("dynverse/ti_slingshot/package/")
}
dyn_data <- dynwrap::wrap_expression(
counts = t(ref_data),
expression = log2(t(ref_data) + 1)
)
dyn_data <- dynwrap::add_grouping(dyn_data, group_condition)
dyn_model <- dynwrap::infer_trajectory(dyn_data, tislingshot::ti_slingshot())
# Using full covariance matrix
The traj
parameter must be TURE
and dynwrap_data
is needed when you want to estimate parameters from trajectory data.
estimate_result <- simmethods::scDesign3_estimation(
ref_data = ref_data,
other_prior = list(traj = TRUE,
group.condition = group_condition,
dynwrap_data = dyn_model),
verbose = TRUE,
seed = 111
)
# Constructing lineages for the data...
# Estimating parameters using scDesign3
# Convert Residuals to Multivariate Gaussian
# Converting End
# Copula group Group1 starts
# Copula group Group2 starts
Next, we can simulate trajectory data using scDesign3.
simulate_result <- simmethods::scDesign3_simulation(
parameters = estimate_result[["estimate_result"]],
return_format = "list",
seed = 111
)
# nCells: 160
# nGenes: 4000
# nGroups: 2
# nBatches: 1
# Use Copula to sample a multivariate quantile matrix
# Sample Copula group Group1 starts
# Sample Copula group Group2 starts
Before visualization, make sure that you have already installed several R packages:
if(!requireNamespace("dyndimred", quietly = TRUE)){install.packages("dyndimred")}
if(!requireNamespace("dynplot", quietly = TRUE)){install.packages("dynplot")}
First we should wrap the data into a standard object:
dyn_object <- dynwrap::wrap_expression(counts = t(simulate_result$simulate_result$count_data),
expression = log2(t(simulate_result$simulate_result$count_data) + 1))
Next, we infer the trajectory using SlingShot which has been proved to be the most best method to do this:
model <- dynwrap::infer_trajectory(dataset = dyn_object,
method = tislingshot::ti_slingshot(),
parameters = NULL,
give_priors = NULL,
seed = 666,
verbose = TRUE)
# Executing 'slingshot' on '20230911_114342__data_wrapper__3AFJQcaDVH'
# With parameters: list(cluster_method = "pam", ndim = 20L, shrink = 1L, reweight = TRUE, reassign = TRUE, thresh = 0.001, maxit = 10L, stretch = 2L, smoother = "smooth.spline", shrink.method = "cosine")
# inputs: expression
# priors :
# Using full covariance matrix
Finally, we can plot the trajectory after performing dimensionality reduction:
dimred <- dyndimred::dimred_umap(dyn_object$expression)
dynplot::plot_dimred(model, dimred = dimred)
# Coloring by milestone
# Using milestone_percentages from trajectory
For more details about trajectory inference and visualization, please check dynverse.
scDesign3 can also simulate spatial transcriptome data. Besides the gene expression profile, users should also provide the spatial coordinates of each cell (spot). The reference data can be downloaded here.
# Load data (downloaded from https://zenodo.org/record/8251596/files/data118_spatial_OV.rds?download=1)
data <- readRDS("../../../../preprocessed_data/data118_spatial_OV.rds")
ref_data <- t(as.matrix(data$data$counts))
In addition, we can set the spatial coordinates by spatial.x
and spatial.y
parameters.
other_prior <- list(spatial.x = data$data_info$spatial_coordinate$x,
spatial.y = data$data_info$spatial_coordinate$y,
group.condition = as.numeric(as.factor(data$data_info$cluster_info)))
Execute the parameter estimation:
estimate_result <- simmethods::scDesign3_estimation(
ref_data = ref_data,
other_prior = other_prior,
verbose = T,
seed = 10
)
# Estimating parameters using scDesign3
# Convert Residuals to Multivariate Gaussian
# Converting End
# Copula group Group2 starts
# Copula group Group1 starts
Simulate spatial transcriptome data:
simulate_result <- simmethods::scDesign3_simulation(
parameters = estimate_result[["estimate_result"]],
return_format = "list",
seed = 111
)
# nCells: 3492
# nGenes: 1056
# nGroups: 2
# nBatches: 1
# Use Copula to sample a multivariate quantile matrix
# Sample Copula group Group2 starts
# Sample Copula group Group1 starts
Notably, the simulated cells can match the real ones, so that the spatial coordinates are the same as the previously input ones.
Visualize the spatial spots:
library(ggplot2)
location <- simulate_result$simulate_result$col_meta
p <- ggplot(location, aes(x = x, y = y))+
geom_point(aes(color = group))+
theme(panel.grid = element_blank(),
axis.title = element_blank(),
axis.text = element_blank(),
legend.position = "bottom")
p