This document demonstrates how to use the scDD method for parameter estimation and data simulation.
Before simulating datasets, it is important to estimate some essential parameters from a real dataset so that the simulated data resembles real data more closely. If you do not have a single-cell transcriptomics count matrix at hand, you can use the data collected in the simmethods package via the simmethods::data command.
When you use scDD to estimate parameters from a real dataset, you must supply a numeric vector that specifies the group or plate each cell comes from, for example other_prior = list(group.condition = the numeric vector).
library(simmethods)
library(SingleCellExperiment)
# Load data
ref_data <- SingleCellExperiment::counts(scater::mockSCE())
set.seed(111)
group_condition <- sample(c(1, 2), 200, replace = TRUE)
## group_condition must be a numeric vector.
other_prior <- list(group.condition = as.numeric(group_condition))
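If your group labels are stored as text rather than numbers, they need to be converted before being passed as group.condition. The following is a minimal base-R sketch of one way to do that. The labels vector, the name my_group_condition, and the check_group_condition helper are hypothetical and are not part of simmethods or scDD. The check for exactly two groups is an assumption that matches the two-group setup used in this document.

```r
# Hypothetical per-cell labels, e.g. taken from your own cell metadata.
labels <- rep(c("control", "treated"), times = c(120, 80))

# Map the labels to numeric codes; factor levels are sorted alphabetically,
# so "control" becomes 1 and "treated" becomes 2.
my_group_condition <- as.numeric(factor(labels))

# Hypothetical helper: basic sanity checks before building other_prior.
check_group_condition <- function(group_condition, n_cells) {
  stopifnot(
    is.numeric(group_condition),           # numeric, not factor or character
    length(group_condition) == n_cells,    # one entry per cell (column)
    !anyNA(group_condition),               # no missing labels
    length(unique(group_condition)) == 2   # assumption: exactly two groups
  )
  invisible(group_condition)
}

check_group_condition(my_group_condition, n_cells = 200)
table(my_group_condition)
```

The validated vector could then be wrapped as list(group.condition = my_group_condition), in the same way as shown above.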
Use the simmethods::scDD_estimation command to run the estimation step.
estimate_result <- simmethods::scDD_estimation(
  ref_data = ref_data,
  other_prior = other_prior,
  verbose = TRUE,
  seed = 10
)
# Estimating parameters using scDD
# Performing Median Normalization
# Setting up parallel back-end using 1 cores
# Clustering observed expression data for each gene
# Notice: Number of permutations is set to zero; using
# Kolmogorov-Smirnov to test for differences in distributions
# instead of the Bayes Factor permutation test
# Classifying significant genes into patterns
Elapsed time of the estimation step (in seconds):
estimate_result$estimate_detection$Elapsed_Time_sec
# [1] 130.466
After estimating parameters from a real dataset, we can simulate datasets based on the learned parameters under different scenarios.
The reference data contains 200 cells and 2000 genes. If we simulate with the default parameters, the new dataset will have the same dimensions as the reference data.
A dataset simulated with scDD always contains two groups of cells.
simulate_result <- simmethods::scDD_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "SCE",
  seed = 111
)
SCE_result <- simulate_result[["simulate_result"]]
dim(SCE_result)
# [1] 2000 200
table(colData(SCE_result)$group)
#
# Group1 Group2
# 100 100
In scDD, users can only set nCells to specify the number of cells, because the set of genes is fixed after the estimation step.
simulate_result <- simmethods::scDD_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "list",
  other_prior = list(nCells = 1000),
  seed = 111
)
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 2000 1000
col_data <- simulate_result$simulate_result$col_meta
table(col_data$group)
#
# Group1 Group2
# 500 500
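The dimension and group checks above can be wrapped in a small reusable function. The sketch below is only an illustration in base R: check_simulated is a hypothetical helper, and count_data and col_meta are stand-in objects built here to mimic the shapes shown above (2000 genes, 1000 cells, a group column). They are not output from scDD.

```r
# Stand-in objects with the same shapes as the list-format result above.
# In a real session, use simulate_result$simulate_result$count_data and
# simulate_result$simulate_result$col_meta instead.
set.seed(1)
count_data <- matrix(rpois(2000 * 1000, lambda = 5), nrow = 2000, ncol = 1000)
col_meta <- data.frame(group = rep(c("Group1", "Group2"), each = 500))

# Hypothetical helper: confirm the genes x cells dimensions, confirm that
# the metadata has one row per cell, and return the group proportions.
check_simulated <- function(counts, meta, n_genes, n_cells) {
  stopifnot(
    nrow(counts) == n_genes,
    ncol(counts) == n_cells,
    nrow(meta) == n_cells
  )
  prop.table(table(meta$group))
}

group_props <- check_simulated(count_data, col_meta,
                               n_genes = 2000, n_cells = 1000)
group_props
```

Failing fast with stopifnot makes a mismatch between the requested nCells and the returned matrix visible immediately, rather than in a later analysis step.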