This document demonstrates how to use the SimBPDD method; we hope it helps you get started.
library(simmethods)
library(SingleCellExperiment)
# Load data
ref_data <- simmethods::data
# SimBPDD takes a long time to simulate datasets, so we subset the reference data
ref_data <- ref_data[1:100, ]
SimBPDD has no separate estimation step, because estimation is combined with the simulation step.
The full reference data contains 160 cells and 4000 genes (100 genes after the subsetting above). If we simulate with default parameters, we obtain a new dataset of the same size as the reference data that was passed in.
simulate_result <- simmethods::SimBPDD_simulation(
  ref_data = ref_data,
  return_format = "list",
  seed = 111
)
# nCells: 160
# nGenes: 100
# nGroups: 2
Check the dimension of the simulated data:
count_data <- simulate_result$simulate_result$count_data
dim(count_data)
# [1] 95 160
Check the group labels of the simulated cells:
col_data <- simulate_result$simulate_result$col_meta
table(col_data$group)
#
# Group1 Group2
# 80 80
simulate_result <- simmethods::SimBPDD_simulation(
  ref_data = ref_data,
  other_prior = list(nCells = 100),
  return_format = "list",
  seed = 111
)
# nCells: 100
# nGenes: 100
# nGroups: 2
Check the dimension of the simulated data:
count_data <- simulate_result$simulate_result$count_data
dim(count_data)
# [1] 95 100
The number of simulated genes (95) is lower than the number of reference genes (100) because genes with zero counts across all cells are removed.
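The filtering of all-zero genes can be illustrated with a minimal base R sketch. The toy matrix and the names `counts`, `keep` and `filtered` are made up for illustration; this is not the code simmethods runs internally, only an example of the kind of filter that produces fewer rows than the input.

```r
# Toy count matrix: 5 genes (rows) x 4 cells (columns); values are invented.
counts <- matrix(
  c(3, 0, 1, 2,
    0, 0, 0, 0,
    5, 1, 0, 0,
    0, 0, 0, 0,
    2, 2, 4, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("Gene", 1:5), paste0("Cell", 1:4))
)

# Keep only genes with at least one non-zero count across all cells.
keep <- rowSums(counts) > 0
filtered <- counts[keep, , drop = FALSE]

dim(filtered)
# [1] 3 4
```

Here two of the five genes are zero in every cell, so the filtered matrix keeps three rows while the number of cells is unchanged.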
In SimBPDD, we can set other_prior = list(prob.group = c(0.4, 0.6)) directly to specify the proportions of the two cell groups.
simulate_result <- simmethods::SimBPDD_simulation(
  ref_data = ref_data,
  other_prior = list(nCells = 100,
                     prob.group = c(0.4, 0.6)),
  return_format = "list",
  seed = 111
)
# nCells: 100
# nGenes: 100
# nGroups: 2
Check cell groups:
table(simulate_result$simulate_result$col_meta$group)
#
# Group1 Group2
# 40 60
SimBPDD can only simulate two cell groups.
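The group sizes reported above (40 and 60) are what you would expect from multiplying the number of cells by each group proportion. The sketch below reproduces that arithmetic in base R. It assumes simple proportional rounding, which may not be how simmethods allocates cells internally, and the variable names `n_cells`, `prob_group` and `group_sizes` are hypothetical.

```r
# Assumed inputs, mirroring other_prior = list(nCells = 100, prob.group = c(0.4, 0.6)).
n_cells <- 100
prob_group <- c(0.4, 0.6)

# The proportions should sum to one.
stopifnot(isTRUE(all.equal(sum(prob_group), 1)))

# Expected number of cells per group under proportional rounding (an assumption).
group_sizes <- round(n_cells * prob_group)
names(group_sizes) <- paste0("Group", seq_along(prob_group))

group_sizes
# Group1 Group2
#     40     60
```

A quick check like this can be compared against table(simulate_result$simulate_result$col_meta$group) to confirm that the simulated labels match the requested proportions.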