
SimBPDD


This vignette demonstrates how to simulate single-cell count datasets with the SimBPDD method.

library(simmethods)
library(SingleCellExperiment)
# Load the reference data shipped with simmethods
ref_data <- simmethods::data

# SimBPDD is slow, so keep only the first 100 genes (rows) to shorten the run time
ref_data <- ref_data[1:100, ]
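
In the reference matrix, rows are genes and columns are cells, so ref_data[1:100, ] keeps the first 100 genes and all cells. The following is a minimal base-R sketch of that subsetting on a toy matrix with the same 4000 x 160 shape. The random Poisson counts are an assumption standing in for simmethods::data and are for illustration only.

```r
# Toy stand-in for the reference data: 4000 genes (rows) x 160 cells (columns).
# The values are random Poisson counts, not the real simmethods::data.
set.seed(111)
toy_ref <- matrix(
  rpois(4000 * 160, lambda = 2),
  nrow = 4000,
  ncol = 160,
  dimnames = list(paste0("gene", 1:4000), paste0("cell", 1:160))
)
dim(toy_ref)
# [1] 4000  160

# Subsetting rows keeps the first 100 genes and every cell
toy_sub <- toy_ref[1:100, ]
dim(toy_sub)
# [1] 100 160
```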

Simulating datasets using SimBPDD

SimBPDD has no separate estimation step: parameters are estimated from the reference data as part of the simulation step, so the reference data is passed directly to SimBPDD_simulation().


Datasets with default parameters

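The full reference data contains 160 cells and 4000 genes, and the subset used here keeps all 160 cells and the first 100 genes. With default parameters, SimBPDD simulates a dataset of the same size as the reference data, as reported in the printed messages (160 cells and 100 genes).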

simulate_result <- simmethods::SimBPDD_simulation(
  ref_data = ref_data,
  return_format = "list",
  seed = 111
)
# nCells: 160
# nGenes: 100
# nGroups: 2

Check the dimensions of the simulated data:

count_data <- simulate_result$simulate_result$count_data
dim(count_data)
# [1]  95 160

Check the group labels of the simulated cells:

col_data <- simulate_result$simulate_result$col_meta
table(col_data$group)
# 
# Group1 Group2 
#     80     80

Determine the number of cells

To simulate a specific number of cells, set nCells in other_prior:

simulate_result <- simmethods::SimBPDD_simulation(
  ref_data = ref_data,
  other_prior = list(nCells = 100),
  return_format = "list",
  seed = 111
)
# nCells: 100
# nGenes: 100
# nGroups: 2

Check the dimensions of the simulated data:

count_data <- simulate_result$simulate_result$count_data
dim(count_data)
# [1]  95 100

The simulated data has 95 genes instead of the 100 genes in the reference data, because genes with zero counts across all cells are removed.
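
The removal of all-zero genes can be illustrated with base R. This is a sketch of the idea on a toy matrix, not the SimBPDD source code: a gene is kept only if its counts sum to more than zero across all cells.

```r
# Toy count matrix: 5 genes x 4 cells (filled column by column);
# gene2 and gene4 are zero in every cell
counts <- matrix(
  c(1, 0, 3, 0, 2,
    0, 0, 1, 0, 0,
    4, 0, 0, 0, 1,
    2, 0, 5, 0, 0),
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4))
)

# Keep genes whose total count across all cells is greater than zero
keep <- rowSums(counts) > 0
filtered <- counts[keep, , drop = FALSE]
dim(filtered)
# [1] 3 4
rownames(filtered)
# [1] "gene1" "gene3" "gene5"
```

The same kind of filter, applied to the simulated counts, is why the number of rows in count_data can be smaller than the number of genes in the reference data.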

Simulate two groups of cells

In SimBPDD, set other_prior = list(prob.group = c(0.4, 0.6)) to assign 40% and 60% of the cells to the two groups.

simulate_result <- simmethods::SimBPDD_simulation(
  ref_data = ref_data,
  other_prior = list(nCells = 100,
                     prob.group = c(0.4, 0.6)),
  return_format = "list",
  seed = 111
)
# nCells: 100
# nGenes: 100
# nGroups: 2

Check cell groups:

table(simulate_result$simulate_result$col_meta$group)
# 
# Group1 Group2 
#     40     60
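
With 100 cells and prob.group = c(0.4, 0.6), the expected group sizes are 100 x 0.4 = 40 and 100 x 0.6 = 60, which agrees with the table of simulated groups. The base-R sketch below reproduces that arithmetic and checks the resulting proportions with prop.table(). The rounding rule and the hypothetical groups vector are assumptions for illustration; SimBPDD may assign cells to groups differently (for example by random sampling), so simulated group sizes are not guaranteed to follow this calculation exactly.

```r
n_cells <- 100
prob_group <- c(0.4, 0.6)

# Expected number of cells per group, assuming sizes are the rounded products
expected <- round(n_cells * prob_group)
names(expected) <- c("Group1", "Group2")
expected

# Hypothetical group labels built from those sizes, standing in for col_meta$group
groups <- rep(names(expected), times = expected)
table(groups)

# Observed proportions should match prob_group
prop.table(table(groups))
```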

SimBPDD can only simulate two cell groups.
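
Because only two groups are supported, it can help to validate prob.group before starting a long simulation. The helper check_prob_group() below is hypothetical and not part of simmethods; it assumes that prob.group should hold exactly two non-negative proportions that sum to 1.

```r
# Hypothetical helper (not part of simmethods): validate prob.group for SimBPDD,
# assuming exactly two non-negative proportions that sum to 1
check_prob_group <- function(prob_group) {
  if (!is.numeric(prob_group) || length(prob_group) != 2) {
    stop("SimBPDD simulates exactly two groups: prob.group must have length 2")
  }
  if (any(prob_group < 0)) {
    stop("prob.group must not contain negative values")
  }
  if (!isTRUE(all.equal(sum(prob_group), 1))) {
    stop("prob.group must sum to 1")
  }
  invisible(TRUE)
}

check_prob_group(c(0.4, 0.6))             # passes silently
try(check_prob_group(c(0.2, 0.3, 0.5)))   # error: three groups requested
```

Call it with the same vector you pass in other_prior before running SimBPDD_simulation(), so that an invalid setting fails immediately instead of after a slow simulation.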