scDesign2

This document shows how to estimate parameters from a real dataset and then simulate new datasets with the scDesign2 method through the simmethods package.

Estimating parameters from a real dataset

Before simulating datasets, we estimate some essential parameters from a real dataset so that the simulated data resemble real data. If you do not have a single-cell transcriptomics count matrix at hand, you can use the example data shipped with the simmethods package, available as simmethods::data.

library(simmethods)
library(SingleCellExperiment)
# Load data
ref_data <- simmethods::data
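If you want to use your own data instead, the sketch below builds a small toy count matrix in the same orientation as the simulated output shown later (genes in rows, cells in columns). The object name toy_counts and the Gene/Cell naming scheme are hypothetical. We assume, but have not confirmed from the simmethods documentation, that ref_data should be a non-negative integer matrix in this orientation.

```r
# Hypothetical toy reference: 100 genes (rows) x 20 cells (columns)
# Assumption: simmethods expects genes in rows and cells in columns
set.seed(111)
n_genes <- 100
n_cells <- 20
toy_counts <- matrix(
  rpois(n_genes * n_cells, lambda = 2),  # Poisson counts as a stand-in for real UMI counts
  nrow = n_genes,
  ncol = n_cells,
  dimnames = list(paste0("Gene", seq_len(n_genes)),
                  paste0("Cell", seq_len(n_cells)))
)
dim(toy_counts)
# [1] 100  20
```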

Default estimation

estimate_result <- simmethods::scDesign2_estimation(
  ref_data = ref_data,
  verbose = TRUE,
  seed = 111
)
# Estimating parameters using scDesign2

Using cell group information

If cell group labels are available, you can pass them to the estimation step through the group.condition element of other_prior:

## cell groups
group_condition <- as.numeric(simmethods::group_condition)
estimate_result <- simmethods::scDesign2_estimation(
  ref_data = ref_data,
  other_prior = list(group.condition = group_condition),
  verbose = TRUE,
  seed = 111
)
# Estimating parameters using scDesign2
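The group.condition element is supplied as a numeric vector, which is why the example above calls as.numeric(). If your own group labels are stored as text, the sketch below shows one way to turn them into numeric codes; the labels vector is hypothetical. Going through factor() matters here, because as.numeric() applied directly to a character vector returns NA (with a warning) instead of group codes.

```r
# Hypothetical character labels, one per cell (column of the count matrix)
labels <- c("T_cell", "B_cell", "T_cell", "B_cell")
# factor() sorts the levels alphabetically ("B_cell", "T_cell"),
# so B_cell is coded as 1 and T_cell as 2
group_condition_toy <- as.numeric(factor(labels))
group_condition_toy
# [1] 2 1 2 1
```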

Using cell type information

You can also supply cell type information through the cell_type_sel parameter described in the scDesign2::fit_model_scDesign2 function:

estimate_result <- simmethods::scDesign2_estimation(
  ref_data = ref_data,
  other_prior = list(cell_type_sel = paste0("cell_type", group_condition)),
  verbose = TRUE,
  seed = 111
)
# Estimating parameters using scDesign2
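The paste0() call above turns numeric group codes into character cell type labels, and the number of distinct labels is the number of groups available later in the simulation step. A small illustration with a hypothetical vector of codes:

```r
# Hypothetical numeric group codes for four cells
codes <- c(1, 1, 2, 2)
# Prefix each code to build a cell type label
cell_types <- paste0("cell_type", codes)
cell_types
# [1] "cell_type1" "cell_type1" "cell_type2" "cell_type2"
# Number of distinct cell types (i.e. groups that can be simulated)
length(unique(cell_types))
# [1] 2
```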

Simulating datasets using scDesign2

After estimating parameters from a real dataset, we simulate new datasets from the learned parameters under different scenarios:

  1. Datasets with default parameters
  2. Determine the number of cells
  3. Simulate two or more groups

Datasets with default parameters

The reference data contains 160 cells and 4000 genes. If we simulate a dataset with default parameters, we obtain a new dataset of the same size as the reference data.

simulate_result <- simmethods::scDesign2_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "SCE",
  seed = 111
)
# nCells: 160
# nGenes: 4000
# nGroups: 2

We get two or more groups if information on cell groups or cell types was used in the estimation step.

SCE_result <- simulate_result[["simulate_result"]]
dim(SCE_result)
# [1] 4000  160
table(colData(SCE_result)$group)
# 
# Group1 Group2 
#     80     80

Determine the number of cells

In scDesign2, the only dimension we can set is the number of cells (nCells); the number of genes stays the same as in the reference data.

Here, we simulate a new dataset with 500 cells:

simulate_result <- simmethods::scDesign2_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "list",
  other_prior = list(nCells = 500),
  seed = 111
)
# nCells: 500
# nGenes: 4000
# nGroups: 2
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 4000  500

Simulate two or more groups

In scDesign2, we cannot set nGroups directly; instead, we set prob.group, the proportion of cells assigned to each group. For example, to simulate two groups of equal size, we can use other_prior = list(prob.group = c(0.5, 0.5)). The elements of prob.group must sum to 1, so prob.group = c(0.3, 0.7) is also valid.

In addition, when simulating three or more groups, the following rules apply:

  • The length of the prob.group vector must equal the number of cell groups or cell types used in the estimation step.
  • The elements of the prob.group vector must sum to 1.
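The two rules above can be checked before calling scDesign2_simulation. The helper below, check_prob_group(), is a hypothetical base R sketch and not part of simmethods: it validates a prob.group vector against the number of groups used in estimation and returns the expected number of cells per group as nCells * prob.group. How scDesign2 rounds group sizes internally is not described here, so treat these numbers as expectations rather than guaranteed counts.

```r
# Hypothetical helper, not part of simmethods or scDesign2
check_prob_group <- function(prob.group, n_groups, nCells) {
  # Rule 1: one probability per cell group (or cell type) used in estimation
  if (length(prob.group) != n_groups) {
    stop("length(prob.group) must equal the number of groups used in estimation")
  }
  # Rule 2: probabilities must sum to 1 (all.equal tolerates floating-point error)
  if (!isTRUE(all.equal(sum(prob.group), 1))) {
    stop("prob.group must sum to 1")
  }
  # Expected number of cells in each group
  prob.group * nCells
}

check_prob_group(c(0.4, 0.6), n_groups = 2, nCells = 500)
# [1] 200 300
```

Using all.equal() rather than == for the sum check means vectors such as c(0.3, 0.7) are accepted even if their floating-point sum is not exactly 1.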

In this example, only two groups can be simulated, because the parameters were learned from two cell types.

simulate_result <- simmethods::scDesign2_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "list",
  other_prior = list(nCells = 500,
                     prob.group = c(0.4, 0.6)),
  seed = 111
)
# nCells: 500
# nGenes: 4000
# nGroups: 2

If you did not supply information on cell groups or cell types in the estimation step, you cannot simulate groups.

result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 4000  500
## cell information
cell_info <- simulate_result[["simulate_result"]][["col_meta"]]
table(cell_info$group)
# 
# Group1 Group2 
#    200    300
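To compare the realized group sizes with the requested prob.group, the counts can be converted to proportions with base R's prop.table(). The sketch below uses a hypothetical stand-in vector with the same group sizes as the table above; with real output you would pass cell_info$group instead.

```r
# Hypothetical stand-in for cell_info$group: 200 cells in Group1, 300 in Group2
group <- rep(c("Group1", "Group2"), times = c(200, 300))
# Convert counts per group into proportions
observed <- prop.table(table(group))
# Proportions requested through prob.group
requested <- c(Group1 = 0.4, Group2 = 0.6)
all.equal(as.numeric(observed), as.numeric(requested))
# [1] TRUE
```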