scDesign

This document demonstrates how to simulate single-cell datasets with the scDesign method.

Simulating datasets using scDesign

scDesign has no separate estimation step, so new datasets are simulated directly from the reference data.

library(simmethods)
library(SingleCellExperiment)
# Load data
ref_data <- simmethods::data
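
Before simulating, it can help to confirm that the reference data is a gene-by-cell count matrix. The following is a minimal sketch under that assumption. `toy_ref` is a small hypothetical stand-in for `ref_data`, and `check_reference()` is an illustrative helper that is not part of simmethods.

```r
# Hypothetical stand-in for the reference data: 5 genes x 4 cells of counts.
set.seed(1)
toy_ref <- matrix(rpois(20, lambda = 3), nrow = 5, ncol = 4,
                  dimnames = list(paste0("Gene", 1:5), paste0("Cell", 1:4)))

# Hypothetical helper (not part of simmethods): basic sanity checks on a
# gene-by-cell count matrix. Stops with an error if any check fails.
check_reference <- function(mat) {
  stopifnot(is.matrix(mat), is.numeric(mat))
  stopifnot(!anyNA(mat), all(mat >= 0), all(mat == round(mat)))
  stopifnot(!is.null(rownames(mat)), !is.null(colnames(mat)))
  # Rows are assumed to be genes and columns cells.
  c(nGenes = nrow(mat), nCells = ncol(mat))
}

ref_dims <- check_reference(toy_ref)
ref_dims
```

Applied to the real `ref_data`, the same helper would be expected to report the gene and cell counts of the reference, provided it is stored as a plain matrix.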

We will simulate datasets from the reference data under different scenarios.

  1. Datasets with default parameters
  2. Determine the number of cells
  3. Simulate two or more groups

Datasets with default parameters

The reference data contains 160 cells and 4000 genes. If we simulate with default parameters, we obtain a new dataset of the same size as the reference data. In addition, the simulated dataset will contain a single group of cells.

simulate_result <- simmethods::scDesign_simulation(
  ref_data = ref_data,
  return_format = "SCE",
  seed = 111
)
# nCells: 160
# nGenes: 4000
# nGroups: 1
# de.prob: 0.1
# fc.group: up--5
# fc.group: down--1.5
# [1] "estimate expression parameters"
SCE_result <- simulate_result[["simulate_result"]]
dim(SCE_result)
# [1] 4000  160
head(colData(SCE_result))
# DataFrame with 6 rows and 1 column
#         cell_name
#       <character>
# Cell1       Cell1
# Cell2       Cell2
# Cell3       Cell3
# Cell4       Cell4
# Cell5       Cell5
# Cell6       Cell6

Determine the number of cells

In scDesign, only the number of cells can be set; the number of genes follows the reference data.

Here, we simulate a new dataset with 500 cells:

simulate_result <- simmethods::scDesign_simulation(
  ref_data = ref_data,
  return_format = "list",
  other_prior = list(nCells = 500),
  seed = 111
)
# nCells: 500
# nGenes: 4000
# nGroups: 1
# de.prob: 0.1
# fc.group: up--5
# fc.group: down--1.5
# [1] "estimate expression parameters"
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 4000  500
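
Once a count matrix is available, a few quick summaries can show whether the simulated data looks plausible. The following is a minimal base R sketch, not a simmethods feature. `toy_counts` is a hypothetical stand-in for the simulated gene-by-cell matrix held in `result`.

```r
# Hypothetical stand-in for a simulated count matrix: 6 genes x 5 cells.
set.seed(42)
toy_counts <- matrix(rpois(30, lambda = 1), nrow = 6, ncol = 5,
                     dimnames = list(paste0("Gene", 1:6), paste0("Cell", 1:5)))

# Total counts per cell (library size).
lib_size <- colSums(toy_counts)

# Number of genes with at least one count in each cell.
genes_detected <- colSums(toy_counts > 0)

# Overall proportion of zero entries in the matrix.
zero_fraction <- mean(toy_counts == 0)

summary(lib_size)
zero_fraction
```

Running the same three lines on both the reference and the simulated matrix gives a rough side-by-side comparison of sequencing depth and sparsity.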

Simulate two or more groups

In scDesign, we can set nGroups directly, together with the proportions of the cell groups via prob.group. The proportion of DEGs (de.prob) and the fold change (fc.group) can also be customized.

For demonstration, we will simulate three groups.

simulate_result <- simmethods::scDesign_simulation(
  ref_data = ref_data,
  return_format = "list",
  other_prior = list(nCells = 500,
                     nGroups = 3,
                     prob.group = c(0.1, 0.3, 0.6),
                     de.prob = 0.2,
                     fc.group = 4),
  seed = 111
)
# nCells: 500
# nGenes: 4000
# nGroups: 3
# de.prob: 0.2
# fc.group: up--4
# fc.group: down--4
# [1] "estimate expression parameters"
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 4000  500
## cell information
cell_info <- simulate_result[["simulate_result"]][["col_meta"]]
table(cell_info$group)
# 
# Group1 Group2 Group3 
#     50    150    300
## gene information
gene_info <- simulate_result[["simulate_result"]][["row_meta"]]
### the proportion of DEGs
table(gene_info$de_gene)[2]/nrow(result) ## de.prob = 0.2
# yes 
# 0.2
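
The group sizes and DEG proportion reported above can be cross-checked with simple arithmetic. The sketch below assumes that group sizes are nCells multiplied by prob.group and that de.prob is the fraction of all genes that are differentially expressed. Both assumptions agree with the output shown, but they are not confirmed here as the internal implementation. `toy_gene_info` is a hypothetical stand-in for the `row_meta` table.

```r
n_cells <- 500
n_genes <- 4000
prob_group <- c(0.1, 0.3, 0.6)
de_prob <- 0.2

# Group proportions should sum to 1.
stopifnot(isTRUE(all.equal(sum(prob_group), 1)))

# Expected number of cells per group (assumed: nCells * prob.group).
expected_sizes <- round(n_cells * prob_group)
names(expected_sizes) <- paste0("Group", seq_along(prob_group))
expected_sizes
# Group1 Group2 Group3 
#     50    150    300

# Expected number of DEGs (assumed: nGenes * de.prob).
expected_degs <- round(n_genes * de_prob)
expected_degs
# [1] 800

# Hypothetical gene metadata with a "yes"/"no" de_gene column.
toy_gene_info <- data.frame(de_gene = rep(c("yes", "no"), times = c(2, 8)))

# Proportion of DEGs without relying on the position of "yes" in table().
de_fraction <- mean(toy_gene_info$de_gene == "yes")
de_fraction
# [1] 0.2
```

Using `mean(... == "yes")` avoids indexing the table by position, which would break if no gene were labelled "no".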