POWSC

This document demonstrates how to estimate parameters from a reference dataset and then simulate single-cell count data with the POWSC method through the simmethods package.

Estimating parameters from a real dataset

Before simulating datasets, estimate the essential parameters from a real dataset so that the simulated data resemble real data. If you do not have a single-cell transcriptomics count matrix at hand, you can use the data generated by the scater::mockSCE function.

library(simmethods)
library(SingleCellExperiment)
# Load data
ref_data <- counts(scater::mockSCE())
dim(ref_data)
# [1] 2000  200
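If scater is not installed, a rough stand-in count matrix can be built in base R. This is a minimal sketch under stated assumptions: the negative-binomial settings (`mu = 2`, `size = 0.5`) and the `GeneN`/`CellN` names are arbitrary choices, not what `scater::mockSCE` uses, and the only property relied on is that the result is a genes-by-cells matrix of non-negative integer counts.

```r
# Hypothetical stand-in for scater::mockSCE(): a genes x cells count matrix.
# mu and size are arbitrary; a small size gives many zeros, as in typical scRNA-seq data.
set.seed(10)
n_genes <- 2000
n_cells <- 200
ref_data <- matrix(
  rnbinom(n_genes * n_cells, mu = 2, size = 0.5),
  nrow = n_genes,
  ncol = n_cells,
  dimnames = list(paste0("Gene", seq_len(n_genes)),
                  paste0("Cell", seq_len(n_cells)))
)
dim(ref_data)
# [1] 2000  200
```

The resulting `ref_data` has the same shape as the mock data above, so it can be passed to the estimation step in the same way.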

Use simmethods::POWSC_estimation to perform the estimation step.

estimate_result <- simmethods::POWSC_estimation(ref_data = ref_data,
                                                verbose = TRUE,
                                                seed = 10)
# Estimating parameters using POWSC

Simulating datasets using POWSC

After estimating parameters from a real dataset, we simulate datasets from the learned parameters under different scenarios:

  1. Datasets with default parameters
  2. Determine the number of cells
  3. Simulate two groups

Datasets with default parameters

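The reference data contains 2000 genes and 200 cells. If we simulate a dataset with default parameters, we obtain new data of the same size as the reference data. By default, POWSC also simulates two groups of cells (nGroups: 2 in the output below), with the proportion of DEGs between them controlled by de.prob (0.1 by default). The head() output shows only Group1 because the first six cells belong to that group.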

simulate_result <- simmethods::POWSC_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "SCE",
  seed = 111
)
# nCells: 200
# nGenes: 2000
# nGroups: 2
# de.prob: 0.1
SCE_result <- simulate_result[["simulate_result"]]
dim(SCE_result)
# [1] 2000  200
head(colData(SCE_result))
# DataFrame with 6 rows and 2 columns
#         cell_name       group
#       <character> <character>
# Cell1       Cell1      Group1
# Cell2       Cell2      Group1
# Cell3       Cell3      Group1
# Cell4       Cell4      Group1
# Cell5       Cell5      Group1
# Cell6       Cell6      Group1
head(rowData(SCE_result))
# DataFrame with 6 rows and 2 columns
#         gene_name     de_gene
#       <character> <character>
# Gene1       Gene1          no
# Gene2       Gene2          no
# Gene3       Gene3         yes
# Gene4       Gene4          no
# Gene5       Gene5         yes
# Gene6       Gene6          no
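The per-gene `de_gene` flag and per-cell `group` label can be used to subset the counts. The sketch below uses hypothetical toy objects shaped like the rowData/colData shown above (the values are made up, not produced by POWSC) and only base R.

```r
# Hypothetical toy objects shaped like the output above; values are made up.
set.seed(111)
counts <- matrix(rpois(6 * 4, lambda = 3), nrow = 6,
                 dimnames = list(paste0("Gene", 1:6), paste0("Cell", 1:4)))
row_meta <- data.frame(gene_name = rownames(counts),
                       de_gene = c("no", "no", "yes", "no", "yes", "no"))
col_meta <- data.frame(cell_name = colnames(counts),
                       group = c("Group1", "Group1", "Group2", "Group2"))

# Keep rows flagged as DE; drop = FALSE keeps a matrix even if one gene is selected.
is_de <- row_meta$de_gene == "yes"
de_counts <- counts[is_de, , drop = FALSE]
rownames(de_counts)
# [1] "Gene3" "Gene5"

# Proportion of DE genes; unlike indexing table() by position,
# this still works when no gene (or every gene) is flagged as DE.
mean(is_de)
# [1] 0.3333333

# DE-gene counts restricted to cells labelled Group1.
group1_de_counts <- de_counts[, col_meta$group == "Group1", drop = FALSE]

# Number of cells per group label.
table(col_meta$group)
```

With the SCE object itself, the same logical indexing should apply, for example `SCE_result[rowData(SCE_result)$de_gene == "yes", ]`.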

Determine the number of cells

In POWSC, the number of cells can be set directly through nCells. For example, to simulate 500 cells, pass other_prior = list(nCells = 500). Here, we simulate a new dataset with 500 cells:

simulate_result <- simmethods::POWSC_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "list",
  other_prior = list(nCells = 500),
  seed = 111
)
# nCells: 500
# nGenes: 2000
# nGroups: 2
# de.prob: 0.1
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 2000  500

Simulate two groups

POWSC simulates two cell groups by default. Set de.prob in other_prior to specify the proportion of differentially expressed genes (DEGs) between the two groups. Here we raise it from the default 0.1 to 0.2:

simulate_result <- simmethods::POWSC_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "list",
  other_prior = list(nCells = 500,
                     de.prob = 0.2),
  seed = 111
)
# nCells: 500
# nGenes: 2000
# nGroups: 2
# de.prob: 0.2
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 2000  500
## cell information
cell_info <- simulate_result[["simulate_result"]][["col_meta"]]
table(cell_info$group)
# 
# Group1 Group2 
#    250    250
## gene information
gene_info <- simulate_result[["simulate_result"]][["row_meta"]]
### the proportion of DEGs
table(gene_info$de_gene)[2]/nrow(result) ## target de.prob = 0.2; [2] is the "yes" count because table() sorts "no" before "yes"
#    yes 
# 0.1895
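The realized proportion (0.1895) is close to, but not exactly, the requested 0.2. One way to judge whether such a gap is ordinary is a binomial sketch. This assumes, without confirmation from POWSC's source, that each gene is flagged as DE independently with probability de.prob; under that assumption the gap is only about 1.2 standard deviations.

```r
# Assumption for illustration only: each gene is flagged DE independently
# with probability de.prob (POWSC's actual assignment rule is not shown here).
n_genes <- 2000
de_prob <- 0.2
observed <- 0.1895                                   # proportion reported above

sd_prop <- sqrt(de_prob * (1 - de_prob) / n_genes)  # binomial SD of the proportion, ~0.0089
z <- (observed - de_prob) / sd_prop                  # ~ -1.17 standard deviations

# Five hypothetical re-draws under the same assumption, for comparison.
set.seed(111)
redraws <- rbinom(5, size = n_genes, prob = de_prob) / n_genes
round(redraws, 4)
```

If POWSC instead selected an exact number of DE genes, this model would not apply, so treat the sketch as a rough sanity check rather than a description of the method.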