zinbwaveZinger

This document demonstrates how to use the zinbwaveZinger method.

Estimating parameters from a real dataset

Before simulating datasets, it is important to estimate some essential parameters from a real dataset so that the simulated data are more realistic. If you do not have a single-cell transcriptomics count matrix at hand, you can use the data collected in the simmethods package via simmethods::data.

When you use zinbwaveZinger to estimate parameters from a real dataset, you must supply a numeric vector that specifies the group or plate each cell comes from, e.g. other_prior = list(group.condition = the numeric vector).

library(simmethods)
library(SingleCellExperiment)
# Load data
ref_data <- simmethods::data
group_condition <- simmethods::group_condition
## group_condition must be a numeric vector.
other_prior <- list(group.condition = as.numeric(group_condition))
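If your own group labels are stored as characters rather than numbers, calling as.numeric() on them directly produces NA values. The following is a minimal base-R sketch of one way to prepare and check such a vector before building other_prior. The objects toy_counts, toy_labels, toy_group and toy_prior are hypothetical stand-ins, not part of simmethods, and the checks shown are an assumption about what is sensible to verify, not a description of what the package does internally.

```r
# Toy stand-in for a count matrix (genes in rows, cells in columns).
set.seed(1)
toy_counts <- matrix(rpois(20, lambda = 5), nrow = 4,
                     dimnames = list(paste0("Gene", 1:4), paste0("Cell", 1:5)))

# Hypothetical character labels, one per cell.
toy_labels <- c("A", "A", "B", "B", "A")

# as.numeric() on labels such as "A" would yield NA,
# so convert to a factor first and take its integer codes.
toy_group <- as.numeric(factor(toy_labels))

# Sanity checks: numeric, one entry per cell, no missing values.
stopifnot(is.numeric(toy_group),
          length(toy_group) == ncol(toy_counts),
          !anyNA(toy_group))

# Same list shape as other_prior in this document.
toy_prior <- list(group.condition = toy_group)
```

Going through factor() maps the labels to codes in sorted order of the levels, so "A" becomes 1 and "B" becomes 2 here.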

Use the simmethods::zinbwaveZinger_estimation function to run the estimation step.

estimate_result <- simmethods::zinbwaveZinger_estimation(ref_data = ref_data,
                                                         other_prior = other_prior,
                                                         verbose = TRUE,
                                                         seed = 10)
# Estimating parameters using zinbwaveZinger
# iteration 1 in 10
# iteration 2 in 10
# iteration 3 in 10
# iteration 4 in 10
# iteration 5 in 10
# iteration 6 in 10
# iteration 7 in 10
# iteration 8 in 10
# iteration 9 in 10
# iteration 10 in 10

Simulating datasets using zinbwaveZinger

After estimating parameters from a real dataset, we simulate datasets from the learned parameters under different scenarios:

  1. Datasets with default parameters
  2. Determine the number of cells and genes
  3. Simulate two groups

Datasets with default parameters

The reference data contains 160 cells and 4000 genes. If we simulate with default parameters, we obtain a new dataset of the same size as the reference data. The log printed during simulation reports the default settings that were used: nGroups: 2, prob.group: 0.1 and fc.group: 2.

simulate_result <- simmethods::zinbwaveZinger_simulation(
  ref_data = ref_data,
  other_prior = other_prior,
  parameters = estimate_result[["estimate_result"]],
  return_format = "SCE",
  seed = 111
)
# nCells: 160
# nGenes: 4000
# nGroups: 2
# prob.group: 0.1
# fc.group: 2
# Preparing dataset.
# Sampling.
# Calculating differential expression.
# Simulating data.
# Adding extra zeros w.r.t. NB for 2366 genes
SCE_result <- simulate_result[["simulate_result"]]
dim(SCE_result)
# [1] 4000  160
head(colData(SCE_result))
# DataFrame with 6 rows and 1 column
#         cell_name
#       <character>
# Cell1       Cell1
# Cell2       Cell2
# Cell3       Cell3
# Cell4       Cell4
# Cell5       Cell5
# Cell6       Cell6
head(rowData(SCE_result))
# DataFrame with 6 rows and 3 columns
#         gene_name     de_gene     de_fc
#       <character> <character> <numeric>
# Gene1       Gene1          no         0
# Gene2       Gene2          no         0
# Gene3       Gene3          no         0
# Gene4       Gene4          no         0
# Gene5       Gene5          no         0
# Gene6       Gene6          no         0

Determine the number of cells and genes

In zinbwaveZinger, the number of cells and the number of genes can only be set to values higher than those of the reference data. Here, we simulate a new dataset with 1000 cells and 5000 genes:

simulate_result <- simmethods::zinbwaveZinger_simulation(
  ref_data = ref_data,
  other_prior = list(group.condition = as.numeric(group_condition),
                     nCells = 1000,
                     nGenes = 5000),
  parameters = estimate_result[["estimate_result"]],
  return_format = "list",
  seed = 111
)
# nCells: 1000
# nGenes: 5000
# nGroups: 2
# prob.group: 0.1
# fc.group: 2
# Preparing dataset.
# Sampling.
# Calculating differential expression.
# Simulating data.
# Adding extra zeros w.r.t. NB for 2776 genes
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 5000 1000
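Because the requested sizes must not fall below those of the reference data, it can help to check them before starting a long simulation. The sketch below uses a hypothetical helper, check_requested_size(), which is not part of simmethods. It assumes that genes are in rows and cells in columns, and it treats a size equal to the reference as acceptable, since the default run reproduces the reference dimensions; both points are assumptions made for illustration.

```r
# Hypothetical helper, not part of simmethods.
# ref_dim: c(number of genes, number of cells), e.g. from dim(ref_data).
check_requested_size <- function(ref_dim, nCells, nGenes) {
  ref_genes <- ref_dim[1]
  ref_cells <- ref_dim[2]
  # TRUE where the requested value is at least the reference value.
  c(nCells = nCells >= ref_cells, nGenes = nGenes >= ref_genes)
}

# Reference size stated in this document: 4000 genes and 160 cells.
ref_dim <- c(4000, 160)

ok_request  <- check_requested_size(ref_dim, nCells = 1000, nGenes = 5000)
bad_request <- check_requested_size(ref_dim, nCells = 100,  nGenes = 5000)

ok_request   # both entries TRUE
bad_request  # nCells entry is FALSE because 100 < 160
```

With real data, ref_dim would be dim(ref_data), and the simulation would only be started when all(check_requested_size(...)) is TRUE.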

Simulate two groups

In zinbwaveZinger, we can only simulate two groups. Note that zinbwaveZinger does not return cell group information.

For demonstration, we will simulate two groups using the learned parameters. We can set de.prob = 0.2 to simulate 20% of the genes as DEGs.

simulate_result <- simmethods::zinbwaveZinger_simulation(
  ref_data = ref_data,
  other_prior = list(group.condition = as.numeric(group_condition),
                     nCells = 1000,
                     nGenes = 5000,
                     de.prob = 0.2,
                     fc.group = 4),
  parameters = estimate_result[["estimate_result"]],
  return_format = "list",
  seed = 111
)
# nCells: 1000
# nGenes: 5000
# nGroups: 2
# prob.group: 0.2
# fc.group: 4
# Preparing dataset.
# Sampling.
# Calculating differential expression.
# Simulating data.
# Adding extra zeros w.r.t. NB for 2640 genes

Because zinbwaveZinger does not return cell group information, only the gene-level metadata is inspected here.

result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 5000 1000
## gene information
gene_info <- simulate_result[["simulate_result"]][["row_meta"]]
### the proportion of DEGs
table(gene_info$de_gene)[2]/nrow(result) ## de.prob = 0.2
# yes 
# 0.2
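Indexing the table by position, as in table(gene_info$de_gene)[2], relies on "yes" being the second category and returns NA when no gene is marked as a DEG. A slightly more defensive way to compute the same proportion is sketched below on a toy data frame. toy_gene_info is a hypothetical stand-in whose column names mirror the gene metadata shown earlier in this document (gene_name, de_gene, de_fc); the values, including the non-zero de_fc entries, are invented for illustration.

```r
# Toy stand-in for the gene metadata (values are made up).
toy_gene_info <- data.frame(
  gene_name = paste0("Gene", 1:10),
  de_gene   = c("yes", "no", "no", "no", "yes", "no", "no", "no", "no", "no"),
  de_fc     = c(4, 0, 0, 0, 4, 0, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# The mean of a logical vector is the fraction of TRUE values,
# so this is the proportion of genes flagged as DEGs.
deg_proportion <- mean(toy_gene_info$de_gene == "yes")
deg_proportion

# The same expression returns 0, not NA, when there are no DEGs.
no_deg_proportion <- mean(rep("no", 5) == "yes")
no_deg_proportion
```

Applied to the simulation output, the equivalent call would be mean(gene_info$de_gene == "yes").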