zinbwave

This document demonstrates how to use the zinbwave method through the simmethods package.

Estimating parameters from a real dataset

Before simulating datasets, it is important to estimate some essential parameters from a real dataset so that the simulated data are more realistic. If you do not have a single-cell transcriptomics count matrix at hand, you can use the data bundled with the simmethods package, available as simmethods::data.

library(simmethods)
library(SingleCellExperiment)
# Load data
ref_data <- simmethods::data

Use the simmethods::zinbwave_estimation function to run the estimation step.

estimate_result <- simmethods::zinbwave_estimation(ref_data = ref_data,
                                                   verbose = TRUE,
                                                   seed = 10)
# Estimating parameters using zinbwave
# Removing all zero genes...
# Fitting model...
# Create model:
# ok
# Initialize parameters:
# ok
# Optimize parameters:
# Iteration 1
# penalized log-likelihood = -1458528.51195789
# After dispersion optimization = -1842970.76424042
#    user  system elapsed 
#   6.818   0.562   7.599
# After right optimization = -1681437.25066416
# After orthogonalization = -1681437.25066416
#    user  system elapsed 
#   2.545   0.188   2.802
# After left optimization = -1615279.08884363
# After orthogonalization = -1615279.08884363
# Iteration 2
# penalized log-likelihood = -1615279.08884363
# After dispersion optimization = -1615279.08884363
#    user  system elapsed 
#   5.128   0.413   5.608
# After right optimization = -1613680.86211755
# After orthogonalization = -1613680.86211755
#    user  system elapsed 
#   2.016   0.158   2.199
# After left optimization = -1613424.05613962
# After orthogonalization = -1613424.05613962
# Iteration 3
# penalized log-likelihood = -1613424.05613962
# After dispersion optimization = -1613424.05613962
#    user  system elapsed 
#   4.531   0.374   4.962
# After right optimization = -1613329.33593763
# After orthogonalization = -1613329.33593763
#    user  system elapsed 
#   1.262   0.077   1.348
# After left optimization = -1613299.28560498
# After orthogonalization = -1613299.28560498
# Iteration 4
# penalized log-likelihood = -1613299.28560498
# ok
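The log above starts with "Removing all zero genes...". As a rough illustration of what such a filter presumably does (dropping genes with no counts in any cell), here is a minimal base R sketch on a made-up toy matrix. The object names `toy_counts`, `keep`, and `filtered_counts` are hypothetical, and this is not the code that simmethods or zinbwave runs internally.

```r
# Toy count matrix: genes in rows, cells in columns (values are made up).
toy_counts <- matrix(
  c(0, 0, 0, 0,
    3, 0, 1, 2,
    0, 0, 0, 0,
    5, 4, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("Gene", 1:4), paste0("Cell", 1:4))
)

# Keep only genes with at least one non-zero count across all cells.
keep <- rowSums(toy_counts) > 0
filtered_counts <- toy_counts[keep, , drop = FALSE]

dim(filtered_counts)
# [1] 2 4
rownames(filtered_counts)
# [1] "Gene2" "Gene4"
```

Because of this filtering step, the number of genes used for fitting can be smaller than the number of genes in the input matrix when the input contains all-zero rows.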

Simulating datasets using zinbwave

After estimating parameters from a real dataset, we will simulate a dataset based on the learned parameters.
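For intuition, zinbwave is built on a zero-inflated negative binomial (ZINB) model: a count is either forced to zero with some probability or drawn from a negative binomial distribution. The base R sketch below draws counts in that way. All parameter values (`mu`, `pi0`, `size`) are made up for illustration and are assumptions, not values estimated above, and the sketch is not how zinbwave itself generates data.

```r
set.seed(111)

n_genes <- 5
n_cells <- 8

# Hypothetical per-gene negative binomial means, repeated for every cell.
mu <- matrix(rep(c(1, 5, 10, 20, 50), times = n_cells), nrow = n_genes)
pi0 <- 0.3   # assumed zero-inflation probability (shared by all entries)
size <- 2    # assumed NB size parameter (inverse dispersion)

# Step 1: draw negative binomial counts with the given means.
nb_counts <- matrix(rnbinom(n_genes * n_cells, size = size, mu = mu),
                    nrow = n_genes)

# Step 2: draw a zero-inflation indicator for each entry.
is_extra_zero <- matrix(rbinom(n_genes * n_cells, size = 1, prob = pi0) == 1,
                        nrow = n_genes)

# Step 3: entries flagged by the indicator are set to zero.
zinb_counts <- nb_counts
zinb_counts[is_extra_zero] <- 0

dim(zinb_counts)
# [1] 5 8
```

In a fitted model the means, dispersions, and zero-inflation probabilities would vary by gene and cell rather than being fixed constants as in this sketch.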

The reference data contains 160 cells and 4000 genes. With zinbwave we can only simulate datasets using the default parameters, so the simulated dataset has the same size as the reference data.

simulate_result <- simmethods::zinbwave_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "SCE",
  seed = 111
)
# nCells: 160
# nGenes: 4000
SCE_result <- simulate_result[["simulate_result"]]
dim(SCE_result)
# [1] 4000  160
head(colData(SCE_result))
# DataFrame with 6 rows and 1 column
#              Cell
#       <character>
# Cell1       Cell1
# Cell2       Cell2
# Cell3       Cell3
# Cell4       Cell4
# Cell5       Cell5
# Cell6       Cell6
head(rowData(SCE_result))
# DataFrame with 6 rows and 1 column
#              Gene
#       <character>
# Gene1       Gene1
# Gene2       Gene2
# Gene3       Gene3
# Gene4       Gene4
# Gene5       Gene5
# Gene6       Gene6
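A quick sanity check after simulation is to compare a few summary statistics of the simulated counts against the reference counts. The helper below, `summarize_counts`, is a hypothetical function written for this illustration (it is not part of simmethods) and is demonstrated on a made-up toy matrix.

```r
# Hypothetical helper: basic summaries of a genes-by-cells count matrix.
summarize_counts <- function(mat) {
  mat <- as.matrix(mat)
  c(
    n_genes = nrow(mat),
    n_cells = ncol(mat),
    median_library_size = median(colSums(mat)),  # median total count per cell
    zero_fraction = mean(mat == 0)               # proportion of zero entries
  )
}

# Toy matrix with 2 genes and 4 cells (values are made up).
toy <- matrix(c(0, 2, 0, 4, 1, 0, 0, 3), nrow = 2)
summarize_counts(toy)
#             n_genes             n_cells median_library_size       zero_fraction 
#                 2.0                 4.0                 2.5                 0.5 

# Possible usage with the objects from this document, assuming the simulated
# SingleCellExperiment stores its matrix in an assay named "counts":
# summarize_counts(ref_data)
# summarize_counts(counts(SCE_result))
```

Similar values for the two datasets would suggest that the simulation reproduces the overall depth and sparsity of the reference, although this is only a coarse check.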