Lun2

Here Lun2 method will be demonstrated clearly and hope that this document can help you.

Estimating parameters from a real dataset

Before simulating datasets, it is important to estimate some essential parameters from a real dataset in order to make the simulated data more real. If you do not have a single-cell transcriptomics count matrix now, you can use the data collected in simmethods package by simmethods:data command.

When you use Lun2 to estimate parameters from a real dataset, you must input a numeric vector to specify the batches or plates that each cell comes from, like other_prior = list(batch.condition = the numeric vector).

library(simmethods)
library(SingleCellExperiment)
# Load data
ref_data <- simmethods::data
plates <- simmethods::group_condition
## plates can must be a numeric vector.
other_prior <- list(batch.condition = as.numeric(plates))

Using simmethods::Lun2_estimation command to execute the estimation step.

estimate_result <- simmethods::Lun2_estimation(ref_data = ref_data,
                                               other_prior = other_prior,
                                               verbose = T,
                                               seed = 10)
# Estimating parameters using Lun2
# Estimating number of groups...
# Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm):
# collapsing to unique 'x' values
# Computing normalisation factors...
# Warning in (function (x, sizes, min.mean = NULL, positive = FALSE, scaling =
# NULL) : encountered non-positive size factor estimates
# Estimating dispersions...
# Estimating gene means...
# Estimating plate effects...
# Estimating zero-inflated parameters...
# Warning in sqrt(diag(vc)[np]): NaNs produced
# Warning in sqrt(diag(vc)[np]): NaNs produced
# Warning in value[[3L]](cond): system is computationally singular: reciprocal
# condition number = 9.2681e-48FALSE
# Warning in sqrt(diag(vc)[np]): NaNs produced

# Warning in sqrt(diag(vc)[np]): NaNs produced

# Warning in sqrt(diag(vc)[np]): NaNs produced

# Warning in sqrt(diag(vc)[np]): NaNs produced

# Warning in sqrt(diag(vc)[np]): NaNs produced

# Warning in sqrt(diag(vc)[np]): NaNs produced
# Warning in value[[3L]](cond): system is computationally singular: reciprocal
# condition number = 4.1192e-48FALSE

Simulating datasets using Lun2

After estimating parameter from a real dataset, we will simulate a dataset based on the learned parameters with different scenarios.

  1. Datasets with default parameters
  2. Determin the number of cells and genes
  3. Simulate two or more batches

Datasets with default parameters

The reference data contains 160 cells and 4000 genes, if we simulate datasets with default parameters and then we will obtain a new data which has the same size as the reference data. In addtion, the simulated dataset will have one batch of cells.

simulate_result <- simmethods::Lun2_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "SCE",
  seed = 111
)
# nCells: 160
# nGenes: 3973
# nPlates: 2
SCE_result <- simulate_result[["simulate_result"]]
dim(SCE_result)
# [1] 3973  160
head(colData(SCE_result))
# DataFrame with 6 rows and 2 columns
#         cell_name    plate
#       <character> <factor>
# Cell1       Cell1        1
# Cell2       Cell2        1
# Cell3       Cell3        1
# Cell4       Cell4        1
# Cell5       Cell5        1
# Cell6       Cell6        1
head(rowData(SCE_result))
# DataFrame with 6 rows and 1 column
#         gene_name
#       <character>
# Gene1       Gene1
# Gene2       Gene2
# Gene3       Gene3
# Gene4       Gene4
# Gene5       Gene5
# Gene6       Gene6

Determin the number of cells and genes

In Lun2, we can not set nCells directly and should set cell.plates instead. For example, if we want to simulate 500 cells, we can type other_prior = list(cell.plates = rep(1, 500)). For genes, we can just set nGenes. Here, we simulate a new dataset with 500 cells and 2000 genes:

simulate_result <- simmethods::Lun2_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "list",
  other_prior = list(cell.plates = rep(1, 500),
                     nGenes = 2000),
  seed = 111
)
# nCells: 500
# nGenes: 2000
# nPlates: 1
# Warning in splatter::lun2Simulate(parameters, verbose = verbose, zinb = zinb):
# Number of lib.sizes not equal to nCells. lib.sizes will be sampled.
# Warning in splatter::lun2Simulate(parameters, verbose = verbose, zinb = zinb):
# Number of gene parameters does not equal nGenes. Gene parameters will be
# sampled.

The cell.plates parameter represents the sampling source of cells in real experiments.

result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 2000  500

Simulate two or more batches

In Lun2, we can not set nbatches directly and should set cell.plates instead. For example, if we want to simulate 2 batches, we can type other_prior = list(cell.plates = sample(1:2, 500, replace = TRUE)). Note that the length of cell.plates numeric vector must be equal to the cell number.

For demonstration, we will simulate three batches using the learned parameters.

simulate_result <- simmethods::Lun2_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "list",
  other_prior = list(cell.plates = sample(1:2, 500, replace = TRUE),
                     nGenes = 2000),
  seed = 111
)
# nCells: 500
# nGenes: 2000
# nPlates: 2
# Warning in splatter::lun2Simulate(parameters, verbose = verbose, zinb = zinb):
# Number of lib.sizes not equal to nCells. lib.sizes will be sampled.
# Warning in splatter::lun2Simulate(parameters, verbose = verbose, zinb = zinb):
# Number of gene parameters does not equal nGenes. Gene parameters will be
# sampled.
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 2000  500
## cell information
cell_info <- simulate_result[["simulate_result"]][["col_meta"]]
table(cell_info$plate)
# 
#   1   2 
# 236 264