Here Lun2 method will be demonstrated clearly and hope that this document can help you.
Before simulating datasets, it is important to estimate some essential parameters from a real dataset in order to make the simulated data more real. If you do not have a single-cell transcriptomics count matrix now, you can use the data collected in simmethods package by simmethods:data
command.
When you use Lun2 to estimate parameters from a real dataset, you must input a numeric vector to specify the batches or plates that each cell comes from, like other_prior = list(batch.condition = the numeric vector)
.
library(simmethods)
library(SingleCellExperiment)
# Load data
ref_data <- simmethods::data
plates <- simmethods::group_condition
## plates can must be a numeric vector.
other_prior <- list(batch.condition = as.numeric(plates))
Using simmethods::Lun2_estimation
command to execute the estimation step.
estimate_result <- simmethods::Lun2_estimation(ref_data = ref_data,
other_prior = other_prior,
verbose = T,
seed = 10)
# Estimating parameters using Lun2
# Estimating number of groups...
# Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm):
# collapsing to unique 'x' values
# Computing normalisation factors...
# Warning in (function (x, sizes, min.mean = NULL, positive = FALSE, scaling =
# NULL) : encountered non-positive size factor estimates
# Estimating dispersions...
# Estimating gene means...
# Estimating plate effects...
# Estimating zero-inflated parameters...
# Warning in sqrt(diag(vc)[np]): NaNs produced
# Warning in sqrt(diag(vc)[np]): NaNs produced
# Warning in value[[3L]](cond): system is computationally singular: reciprocal
# condition number = 9.2681e-48FALSE
# Warning in sqrt(diag(vc)[np]): NaNs produced
# Warning in sqrt(diag(vc)[np]): NaNs produced
# Warning in sqrt(diag(vc)[np]): NaNs produced
# Warning in sqrt(diag(vc)[np]): NaNs produced
# Warning in sqrt(diag(vc)[np]): NaNs produced
# Warning in sqrt(diag(vc)[np]): NaNs produced
# Warning in value[[3L]](cond): system is computationally singular: reciprocal
# condition number = 4.1192e-48FALSE
After estimating parameter from a real dataset, we will simulate a dataset based on the learned parameters with different scenarios.
The reference data contains 160 cells and 4000 genes, if we simulate datasets with default parameters and then we will obtain a new data which has the same size as the reference data. In addtion, the simulated dataset will have one batch of cells.
simulate_result <- simmethods::Lun2_simulation(
parameters = estimate_result[["estimate_result"]],
return_format = "SCE",
seed = 111
)
# nCells: 160
# nGenes: 3973
# nPlates: 2
SCE_result <- simulate_result[["simulate_result"]]
dim(SCE_result)
# [1] 3973 160
head(colData(SCE_result))
# DataFrame with 6 rows and 2 columns
# cell_name plate
# <character> <factor>
# Cell1 Cell1 1
# Cell2 Cell2 1
# Cell3 Cell3 1
# Cell4 Cell4 1
# Cell5 Cell5 1
# Cell6 Cell6 1
head(rowData(SCE_result))
# DataFrame with 6 rows and 1 column
# gene_name
# <character>
# Gene1 Gene1
# Gene2 Gene2
# Gene3 Gene3
# Gene4 Gene4
# Gene5 Gene5
# Gene6 Gene6
In Lun2, we can not set nCells
directly and should set cell.plates
instead. For example, if we want to simulate 500 cells, we can type other_prior = list(cell.plates = rep(1, 500))
. For genes, we can just set nGenes
.
Here, we simulate a new dataset with 500 cells and 2000 genes:
simulate_result <- simmethods::Lun2_simulation(
parameters = estimate_result[["estimate_result"]],
return_format = "list",
other_prior = list(cell.plates = rep(1, 500),
nGenes = 2000),
seed = 111
)
# nCells: 500
# nGenes: 2000
# nPlates: 1
# Warning in splatter::lun2Simulate(parameters, verbose = verbose, zinb = zinb):
# Number of lib.sizes not equal to nCells. lib.sizes will be sampled.
# Warning in splatter::lun2Simulate(parameters, verbose = verbose, zinb = zinb):
# Number of gene parameters does not equal nGenes. Gene parameters will be
# sampled.
The cell.plates parameter represents the sampling source of cells in real experiments.
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 2000 500
In Lun2, we can not set nbatches
directly and should set cell.plates
instead. For example, if we want to simulate 2 batches, we can type other_prior = list(cell.plates = sample(1:2, 500, replace = TRUE))
. Note that the length of cell.plates
numeric vector must be equal to the cell number.
For demonstration, we will simulate three batches using the learned parameters.
simulate_result <- simmethods::Lun2_simulation(
parameters = estimate_result[["estimate_result"]],
return_format = "list",
other_prior = list(cell.plates = sample(1:2, 500, replace = TRUE),
nGenes = 2000),
seed = 111
)
# nCells: 500
# nGenes: 2000
# nPlates: 2
# Warning in splatter::lun2Simulate(parameters, verbose = verbose, zinb = zinb):
# Number of lib.sizes not equal to nCells. lib.sizes will be sampled.
# Warning in splatter::lun2Simulate(parameters, verbose = verbose, zinb = zinb):
# Number of gene parameters does not equal nGenes. Gene parameters will be
# sampled.
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 2000 500
## cell information
cell_info <- simulate_result[["simulate_result"]][["col_meta"]]
table(cell_info$plate)
#
# 1 2
# 236 264