Kersplat

This document demonstrates how to use the Kersplat method, from parameter estimation to simulation.

Estimating parameters from a real dataset

Before simulating datasets, it is important to estimate some essential parameters from a real dataset so that the simulated data are more realistic. If you do not have a single-cell transcriptomics count matrix at hand, you can use the data shipped with the simmethods package, available as simmethods::data.

library(simmethods)
library(SingleCellExperiment)
# Load data
ref_data <- simmethods::data
dim(ref_data)
# [1] 4000  160
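
If you bring your own data instead, it may help to check that it is laid out like the reference data before estimation. The sketch below assumes the input should be a genes-by-cells matrix of non-negative integer counts, which matches the 4000 x 160 reference matrix above. The check_counts helper and the toy matrix are hypothetical and are not part of simmethods.

```r
# Hypothetical helper (not part of simmethods): basic sanity checks
# on a genes-by-cells count matrix.
check_counts <- function(counts) {
  stopifnot(is.matrix(counts), is.numeric(counts))  # numeric matrix
  stopifnot(!anyNA(counts))                         # no missing values
  stopifnot(all(counts >= 0))                       # counts are non-negative
  stopifnot(all(counts == round(counts)))           # counts are whole numbers
  invisible(TRUE)
}

# Toy matrix standing in for real data: 5 genes (rows) x 4 cells (columns).
set.seed(1)
toy_counts <- matrix(rpois(20, lambda = 3), nrow = 5, ncol = 4,
                     dimnames = list(paste0("Gene", 1:5),
                                     paste0("Cell", 1:4)))
check_counts(toy_counts)
dim(toy_counts)
```

A matrix that fails any of these checks stops with an error that names the failed condition.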

Use the simmethods::Kersplat_estimation function to run the estimation step.

estimate_result <- simmethods::Kersplat_estimation(ref_data = ref_data,
                                                   verbose = TRUE,
                                                   seed = 10)
# Estimating parameters using Kersplat
# Warning in newKersplatParams(): The Kersplat simulation is still experimental
# and may produce unreliable results. Please try it and report any issues to
# https://github.com/Oshlack/splatter/issues. The development version may have
# improved features.
# Raw: 0.180467969462491 A: 5.24644589782513 B: 1.56749149061734 C: -3.99991188833989 Y: 0.796802242740237
# Warning in kersplatEstBCV(counts, params, verbose): Exponential corrected BCV is
# negative.Using linear correction.
# Warning in kersplatEstBCV(counts, params, verbose): Linear corrected BCV is
# negative.Using existing bcv.common.

Simulating datasets using Kersplat

After estimating parameters from a real dataset, we simulate datasets from the learned parameters under different scenarios.

  1. Datasets with default parameters
  2. Determine the number of cells and genes

Datasets with default parameters

The reference data contains 160 cells and 4000 genes. If we simulate a dataset with the default parameters, the new dataset has the same size as the reference data.

simulate_result <- simmethods::Kersplat_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "SCE",
  seed = 111
)
# nCells: 160
# nGenes: 4000
SCE_result <- simulate_result[["simulate_result"]]
dim(SCE_result)
# [1] 4000  160
head(colData(SCE_result))
# DataFrame with 6 rows and 1 column
#         cell_name
#       <character>
# Cell1       Cell1
# Cell2       Cell2
# Cell3       Cell3
# Cell4       Cell4
# Cell5       Cell5
# Cell6       Cell6

Determine the number of cells and genes

Here, we simulate a new dataset with 500 cells and 1000 genes:

simulate_result <- simmethods::Kersplat_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "list",
  other_prior = list(nCells = 500,
                     nGenes = 1000),
  seed = 111
)
# nCells: 500
# nGenes: 1000
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 1000  500
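
One quick way to judge a simulated count matrix is to compare a few simple summaries against the reference data. The sketch below uses base R only; summarise_counts is a hypothetical helper, and the two toy matrices merely stand in for ref_data and result so that the example runs on its own.

```r
# Hypothetical helper (not part of simmethods): simple summaries of a
# genes-by-cells count matrix.
summarise_counts <- function(counts) {
  c(n_genes = nrow(counts),
    n_cells = ncol(counts),
    mean_library_size = mean(colSums(counts)),  # average total count per cell
    zero_fraction = mean(counts == 0))          # share of zero entries
}

# Toy stand-ins for the reference and simulated matrices.
set.seed(1)
toy_ref <- matrix(rpois(40 * 16, lambda = 2), nrow = 40, ncol = 16)
toy_sim <- matrix(rpois(10 * 50, lambda = 2), nrow = 10, ncol = 50)

# One row per dataset, one column per summary.
rbind(reference = summarise_counts(toy_ref),
      simulated = summarise_counts(toy_sim))
```

With real data, the same call would take ref_data and result in place of the toy matrices, provided both are plain numeric matrices.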