This document demonstrates how to use the Kersplat method, from parameter estimation to data simulation.
Before simulating datasets, estimate the essential parameters from a real dataset so that the simulated data resemble real data. If you do not have a single-cell transcriptomics count matrix at hand, you can use the data bundled with the simmethods package via simmethods::data.
library(simmethods)
library(SingleCellExperiment)
# Load data
ref_data <- simmethods::data
dim(ref_data)
# [1] 4000 160
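The reference data above is a count matrix with genes in rows and cells in columns (4000 genes by 160 cells). If you bring your own data, a quick sanity check along the following lines may help before estimation. This is only a sketch: `toy_counts` and `check_count_matrix` are hypothetical names used for illustration and are not part of simmethods.

```r
# Hypothetical stand-in for a real count matrix: 5 genes (rows) x 3 cells (columns)
set.seed(1)
toy_counts <- matrix(
  rpois(15, lambda = 2),
  nrow = 5, ncol = 3,
  dimnames = list(paste0("Gene", 1:5), paste0("Cell", 1:3))
)

# Hypothetical helper: basic checks on a genes x cells count matrix
check_count_matrix <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x))                  # plain numeric matrix
  stopifnot(!anyNA(x), all(x >= 0), all(x == round(x)))   # non-negative whole numbers
  stopifnot(!is.null(rownames(x)), !is.null(colnames(x))) # gene and cell names present
  c(nGenes = nrow(x), nCells = ncol(x))
}

check_count_matrix(toy_counts)
```

If the check reports the gene and cell counts swapped, transposing the matrix with `t()` puts it in the same orientation as the reference data.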
Use the simmethods::Kersplat_estimation function to run the estimation step.
estimate_result <- simmethods::Kersplat_estimation(ref_data = ref_data,
                                                   verbose = TRUE,
                                                   seed = 10)
# Estimating parameters using Kersplat
# Warning in newKersplatParams(): The Kersplat simulation is still experimental
# and may produce unreliable results. Please try it and report any issues to
# https://github.com/Oshlack/splatter/issues. The development version may have
# improved features.
# Raw: 0.180467969462491 A: 5.24644589782513 B: 1.56749149061734 C: -3.99991188833989 Y: 0.796802242740237
# Warning in kersplatEstBCV(counts, params, verbose): Exponential corrected BCV is
# negative.Using linear correction.
# Warning in kersplatEstBCV(counts, params, verbose): Linear corrected BCV is
# negative.Using existing bcv.common.
After estimating the parameters from a real dataset, we simulate datasets based on the learned parameters under different scenarios.
The reference data contains 160 cells and 4000 genes. If we simulate with the default parameters, the new dataset has the same size as the reference data.
simulate_result <- simmethods::Kersplat_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "SCE",
  seed = 111
)
# nCells: 160
# nGenes: 4000
SCE_result <- simulate_result[["simulate_result"]]
dim(SCE_result)
# [1] 4000 160
head(colData(SCE_result))
# DataFrame with 6 rows and 1 column
# cell_name
# <character>
# Cell1 Cell1
# Cell2 Cell2
# Cell3 Cell3
# Cell4 Cell4
# Cell5 Cell5
# Cell6 Cell6
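When the result is returned as a SingleCellExperiment object, the count matrix and the cell metadata can be pulled out with the standard accessors. The sketch below builds a small toy object instead of using SCE_result, and it assumes that the simulated counts are stored in the assay named `counts`, which is not stated here.

```r
library(SingleCellExperiment)

# Toy object standing in for SCE_result (hypothetical data: 4 genes x 3 cells)
set.seed(1)
toy <- matrix(
  rpois(12, lambda = 3),
  nrow = 4,
  dimnames = list(paste0("Gene", 1:4), paste0("Cell", 1:3))
)
toy_sce <- SingleCellExperiment(
  assays = list(counts = toy),
  colData = S4Vectors::DataFrame(cell_name = colnames(toy),
                                 row.names = colnames(toy))
)

# Assumption: the counts live in the "counts" assay
count_mat <- counts(toy_sce)
dim(count_mat)

# Cell-level metadata, as shown by head(colData(...)) above
colData(toy_sce)$cell_name
```

If `counts()` fails on the real object, `assayNames()` lists the assays that are actually present.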
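To change the size of the simulated dataset, pass nCells and nGenes through the other_prior argument. Here, we simulate a new dataset with 500 cells and 1000 genes: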
simulate_result <- simmethods::Kersplat_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "list",
  other_prior = list(nCells = 500,
                     nGenes = 1000),
  seed = 111
)
# nCells: 500
# nGenes: 1000
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 1000 500
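Once the counts are available as a plain matrix, simple summaries such as the library size per cell and the fraction of zeros give a quick impression of whether the simulated data look plausible next to the reference. The following is a minimal sketch on toy data; `summarise_counts` is a hypothetical helper, not a simmethods function.

```r
# Hypothetical helper: basic summaries of a genes x cells count matrix
summarise_counts <- function(x) {
  list(
    dims = dim(x),                # genes, cells
    library_size = colSums(x),    # total counts per cell
    zero_fraction = mean(x == 0)  # proportion of zero entries
  )
}

# Toy matrix: 2 genes (rows) x 3 cells (columns)
toy <- matrix(c(0, 2, 1,
                3, 0, 0),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Gene1", "Gene2"), paste0("Cell", 1:3)))

s <- summarise_counts(toy)
s$library_size   # Cell1 = 3, Cell2 = 2, Cell3 = 1
s$zero_fraction  # 0.5
```

In practice the same helper could be applied to both `result` and `ref_data` to compare the two side by side.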