TedSim

This document demonstrates how to use the TedSim method to estimate parameters from a real dataset and to simulate new datasets.

Estimating parameters from a real dataset

Before simulating datasets, it is important to estimate some essential parameters from a real dataset so that the simulated data are more realistic.

library(simmethods)
library(SingleCellExperiment)
# Load data
ref_data <- simmethods::data
estimate_result <- simmethods::TedSim_estimation(
  ref_data = ref_data,
  verbose = TRUE,
  seed = 10
)
# The number of cells is not the power of 2, and we will synthesize some extra cells base on your data...
# Performing k-means and determin the best number of clusters...
# Add grouping to data...
# Synthesize fake cells...
# Add the synthesized data to the real data...
# Done
# Loading required package: amap
# Estimating parameters using TedSim

TedSim can only simulate datasets in which the number of cells is a power of 2. If the reference data does not meet this requirement, the procedure synthesizes extra fake cells to reach the next power of 2.
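The padding arithmetic can be sketched in base R. This is only an illustration of the power-of-2 rule described above, not the code that simmethods runs internally; `next_power_of_two` is a hypothetical helper introduced here, and the value 160 is the number of real cells in the reference data used in this document.

```r
# Hypothetical helper (not part of simmethods):
# smallest power of 2 that is greater than or equal to n.
next_power_of_two <- function(n) {
  stopifnot(is.numeric(n), length(n) == 1, n >= 1)
  p <- 1
  while (p < n) {
    p <- p * 2
  }
  p
}

n_real   <- 160                          # real cells in the reference data
n_target <- next_power_of_two(n_real)    # cell count after padding
n_fake   <- n_target - n_real            # synthetic cells that must be added

cat("target:", n_target, "fake:", n_fake, "\n")
# target: 256 fake: 96
```

A reference dataset whose cell count is already a power of 2 would need no fake cells under this rule.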

Users can also supply the group information of the cells, in which case k-means clustering is skipped:

group <- as.numeric(simmethods::group_condition)
estimate_result <- simmethods::TedSim_estimation(
  ref_data = ref_data,
  other_prior = list(group.condition = group),
  verbose = TRUE,
  seed = 10
)
# The number of cells is not the power of 2, and we will synthesize some extra cells base on your data...
# Add grouping to data...
# Synthesize fake cells...
# Add the synthesized data to the real data...
# Done
# Estimating parameters using TedSim
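Before passing group labels, it may help to confirm that there is exactly one numeric label per cell. The sketch below uses a toy matrix and a hypothetical helper, `check_group_labels`, which is not part of simmethods. It assumes that the count matrix stores genes in rows and cells in columns.

```r
# Hypothetical helper (not part of simmethods): basic checks on group labels.
# Assumes counts has genes in rows and cells in columns.
check_group_labels <- function(counts, group) {
  stopifnot(is.matrix(counts), is.numeric(group))
  if (length(group) != ncol(counts)) {
    stop("Expected one group label per cell (column), got ",
         length(group), " labels for ", ncol(counts), " cells.")
  }
  if (anyNA(group)) {
    stop("Group labels must not contain NA.")
  }
  invisible(TRUE)
}

# Toy data: 20 genes x 6 cells
set.seed(10)
toy_counts <- matrix(rpois(20 * 6, lambda = 5), nrow = 20,
                     dimnames = list(paste0("gene", 1:20),
                                     paste0("cell", 1:6)))

# Convert character labels to numeric codes, one per cell
toy_group <- as.numeric(factor(c("A", "A", "B", "B", "C", "C")))

check_group_labels(toy_counts, toy_group)
table(toy_group)
```

With real data, the same check could be applied to `ref_data` and `group` before calling the estimation function.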

Simulating datasets with cell trajectory using TedSim

After estimating parameters from a real dataset, we simulate datasets from the learned parameters under different scenarios.

  1. Datasets with default parameters
  2. Determine the number of genes
  3. Visualization

Datasets with default parameters

The reference data contains 256 cells (160 real cells and 96 fake cells) and 4000 genes. If we simulate a dataset with default parameters, we obtain new data of the same size as the reference data.

simulate_result <- simmethods::TedSim_simulation(
  parameters = estimate_result[["estimate_result"]],
  other_prior = NULL,
  return_format = "SCE",
  seed = 111
)
# nCells: 256
# nGenes: 4000
# Warning in cbind(...): number of rows of result is not a multiple of vector
# length (arg 3)
SCE_result <- simulate_result[["simulate_result"]]
dim(SCE_result)
# [1] 4000  256

Determine the number of genes

In TedSim, we can set nGenes in other_prior to specify the number of genes.

Here, we simulate a new dataset with 1000 genes:

simulate_result <- simmethods::TedSim_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "list",
  other_prior = list(nGenes = 1000),
  seed = 111
)
# nCells: 256
# nGenes: 1000
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 1000  256

Visualization

Make sure that the following R packages are installed:

if(!requireNamespace("dynwrap", quietly = TRUE)){install.packages("dynwrap")}
if(!requireNamespace("dyndimred", quietly = TRUE)){install.packages("dyndimred")}
if(!requireNamespace("dynplot", quietly = TRUE)){install.packages("dynplot")}
if(!requireNamespace("tislingshot", quietly = TRUE)){devtools::install_github("dynverse/ti_slingshot/package/")}

First we should wrap the data into a standard object:

dyn_object <- dynwrap::wrap_expression(counts = t(result),
                                       expression = log2(t(result) + 1))
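The call above transposes `result` because the simulated count matrix has genes in rows and cells in columns, whereas dynwrap expects cells in rows and features in columns. The base R sketch below shows the same reshaping and log transform on a toy matrix; `toy_result` and the other object names are made up for illustration.

```r
# Toy stand-in for the simulated count matrix: 5 genes x 4 cells
set.seed(111)
toy_result <- matrix(rpois(5 * 4, lambda = 3), nrow = 5,
                     dimnames = list(paste0("gene", 1:5),
                                     paste0("cell", 1:4)))

# Transpose so that cells are in rows and genes are in columns
toy_counts <- t(toy_result)

# Log-transform with a pseudocount of 1, so zero counts stay at zero
toy_expression <- log2(toy_counts + 1)

dim(toy_result)   # 5 4 (genes x cells)
dim(toy_counts)   # 4 5 (cells x genes)
```

The row and column names are carried over by `t()`, so cell and gene identifiers remain attached to the transposed matrices.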

Next, we infer the trajectory using Slingshot, which benchmark studies have reported to be among the best-performing trajectory inference methods:

model <- dynwrap::infer_trajectory(dataset = dyn_object,
                                   method = tislingshot::ti_slingshot(),
                                   parameters = NULL,
                                   give_priors = NULL,
                                   seed = 111,
                                   verbose = TRUE)
# Executing 'slingshot' on '20230816_112206__data_wrapper__RdESls1DEF'
# With parameters: list(cluster_method = "pam", ndim = 20L, shrink = 1L, reweight = TRUE,     reassign = TRUE, thresh = 0.001, maxit = 10L, stretch = 2L,     smoother = "smooth.spline", shrink.method = "cosine")
# inputs: expression
# priors :
# Using full covariance matrix

Finally, we can plot the trajectory after performing dimensionality reduction:

dimred <- dyndimred::dimred_umap(dyn_object$expression)
dynplot::plot_dimred(model, dimred = dimred)
# Coloring by milestone
# Using milestone_percentages from trajectory

For more details about trajectory inference and visualization, please see the dynverse documentation.