We already know how to estimate parameters from one or more real datasets and how to obtain the estimation results. In this chapter, we demonstrate how to simulate single-cell transcriptomics data from those results, focusing on the parameters that are most often customized to suit different application situations.
For the demonstrations we use the Splat method, because it supports all of the functionalities and parameters that we want to introduce.
Load the packages first:
library(simmethods)
library(simpipe)
Load data and perform estimation:
ref_data <- simmethods::data
estimation_result <- simmethods::Splat_estimation(
ref_data = ref_data,
verbose = TRUE,
seed = 666
)
# Estimating parameters using Splat
Next, check the optional parameters that control the size of the simulated dataset, the proportion of DEGs, the number of cell batches, and whether the dataset contains a cellular trajectory. This shows which parameters you need to set to meet your simulation requirements.
help(Splat_simulation)
## Details
# In addtion to simulate datasets with default parameters, users want to simulate other kinds of datasets, e.g. a counts matrix with 2 or more cell groups. In Splat, you can set extra parameters to simulate datasets.
#
# The customed parameters you can set are below:
#
# nCells. In Splat, you can not set nCells directly and should set batchCells instead. For example, if you want to simulate 1000 cells, you can type other_prior = list(batchCells = 1000). If you type other_prior = list(batchCells = c(500, 500)), the simulated data will have two batches.
#
# nGenes. You can directly set other_prior = list(nGenes = 5000) to simulate 5000 genes.
#
# nGroups. You can not directly set other_prior = list(nGroups = 3) to simulate 3 groups. Instead, you should set other_prior = list(prob.group = c(0.2, 0.3, 0.5)) where the sum of group probabilities must equal to 1.
#
# de.prob. You can directly set other_prior = list(de.prob = 0.2) to simulate DEGs that account for 20 percent of all genes.
#
# prob.group. You can directly set other_prior = list(prob.group = c(0.2, 0.3, 0.5)) to assign three proportions of cell groups. Note that the number of groups always equals to the length of the vector.
#
# nBatches. You can not directly set other_prior = list(nBatches = 3) to simulate 3 batches. Instead, you should set other_prior = list(batchCells = c(500, 500, 500)) to reach the goal and the total cells are 1500.
#
# If users want to simulate datasets for trajectory inference, just set other_prior = list(paths = TRUE). Simulating trajectory datasets can also specify the parameters of group and batch. See Examples.
These parameters fall into four classes, which correspond to the four main functionalities of the Splat method:
parameters for cell groups
parameters for DEGs
parameters for batches
parameters for cellular differentiation trajectory
In the remainder of step 3, we describe these application situations in detail.
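As a rough illustration of how these options fit together, the sketch below assembles an other_prior list from a total cell count, a number of batches, and optional group proportions. The helper make_other_prior and its argument names are hypothetical and not part of simmethods; only the list element names (batchCells, nGenes, prob.group, de.prob, paths) come from the help text above. Splitting the cells evenly across batches is an assumption made for the example.

```r
# Hypothetical helper: build an other_prior list for Splat from simple inputs.
make_other_prior <- function(n_cells, n_genes, n_batches = 1,
                             group_prob = NULL, de_prob = NULL,
                             paths = FALSE) {
  stopifnot(n_batches >= 1, n_cells >= n_batches)
  # Split cells as evenly as possible; earlier batches absorb the remainder.
  batch_cells <- rep(n_cells %/% n_batches, n_batches)
  extra <- n_cells %% n_batches
  batch_cells[seq_len(extra)] <- batch_cells[seq_len(extra)] + 1
  prior <- list(batchCells = batch_cells, nGenes = n_genes)
  if (!is.null(group_prob)) {
    # Group proportions must sum to 1 (up to floating-point tolerance).
    stopifnot(isTRUE(all.equal(sum(group_prob), 1)))
    prior$prob.group <- group_prob
  }
  if (!is.null(de_prob)) {
    stopifnot(de_prob >= 0, de_prob <= 1)
    prior$de.prob <- de_prob
  }
  # Only add the trajectory switch when it was requested.
  if (isTRUE(paths)) prior$paths <- TRUE
  prior
}

prior <- make_other_prior(n_cells = 1500, n_genes = 5000, n_batches = 3,
                          group_prob = c(0.2, 0.3, 0.5), de_prob = 0.2)
str(prior)
```

A list built this way could then be passed as the other_prior argument of Splat_simulation.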
The first application situation is generating datasets with different numbers of cells and genes. According to the documentation of the Splat method, the batchCells parameter controls the number of cells and the nGenes parameter controls the number of genes.
Simulate 1000 cells and 5000 genes:
data_1000_5000 <- simmethods::Splat_simulation(
parameters = estimation_result$estimate_result,
other_prior = list(batchCells = 1000,
nGenes = 5000),
return_format = "Seurat",
verbose = TRUE,
seed = 666
)
# nCells: 1000
# nGenes: 5000
# nGroups: 1
# de.prob: 0.1
# nBatches: 1
# Simulating datasets using Splat
# Getting parameters...
# Creating simulation object...
# Simulating library sizes...
# Simulating gene means...
# Simulating BCV...
# Simulating counts...
# Simulating dropout (if needed)...
# Sparsifying assays...
# Automatically converting to sparse matrices, threshold = 0.95
# Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BCV': estimated sparse size 1.5 * dense matrix
# Skipping 'CellMeans': estimated sparse size 1.49 * dense matrix
# Skipping 'TrueCounts': estimated sparse size 1.6 * dense matrix
# Skipping 'counts': estimated sparse size 1.6 * dense matrix
# Done!
data_1000_5000$simulate_result
# An object of class Seurat
# 5000 features across 1000 samples within 1 assay
# Active assay: originalexp (5000 features, 0 variable features)
Simulate 10000 cells and 20000 genes:
data_10000_20000 <- simmethods::Splat_simulation(
parameters = estimation_result$estimate_result,
other_prior = list(batchCells = 10000,
nGenes = 20000),
return_format = "Seurat",
verbose = TRUE,
seed = 666
)
# nCells: 10000
# nGenes: 20000
# nGroups: 1
# de.prob: 0.1
# nBatches: 1
# Simulating datasets using Splat
# Getting parameters...
# Creating simulation object...
# Simulating library sizes...
# Simulating gene means...
# Simulating BCV...
# Simulating counts...
# Simulating dropout (if needed)...
# Sparsifying assays...
# Automatically converting to sparse matrices, threshold = 0.95
# Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BCV': estimated sparse size 1.5 * dense matrix
# Skipping 'CellMeans': estimated sparse size 1.47 * dense matrix
# Skipping 'TrueCounts': estimated sparse size 1.06 * dense matrix
# Skipping 'counts': estimated sparse size 1.06 * dense matrix
# Done!
Check the number of cells and genes:
data_10000_20000$simulate_result
# An object of class Seurat
# 20000 features across 10000 samples within 1 assay
# Active assay: originalexp (20000 features, 0 variable features)
Check the execution time:
data_10000_20000$simulate_detection$Elapsed_Time_sec
# [1] 44.628
To simulate two groups of cells with the Splat method, use the prob.group parameter to specify the proportion of cells in each group. The length of the prob.group vector defines the number of groups.
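Group compositions are often easier to think of as ratios such as 4:6 or 1:1:2:3:3. The small base R sketch below converts a ratio into a probability vector suitable for prob.group; ratio_to_prob is a hypothetical helper name, not a simmethods function.

```r
# Hypothetical helper: turn a ratio such as 4:6 into proportions that sum to 1.
ratio_to_prob <- function(ratio) {
  stopifnot(is.numeric(ratio), all(ratio > 0))
  ratio / sum(ratio)
}

prob_two  <- ratio_to_prob(c(4, 6))
prob_five <- ratio_to_prob(c(1, 1, 2, 3, 3))
print(prob_two)
print(prob_five)
```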
Simulate two groups (4:6):
data_4_6 <- simmethods::Splat_simulation(
parameters = estimation_result$estimate_result,
other_prior = list(batchCells = 1000,
nGenes = 5000,
prob.group = c(0.4, 0.6)),
return_format = "Seurat",
verbose = TRUE,
seed = 666
)
# nCells: 1000
# nGenes: 5000
# nGroups: 2
# de.prob: 0.1
# nBatches: 1
# Simulating datasets using Splat
# Getting parameters...
# Creating simulation object...
# Simulating library sizes...
# Simulating gene means...
# Simulating group DE...
# Simulating cell means...
# Simulating BCV...
# Simulating counts...
# Simulating dropout (if needed)...
# Sparsifying assays...
# Automatically converting to sparse matrices, threshold = 0.95
# Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BCV': estimated sparse size 1.5 * dense matrix
# Skipping 'CellMeans': estimated sparse size 1.49 * dense matrix
# Skipping 'TrueCounts': estimated sparse size 1.6 * dense matrix
# Skipping 'counts': estimated sparse size 1.6 * dense matrix
# Done!
Check the group labels of the cells:
table(data_4_6$simulate_result$group)
#
# Group1 Group2
# 407 593
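The counts above (407 and 593) are close to, but not exactly, 400 and 600. That is what one would expect if each cell's group label is drawn at random with the given probabilities. The base R sketch below illustrates this behaviour under that assumption; it does not call Splat, and the seed and variable names are arbitrary.

```r
# Assumption: every cell is assigned to a group independently, using
# prob_group as the sampling probabilities.
set.seed(1)
n_cells <- 1000
prob_group <- c(0.4, 0.6)

labels <- sample(paste0("Group", seq_along(prob_group)),
                 size = n_cells, replace = TRUE, prob = prob_group)

group_counts <- table(labels)
print(group_counts)          # realised group sizes
print(n_cells * prob_group)  # expected group sizes
```

The realised sizes fluctuate around the expected ones, so small deviations from the requested proportions are not a sign of a problem.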
Simulate five groups (1:1:2:3:3):
data_11233 <- simmethods::Splat_simulation(
parameters = estimation_result$estimate_result,
other_prior = list(batchCells = 1000,
nGenes = 5000,
prob.group = c(0.1, 0.1, 0.2, 0.3, 0.3)),
return_format = "Seurat",
verbose = TRUE,
seed = 666
)
# nCells: 1000
# nGenes: 5000
# nGroups: 5
# de.prob: 0.1
# nBatches: 1
# Simulating datasets using Splat
# Getting parameters...
# Creating simulation object...
# Simulating library sizes...
# Simulating gene means...
# Simulating group DE...
# Simulating cell means...
# Simulating BCV...
# Simulating counts...
# Simulating dropout (if needed)...
# Sparsifying assays...
# Automatically converting to sparse matrices, threshold = 0.95
# Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BCV': estimated sparse size 1.5 * dense matrix
# Skipping 'CellMeans': estimated sparse size 1.49 * dense matrix
# Skipping 'TrueCounts': estimated sparse size 1.6 * dense matrix
# Skipping 'counts': estimated sparse size 1.6 * dense matrix
# Done!
Check the group labels of the cells:
table(data_11233$simulate_result$group)
#
# Group1 Group2 Group3 Group4 Group5
# 95 106 206 290 303
Users can also set the proportion of DEGs in the Splat method via the de.prob parameter, which ranges from 0 to 1.
Here we set de.prob to 0.2 to simulate 20% DEGs in two cell groups.
simulated_data <- simmethods::Splat_simulation(
parameters = estimation_result$estimate_result,
other_prior = list(batchCells = 1000,
nGenes = 5000,
prob.group = c(0.4, 0.6),
de.prob = 0.2),
return_format = "list",
verbose = TRUE,
seed = 666
)
# nCells: 1000
# nGenes: 5000
# nGroups: 2
# de.prob: 0.2
# nBatches: 1
# Simulating datasets using Splat
# Getting parameters...
# Creating simulation object...
# Simulating library sizes...
# Simulating gene means...
# Simulating group DE...
# Simulating cell means...
# Simulating BCV...
# Simulating counts...
# Simulating dropout (if needed)...
# Sparsifying assays...
# Automatically converting to sparse matrices, threshold = 0.95
# Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BCV': estimated sparse size 1.5 * dense matrix
# Skipping 'CellMeans': estimated sparse size 1.49 * dense matrix
# Skipping 'TrueCounts': estimated sparse size 1.6 * dense matrix
# Skipping 'counts': estimated sparse size 1.6 * dense matrix
# Done!
Check the group labels of the cells:
table(simulated_data$simulate_result$col_meta$group)
#
# Group1 Group2
# 407 593
Check the proportion of DEGs:
row_meta <- simulated_data$simulate_result$row_meta
table(row_meta$de_gene == "yes")/length(row_meta$de_gene)
#
# FALSE TRUE
# 0.8068 0.1932
We then simulate another dataset which contains more than 2 groups (4 groups and 40% DEGs):
simulated_data <- simmethods::Splat_simulation(
parameters = estimation_result$estimate_result,
other_prior = list(batchCells = 1000,
nGenes = 5000,
prob.group = c(0.2, 0.2, 0.3, 0.3),
de.prob = 0.4),
return_format = "list",
verbose = TRUE,
seed = 666
)
# nCells: 1000
# nGenes: 5000
# nGroups: 4
# de.prob: 0.4
# nBatches: 1
# Simulating datasets using Splat
# Getting parameters...
# Creating simulation object...
# Simulating library sizes...
# Simulating gene means...
# Simulating group DE...
# Simulating cell means...
# Simulating BCV...
# Simulating counts...
# Simulating dropout (if needed)...
# Sparsifying assays...
# Automatically converting to sparse matrices, threshold = 0.95
# Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BCV': estimated sparse size 1.5 * dense matrix
# Skipping 'CellMeans': estimated sparse size 1.49 * dense matrix
# Skipping 'TrueCounts': estimated sparse size 1.6 * dense matrix
# Skipping 'counts': estimated sparse size 1.6 * dense matrix
# Done!
Check the group labels of the cells:
table(simulated_data$simulate_result$col_meta$group)
#
# Group1 Group2 Group3 Group4
# 201 206 290 303
Check the proportion of DEGs:
row_meta <- simulated_data$simulate_result$row_meta
table(row_meta$de_gene == "yes")/length(row_meta$de_gene)
#
# FALSE TRUE
# 0.6568 0.3432
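The observed DEG proportions (0.1932 for de.prob = 0.2 with two groups, 0.3432 for de.prob = 0.4 with four groups) are somewhat below the requested values. One reading that is numerically consistent with both results, offered here purely as an assumption and not confirmed by the simmethods documentation, is that de.prob is shared equally among the groups and each gene is selected independently in each group. The sketch below works through that arithmetic; expected_de_fraction is a hypothetical name.

```r
# Assumption (not confirmed by the source): each group gets de_prob / n_groups,
# and a gene counts as a DEG if it is selected in at least one group.
expected_de_fraction <- function(de_prob, n_groups) {
  per_group <- de_prob / n_groups
  1 - (1 - per_group)^n_groups
}

frac_two  <- expected_de_fraction(0.2, 2)
frac_four <- expected_de_fraction(0.4, 4)
print(round(frac_two, 4))   # compare with the observed 0.1932
print(round(frac_four, 4))  # compare with the observed 0.3432
```

Under this assumption the expected proportion is always slightly smaller than de.prob, because some genes are selected in more than one group.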
Note that with the Splat method we can identify the DEGs between any pair of groups (scDesign and SPARSim are exceptions). For example, to get the DEGs between Group1 and Group2, extract their DE factors (DEFacGroup1 and DEFacGroup2) from the gene metadata:
gene_meta <- simulated_data$simulate_result$row_meta
DEFactor1 <- gene_meta$DEFacGroup1
DEFactor2 <- gene_meta$DEFacGroup2
Then we do the division:
DEFactor <- DEFactor1/DEFactor2
Genes whose DEFactor is not equal to 1 are defined as the DEGs between Group1 and Group2:
table(DEFactor != 1)
#
# FALSE TRUE
# 4034 966
DEGs_group1_group2 <- rownames(gene_meta)[DEFactor != 1]
DEGs_group1_group2[1:10]
# [1] "Gene1" "Gene4" "Gene7" "Gene11" "Gene12" "Gene15" "Gene16" "Gene17"
# [9] "Gene36" "Gene45"
scDesign and SPARSim cannot return the DEGs between every pair of groups when there are more than two cell groups. When the simulated data contain only two groups, however, the DEGs they report are valid.
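The Group1 versus Group2 comparison shown earlier can be generalised to every pair of groups. The sketch below does this on a small toy table; the column naming (DEFacGroup1, DEFacGroup2, ...) follows the gene metadata used above, while pairwise_degs and the toy values are illustrative assumptions.

```r
# Hypothetical helper: list DEGs for every pair of groups by comparing
# the ratio of their DE factors with 1.
pairwise_degs <- function(gene_meta, prefix = "DEFacGroup") {
  fac_cols <- grep(paste0("^", prefix), colnames(gene_meta), value = TRUE)
  pairs <- combn(fac_cols, 2, simplify = FALSE)
  result <- lapply(pairs, function(p) {
    ratio <- gene_meta[[p[1]]] / gene_meta[[p[2]]]
    rownames(gene_meta)[ratio != 1]
  })
  names(result) <- vapply(pairs, function(p) {
    paste(sub(prefix, "Group", p), collapse = "_vs_")
  }, character(1))
  result
}

# Toy gene metadata with three groups and four genes.
toy_meta <- data.frame(
  DEFacGroup1 = c(1, 2, 1, 1),
  DEFacGroup2 = c(1, 1, 0.5, 1),
  DEFacGroup3 = c(1, 1, 1, 3),
  row.names = paste0("Gene", 1:4)
)

degs <- pairwise_degs(toy_meta)
print(degs)
```

With real output, gene_meta would be simulated_data$simulate_result$row_meta from a simulation returned in "list" format.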
Simulating multiple cell batches is another important application in benchmarking and method-development studies.
In Splat and many other methods, users can specify the number of batches and the number of cells in each batch via the batchCells parameter. Here we simulate three batches with 1000, 2000 and 3000 cells, respectively.
simulated_data <- simmethods::Splat_simulation(
parameters = estimation_result$estimate_result,
other_prior = list(batchCells = c(1000, 2000, 3000),
nGenes = 5000),
return_format = "list",
verbose = TRUE,
seed = 666
)
# nCells: 6000
# nGenes: 5000
# nGroups: 1
# de.prob: 0.1
# nBatches: 3
# Simulating datasets using Splat
# Getting parameters...
# Creating simulation object...
# Simulating library sizes...
# Simulating gene means...
# Simulating batch effects...
# Simulating BCV...
# Simulating counts...
# Simulating dropout (if needed)...
# Sparsifying assays...
# Automatically converting to sparse matrices, threshold = 0.95
# Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BCV': estimated sparse size 1.5 * dense matrix
# Skipping 'CellMeans': estimated sparse size 1.49 * dense matrix
# Skipping 'TrueCounts': estimated sparse size 1.59 * dense matrix
# Skipping 'counts': estimated sparse size 1.59 * dense matrix
# Done!
Check the batches:
table(simulated_data$simulate_result$col_meta$batch)
#
# Batch1 Batch2 Batch3
# 1000 2000 3000
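Unlike the group sizes seen earlier, the batch sizes in this table match batchCells exactly. A minimal base R sketch of that bookkeeping, assuming batch labels are simply repeated according to batchCells (the variable names are illustrative):

```r
# Assumption: batch membership is deterministic, one label per requested cell.
batch_cells <- c(1000, 2000, 3000)

batch_labels <- rep(paste0("Batch", seq_along(batch_cells)),
                    times = batch_cells)

batch_counts <- table(batch_labels)
print(batch_counts)
print(length(batch_labels))  # total number of cells across all batches
```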
Simulating data with a cellular differentiation trajectory is another application of the Splat method. To do so, simply set the paths parameter to TRUE:
simulated_data <- simmethods::Splat_simulation(
parameters = estimation_result$estimate_result,
other_prior = list(batchCells = 1000,
prob.group = c(0.3, 0.2, 0.5),
nGenes = 5000,
paths = TRUE),
return_format = "SingleCellExperiment",
verbose = TRUE,
seed = 666
)
# nCells: 1000
# nGenes: 5000
# nGroups: 3
# de.prob: 0.1
# nBatches: 1
# Simulating datasets using Splat
# Simulating trajectory datasets by Splat
# Getting parameters...
# Creating simulation object...
# Simulating library sizes...
# Simulating gene means...
# Simulating path endpoints...
# Simulating path steps...
# Simulating BCV...
# Simulating counts...
# Simulating dropout (if needed)...
# Sparsifying assays...
# Automatically converting to sparse matrices, threshold = 0.95
# Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BCV': estimated sparse size 1.5 * dense matrix
# Skipping 'CellMeans': estimated sparse size 1.49 * dense matrix
# Skipping 'TrueCounts': estimated sparse size 1.6 * dense matrix
# Skipping 'counts': estimated sparse size 1.6 * dense matrix
# Done!
library(scater)
# Loading required package: scuttle
# Loading required package: ggplot2
sim.paths <- logNormCounts(simulated_data$simulate_result)
sim.paths <- runPCA(sim.paths)
plotPCA(sim.paths, colour_by = "group")
If you want to set other trajectory-related parameters of the Splat method, consult the official vignettes of the Splatter package and its website:
help(splatSimulate, package = "splatter")
Here, we add only two extra parameters, path.nSteps and path.skew:
simulated_data <- simmethods::Splat_simulation(
parameters = estimation_result$estimate_result,
other_prior = list(batchCells = 1000,
prob.group = c(0.3, 0.2, 0.5),
nGenes = 5000,
paths = TRUE,
path.nSteps = 20,
path.skew = 0.1),
return_format = "SingleCellExperiment",
verbose = TRUE,
seed = 666
)
# nCells: 1000
# nGenes: 5000
# nGroups: 3
# de.prob: 0.1
# nBatches: 1
# Simulating datasets using Splat
# Simulating trajectory datasets by Splat
# Getting parameters...
# Creating simulation object...
# Simulating library sizes...
# Simulating gene means...
# Simulating path endpoints...
# Simulating path steps...
# Simulating BCV...
# Simulating counts...
# Simulating dropout (if needed)...
# Sparsifying assays...
# Automatically converting to sparse matrices, threshold = 0.95
# Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BCV': estimated sparse size 1.5 * dense matrix
# Skipping 'CellMeans': estimated sparse size 1.49 * dense matrix
# Skipping 'TrueCounts': estimated sparse size 1.6 * dense matrix
# Skipping 'counts': estimated sparse size 1.6 * dense matrix
# Done!
library(scater)
sim.paths <- logNormCounts(simulated_data$simulate_result)
sim.paths <- runPCA(sim.paths)
plotPCA(sim.paths, colour_by = "group")
In this part, we demonstrate how to simulate datasets using Docker from within R. Make sure that Docker is installed on your device.
First, start Docker and check:
library(simpipe2docker)
test_docker_installation(detailed = TRUE)
# ✔ Docker is installed
# ✔ Docker daemon is running
# ✔ Docker is at correct version (>1.0): 1.41
# ✔ Docker is in linux mode
# ✔ Docker can pull images
# ✔ Docker can run image
# ✔ Docker can mount temporary volumes
# ✔ Docker test successful -----------------------------------------------------------------
# [1] TRUE
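In a non-interactive script, a cheap pre-check that a docker executable is on the PATH can guard the call above. The base R sketch below does only that and does not verify that the Docker daemon is running; docker_on_path is a hypothetical helper name.

```r
# Hypothetical helper: TRUE if an executable named "docker" is found on the PATH.
docker_on_path <- function() {
  unname(nzchar(Sys.which("docker")))
}

has_docker <- docker_on_path()
if (has_docker) {
  message("docker executable found at: ", Sys.which("docker"))
} else {
  message("docker executable not found on PATH; install or start Docker first")
}
```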
Estimate parameters in Docker:
estimation_result <- simpipe2docker::estimate_parameters_container(
ref_data = ref_data,
method = "Splat",
verbose = TRUE,
seed = 666
)
# Learning parameters from data 1
# Running /usr/local/bin/docker run --name \
# 20230807_112948__container__uxBxg1JNLM -e 'TMPDIR=/tmp2' --workdir \
# /home/admin/ -v \
# '/var/folders/1l/xmc98tgx0m37wxtbtwnl6h7c0000gn/T//RtmpMrBHAW:/home/admin/docker_path' \
# -v \
# '/tmp/folders/1l/xmc98tgx0m37wxtbtwnl6h7c0000gn/T//RtmpMrBHAW/file8d9326fd785/tmp:/tmp2' \
# duohongrui/simpipe
# WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
# Estimating parameters using Splat
# Output is saved to /var/folders/1l/xmc98tgx0m37wxtbtwnl6h7c0000gn/T//RtmpMrBHAW
# Attempting to read output into R
Simulate new datasets from Docker:
## simulate 1000 cells and 1000 genes
simulated_data <- simpipe2docker::simulate_datasets_container(
parameters = estimation_result,
other_prior = list(batchCells = 1000,
nGenes = 1000),
return_format = "SingleCellExperiment",
verbose = TRUE,
seed = 666
)
# Simulating dataset 1
# Running /usr/local/bin/docker run --name \
# 20230807_113135__container__NigapuTAlX -e 'TMPDIR=/tmp2' --workdir \
# /home/admin/ -v \
# '/var/folders/1l/xmc98tgx0m37wxtbtwnl6h7c0000gn/T//RtmpMrBHAW:/home/admin/docker_path' \
# -v \
# '/tmp/folders/1l/xmc98tgx0m37wxtbtwnl6h7c0000gn/T//RtmpMrBHAW/file8d913c1e7cb/tmp:/tmp2' \
# duohongrui/simpipe
# WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
# Registered S3 method overwritten by 'SeuratDisk':
# method from
# as.sparse.H5Group Seurat
# nCells: 1000
# nGenes: 1000
# nGroups: 1
# de.prob: 0.1
# nBatches: 1
# Simulating datasets using Splat
# Getting parameters...
# Creating simulation object...
# Simulating library sizes...
# Simulating gene means...
# Simulating BCV...
# Simulating counts...
# Simulating dropout (if needed)...
# Sparsifying assays...
# Automatically converting to sparse matrices, threshold = 0.95
# Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BCV': estimated sparse size 1.5 * dense matrix
# Skipping 'CellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'TrueCounts': estimated sparse size 2.21 * dense matrix
# Skipping 'counts': estimated sparse size 2.21 * dense matrix
# Done!
# Output is saved to /var/folders/1l/xmc98tgx0m37wxtbtwnl6h7c0000gn/T//RtmpMrBHAW
# Attempting to read output into R
simulated_data$refdata_Splat_1$simulate_result
# class: SingleCellExperiment
# dim: 1000 1000
# metadata(1): Params
# assays(6): BatchCellMeans BaseCellMeans ... TrueCounts counts
# rownames(1000): Gene1 Gene2 ... Gene999 Gene1000
# rowData names(4): Gene BaseGeneMean OutlierFactor GeneMean
# colnames(1000): Cell1 Cell2 ... Cell999 Cell1000
# colData names(3): Cell Batch ExpLibSize
# reducedDimNames(0):
# mainExpName: NULL
# altExpNames(0):
## simulate 1000 cells and 1000 genes (two groups and 40% DEGs)
simulated_data <- simpipe2docker::simulate_datasets_container(
parameters = estimation_result,
other_prior = list(batchCells = 1000,
nGenes = 1000,
prob.group = c(0.4, 0.6),
de.prob = 0.4),
return_format = "list",
verbose = TRUE,
seed = 666
)
# Simulating dataset 1
# Running /usr/local/bin/docker run --name \
# 20230807_113237__container__4YsDYtfOI7 -e 'TMPDIR=/tmp2' --workdir \
# /home/admin/ -v \
# '/var/folders/1l/xmc98tgx0m37wxtbtwnl6h7c0000gn/T//RtmpMrBHAW:/home/admin/docker_path' \
# -v \
# '/tmp/folders/1l/xmc98tgx0m37wxtbtwnl6h7c0000gn/T//RtmpMrBHAW/file8d938ac09a/tmp:/tmp2' \
# duohongrui/simpipe
# WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
# Registered S3 method overwritten by 'SeuratDisk':
# method from
# as.sparse.H5Group Seurat
# nCells: 1000
# nGenes: 1000
# nGroups: 1
# de.prob: 0.4
# nBatches: 1
# Simulating datasets using Splat
# Getting parameters...
# Creating simulation object...
# Simulating library sizes...
# Simulating gene means...
# Simulating BCV...
# Simulating counts...
# Simulating dropout (if needed)...
# Sparsifying assays...
# Automatically converting to sparse matrices, threshold = 0.95
# Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BCV': estimated sparse size 1.5 * dense matrix
# Skipping 'CellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'TrueCounts': estimated sparse size 2.21 * dense matrix
# Skipping 'counts': estimated sparse size 2.21 * dense matrix
# Done!
# Output is saved to /var/folders/1l/xmc98tgx0m37wxtbtwnl6h7c0000gn/T//RtmpMrBHAW
# Attempting to read output into R
Building on the simmethods package, the simpipe package provides further useful functions. Users can estimate parameters from multiple real datasets using multiple methods, and can also simulate multiple new datasets at once. In this part, we introduce some of these helper functions in simpipe.
First, we should use simpipe to estimate parameters from two real datasets:
## prepare a list of data
data <- list(data1 = ref_data,
data2 = ref_data)
estimation_result <- simpipe::estimate_parameters(
ref_data = data,
method = "Splat",
verbose = TRUE,
seed = 666
)
# Estimating parameters using Splat
# Estimating parameters using Splat
For every estimation result, we can generate multiple datasets by setting the n parameter of the simulate_datasets function. Here, two estimation results and n = 3 yield six datasets:
simulated_data <- simpipe::simulate_datasets(
parameters = estimation_result,
other_prior = list(batchCells = 1000,
nGenes = 1000),
n = 3,
return_format = "list",
verbose = TRUE,
seed = 666
)
# The length of seeds is not identical to the time(s) that every method will be executed
# The seed will be set as: 100 200 300 when performing every method
# Simulating dataset 1
# nCells: 1000
# nGenes: 1000
# nGroups: 1
# de.prob: 0.1
# nBatches: 1
# Simulating datasets using Splat
# Getting parameters...
# Creating simulation object...
# Simulating library sizes...
# Simulating gene means...
# Simulating BCV...
# Simulating counts...
# Simulating dropout (if needed)...
# Sparsifying assays...
# Automatically converting to sparse matrices, threshold = 0.95
# Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BCV': estimated sparse size 1.5 * dense matrix
# Skipping 'CellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'TrueCounts': estimated sparse size 2.23 * dense matrix
# Skipping 'counts': estimated sparse size 2.23 * dense matrix
# Done!
# Simulating dataset 2
# nCells: 1000
# nGenes: 1000
# nGroups: 1
# de.prob: 0.1
# nBatches: 1
# Simulating datasets using Splat
# Getting parameters...
# Creating simulation object...
# Simulating library sizes...
# Simulating gene means...
# Simulating BCV...
# Simulating counts...
# Simulating dropout (if needed)...
# Sparsifying assays...
# Automatically converting to sparse matrices, threshold = 0.95
# Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BCV': estimated sparse size 1.5 * dense matrix
# Skipping 'CellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'TrueCounts': estimated sparse size 2.25 * dense matrix
# Skipping 'counts': estimated sparse size 2.25 * dense matrix
# Done!
# Simulating dataset 3
# nCells: 1000
# nGenes: 1000
# nGroups: 1
# de.prob: 0.1
# nBatches: 1
# Simulating datasets using Splat
# Getting parameters...
# Creating simulation object...
# Simulating library sizes...
# Simulating gene means...
# Simulating BCV...
# Simulating counts...
# Simulating dropout (if needed)...
# Sparsifying assays...
# Automatically converting to sparse matrices, threshold = 0.95
# Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BCV': estimated sparse size 1.5 * dense matrix
# Skipping 'CellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'TrueCounts': estimated sparse size 2.21 * dense matrix
# Skipping 'counts': estimated sparse size 2.21 * dense matrix
# Done!
# Simulating dataset 4
# nCells: 1000
# nGenes: 1000
# nGroups: 1
# de.prob: 0.1
# nBatches: 1
# Simulating datasets using Splat
# Getting parameters...
# Creating simulation object...
# Simulating library sizes...
# Simulating gene means...
# Simulating BCV...
# Simulating counts...
# Simulating dropout (if needed)...
# Sparsifying assays...
# Automatically converting to sparse matrices, threshold = 0.95
# Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BCV': estimated sparse size 1.5 * dense matrix
# Skipping 'CellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'TrueCounts': estimated sparse size 2.23 * dense matrix
# Skipping 'counts': estimated sparse size 2.23 * dense matrix
# Done!
# Simulating dataset 5
# nCells: 1000
# nGenes: 1000
# nGroups: 1
# de.prob: 0.1
# nBatches: 1
# Simulating datasets using Splat
# Getting parameters...
# Creating simulation object...
# Simulating library sizes...
# Simulating gene means...
# Simulating BCV...
# Simulating counts...
# Simulating dropout (if needed)...
# Sparsifying assays...
# Automatically converting to sparse matrices, threshold = 0.95
# Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BCV': estimated sparse size 1.5 * dense matrix
# Skipping 'CellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'TrueCounts': estimated sparse size 2.25 * dense matrix
# Skipping 'counts': estimated sparse size 2.25 * dense matrix
# Done!
# Simulating dataset 6
# nCells: 1000
# nGenes: 1000
# nGroups: 1
# de.prob: 0.1
# nBatches: 1
# Simulating datasets using Splat
# Getting parameters...
# Creating simulation object...
# Simulating library sizes...
# Simulating gene means...
# Simulating BCV...
# Simulating counts...
# Simulating dropout (if needed)...
# Sparsifying assays...
# Automatically converting to sparse matrices, threshold = 0.95
# Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BCV': estimated sparse size 1.5 * dense matrix
# Skipping 'CellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'TrueCounts': estimated sparse size 2.21 * dense matrix
# Skipping 'counts': estimated sparse size 2.21 * dense matrix
# Done!
We can also pass a seed vector whose length equals n:
simulated_data <- simpipe::simulate_datasets(
parameters = estimation_result,
other_prior = list(batchCells = 1000,
nGenes = 1000),
n = 3,
return_format = "list",
verbose = TRUE,
seed = c(666, 888, 999)
)
# Simulating dataset 1
# nCells: 1000
# nGenes: 1000
# nGroups: 1
# de.prob: 0.1
# nBatches: 1
# Simulating datasets using Splat
# Getting parameters...
# Creating simulation object...
# Simulating library sizes...
# Simulating gene means...
# Simulating BCV...
# Simulating counts...
# Simulating dropout (if needed)...
# Sparsifying assays...
# Automatically converting to sparse matrices, threshold = 0.95
# Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BCV': estimated sparse size 1.5 * dense matrix
# Skipping 'CellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'TrueCounts': estimated sparse size 2.21 * dense matrix
# Skipping 'counts': estimated sparse size 2.21 * dense matrix
# Done!
# Simulating dataset 2
# nCells: 1000
# nGenes: 1000
# nGroups: 1
# de.prob: 0.1
# nBatches: 1
# Simulating datasets using Splat
# Getting parameters...
# Creating simulation object...
# Simulating library sizes...
# Simulating gene means...
# Simulating BCV...
# Simulating counts...
# Simulating dropout (if needed)...
# Sparsifying assays...
# Automatically converting to sparse matrices, threshold = 0.95
# Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BCV': estimated sparse size 1.5 * dense matrix
# Skipping 'CellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'TrueCounts': estimated sparse size 2.21 * dense matrix
# Skipping 'counts': estimated sparse size 2.21 * dense matrix
# Done!
# Simulating dataset 3
# nCells: 1000
# nGenes: 1000
# nGroups: 1
# de.prob: 0.1
# nBatches: 1
# Simulating datasets using Splat
# Getting parameters...
# Creating simulation object...
# Simulating library sizes...
# Simulating gene means...
# Simulating BCV...
# Simulating counts...
# Simulating dropout (if needed)...
# Sparsifying assays...
# Automatically converting to sparse matrices, threshold = 0.95
# Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BCV': estimated sparse size 1.5 * dense matrix
# Skipping 'CellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'TrueCounts': estimated sparse size 2.17 * dense matrix
# Skipping 'counts': estimated sparse size 2.17 * dense matrix
# Done!
# Simulating dataset 4
# nCells: 1000
# nGenes: 1000
# nGroups: 1
# de.prob: 0.1
# nBatches: 1
# Simulating datasets using Splat
# Getting parameters...
# Creating simulation object...
# Simulating library sizes...
# Simulating gene means...
# Simulating BCV...
# Simulating counts...
# Simulating dropout (if needed)...
# Sparsifying assays...
# Automatically converting to sparse matrices, threshold = 0.95
# Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BCV': estimated sparse size 1.5 * dense matrix
# Skipping 'CellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'TrueCounts': estimated sparse size 2.21 * dense matrix
# Skipping 'counts': estimated sparse size 2.21 * dense matrix
# Done!
# Simulating dataset 5
# nCells: 1000
# nGenes: 1000
# nGroups: 1
# de.prob: 0.1
# nBatches: 1
# Simulating datasets using Splat
# Getting parameters...
# Creating simulation object...
# Simulating library sizes...
# Simulating gene means...
# Simulating BCV...
# Simulating counts...
# Simulating dropout (if needed)...
# Sparsifying assays...
# Automatically converting to sparse matrices, threshold = 0.95
# Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BCV': estimated sparse size 1.5 * dense matrix
# Skipping 'CellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'TrueCounts': estimated sparse size 2.21 * dense matrix
# Skipping 'counts': estimated sparse size 2.21 * dense matrix
# Done!
# Simulating dataset 6
# nCells: 1000
# nGenes: 1000
# nGroups: 1
# de.prob: 0.1
# nBatches: 1
# Simulating datasets using Splat
# Getting parameters...
# Creating simulation object...
# Simulating library sizes...
# Simulating gene means...
# Simulating BCV...
# Simulating counts...
# Simulating dropout (if needed)...
# Sparsifying assays...
# Automatically converting to sparse matrices, threshold = 0.95
# Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'BCV': estimated sparse size 1.5 * dense matrix
# Skipping 'CellMeans': estimated sparse size 1.5 * dense matrix
# Skipping 'TrueCounts': estimated sparse size 2.17 * dense matrix
# Skipping 'counts': estimated sparse size 2.17 * dense matrix
# Done!
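The six runs in the log follow from two estimation results and n = 3. The base R sketch below lays out such a run plan. Two points are assumptions: that the seed vector is reused for each estimation result (suggested by the repeating pattern in the log, not stated by simpipe), and the dataset names, which are merely modelled on the refdata_Splat_1 name seen in the Docker example.

```r
# Inputs mirroring the call above.
ref_names <- c("data1", "data2")
method <- "Splat"
n <- 3
seeds <- c(666, 888, 999)
stopifnot(length(seeds) == n)

# One row per simulated dataset; replicates vary fastest within each reference.
plan <- expand.grid(replicate = seq_len(n), ref = ref_names,
                    stringsAsFactors = FALSE)
plan$dataset <- seq_len(nrow(plan))
plan$seed <- seeds[plan$replicate]  # assumed: seeds reused per reference
plan$name <- paste(plan$ref, method, plan$replicate, sep = "_")  # hypothetical names
print(plan)
```

Writing the plan down this way makes it easy to see which seed produced which dataset when results need to be reproduced.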