Here SplatPop method will be demonstrated clearly and hope that this document can help you.
Before simulating datasets, it is important to estimate some essential parameters from a real dataset in order to make the simulated data more real. If you do not have a single-cell transcriptomics count matrix now, you can use the data collected in simmethods package by simmethods:data
command.
library(simmethods)
library(SingleCellExperiment)
# Load data
ref_data <- simmethods::data
dim(ref_data)
# [1] 4000 160
Using simmethods::SplatPop_estimation
command to execute the estimation step.
estimate_result <- simmethods::SplatPop_estimation(ref_data = ref_data,
verbose = T,
seed = 10)
# Estimating parameters using SplatPop
After estimating parameter from a real dataset, we will simulate a dataset based on the learned parameters with different scenarios.
The reference data contains 160 cells and 4000 genes, if we simulate datasets with default parameters and then we will obtain a new data which has the same size as the reference data. In addtion, the simulated dataset will have one group and one batch of cells.
simulate_result <- simmethods::SplatPop_simulation(
parameters = estimate_result[["estimate_result"]],
return_format = "SCE",
seed = 111
)
# nCells: 160
# nGenes: 4000
# nGroups: 1
# de.prob: 0.1
# nBatches: 1
SCE_result <- simulate_result[["simulate_result"]]
dim(SCE_result)
# [1] 4000 160
head(colData(SCE_result))
# DataFrame with 6 rows and 3 columns
# cell_name batch group
# <character> <character> <character>
# Cell1 Cell1 Batch1 Group1
# Cell2 Cell2 Batch1 Group1
# Cell3 Cell3 Batch1 Group1
# Cell4 Cell4 Batch1 Group1
# Cell5 Cell5 Batch1 Group1
# Cell6 Cell6 Batch1 Group1
head(rowData(SCE_result))
# DataFrame with 6 rows and 1 column
# gene_name
# <character>
# Gene1 Gene1
# Gene2 Gene2
# Gene3 Gene3
# Gene4 Gene4
# Gene5 Gene5
# Gene6 Gene6
In SplatPop, we can not set nCells
directly and should set batchCells
instead. For example, if we want to simulate 500 cells, we can type other_prior = list(batchCells = 500)
. For genes, we can just set nGenes
.
Here, we simulate a new dataset with 500 cells and 1000 genes:
simulate_result <- simmethods::SplatPop_simulation(
parameters = estimate_result[["estimate_result"]],
return_format = "list",
other_prior = list(batchCells = 500,
nGenes = 1000),
seed = 111
)
# nCells: 500
# nGenes: 1000
# nGroups: 1
# de.prob: 0.1
# nBatches: 1
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 1000 500
In SplatPop, we can not set nGroups
directly and should set prob.group
instead. For example, if we want to simulate 2 groups, we can type other_prior = list(prob.group = c(0.5, 0.5))
. Note that the sum of prob.group
numeric vector must equal to 1, so we can also set prob.group = c(0.3, 0.7)
.
In addtion, if we want to simulate three or more groups, we should obey the rules:
prob.group
vector must always equal to the number of groups.prob.group
numeric vector must equal to 1.For demonstration, we will simulate three groups using the learned parameters.
simulate_result <- simmethods::SplatPop_simulation(
parameters = estimate_result[["estimate_result"]],
return_format = "list",
other_prior = list(batchCells = 500,
nGenes = 1000,
prob.group = c(0.1, 0.3, 0.6)),
seed = 111
)
# nCells: 500
# nGenes: 1000
# nGroups: 3
# de.prob: 0.1
# nBatches: 1
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 1000 500
## cell information
cell_info <- simulate_result[["simulate_result"]][["col_meta"]]
table(cell_info$group)
#
# Group1 Group2 Group3
# 50 150 300
## gene information
gene_info <- simulate_result[["simulate_result"]][["row_meta"]]
### the proportion of DEGs
table(gene_info$de_gene)[2]/nrow(result) ## de.prob = 0.1
# yes
# 0.094
We can see that the proportion of differential expressed genes is 0.1 (equals to the default). Next, if we want to know the fold change between two groups, we will do division with the groups that we are interested in.
## fc between group2 and group1
fc_group1_to_group2 <- gene_info$DEFacGroup2/gene_info$DEFacGroup1
## fc between group3 and group1
fc_group1_to_group3 <- gene_info$DEFacGroup3/gene_info$DEFacGroup1
## fc between group3 and group2
fc_group2_to_group3 <- gene_info$DEFacGroup3/gene_info$DEFacGroup2
In SplatPop, we can not set nBatches
directly and should set batchCells
instead. For example, if we want to simulate 2 batches, we can type other_prior = list(batchCells = c(250, 250))
. Note that the sum of batchCells
numeric vector represents the total number of cells and the length of the vector equals to the number of batches.
In addtion, if we want to simulate three or more batches, we should obey the rules:
batchCells
vector always equals to the number of batches.batchCells
numeric vector represents the total number of cells.For demonstration, we will simulate three batches using the learned parameters.
simulate_result <- simmethods::SplatPop_simulation(
parameters = estimate_result[["estimate_result"]],
return_format = "list",
other_prior = list(batchCells = c(200, 300),
nGenes = 1000),
seed = 111
)
# nCells: 500
# nGenes: 1000
# nGroups: 1
# de.prob: 0.1
# nBatches: 2
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 1000 500
## cell information
cell_info <- simulate_result[["simulate_result"]][["col_meta"]]
table(cell_info$batch)
#
# Batch1 Batch2
# 201 299
As mentioned before, we can set prob.group
and batchCells
to determine the number of groups and batches and we can also set de.prob
to specify the proportion of DEGs. Here, we simulate a dataset with following settings:
simulate_result <- simmethods::SplatPop_simulation(
parameters = estimate_result[["estimate_result"]],
return_format = "list",
other_prior = list(batchCells = c(500, 500),
nGenes = 2000,
de.prob = 0.2,
prob.group = c(0.2, 0.3, 0.5)),
seed = 111
)
# nCells: 1000
# nGenes: 2000
# nGroups: 3
# de.prob: 0.2
# nBatches: 2
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 2000 1000
## cell information
cell_info <- simulate_result[["simulate_result"]][["col_meta"]]
table(cell_info$batch)
#
# Batch1 Batch2
# 524 476
table(cell_info$group)
#
# Group1 Group2 Group3
# 209 299 492
## gene information
gene_info <- simulate_result[["simulate_result"]][["row_meta"]]
### proportion of DEGs
table(gene_info$de_gene)[2]/nrow(result)
# yes
# 0.1725
### fc
fc_group2_to_group3 <- gene_info$DEFacGroup3/gene_info$DEFacGroup2