This document demonstrates how to use the muscat method in simmethods to estimate parameters from a real dataset and then simulate new datasets from them.
Before simulating datasets, it is important to estimate some essential parameters from a real dataset so that the simulated data are more realistic.
library(simmethods)
library(SingleCellExperiment)
# Load data
ref_data <- simmethods::data
estimate_result <- simmethods::muscat_estimation(
ref_data = ref_data,
other_prior = NULL,
verbose = TRUE,
seed = 10
)
# Estimating parameters using muscat
# Filtering...
# - 4000/4000 genes and 160/160 cells retained.
# Estimating gene and cell parameters...
When estimating parameters with muscat, you can also supply a numeric vector that specifies the group or plate each cell comes from, via other_prior = list(group.condition = <numeric vector>).
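If your group labels are stored as character strings (for example, cluster or plate names), one way to obtain the numeric vector for group.condition is to convert them to a factor first, because calling as.numeric() directly on a character vector returns NA. The sketch below uses hypothetical labels; with real data the vector presumably needs one entry per cell, in the same order as the columns of ref_data.

```r
# Hypothetical per-cell labels; replace with your own cell annotation
cell_labels <- c("plateA", "plateB", "plateA", "plateC", "plateB")

# as.numeric("plateA") would give NA (with a warning), so go through a factor:
# the levels are sorted alphabetically and coded as 1, 2, 3, ...
group_condition <- as.numeric(factor(cell_labels))
group_condition
#> [1] 1 2 1 3 2
```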
group_condition <- as.numeric(simmethods::group_condition)
estimate_result <- simmethods::muscat_estimation(
ref_data = ref_data,
other_prior = list(group.condition = group_condition),
verbose = TRUE,
seed = 10
)
# Estimating parameters using muscat
# Filtering...
# - 4000/4000 genes and 160/160 cells retained.
# Estimating gene and cell parameters...
After estimating parameters from a real dataset, we will simulate datasets based on the learned parameters under different scenarios.
The reference data contains 160 cells and 4000 genes. If we simulate a dataset with the default parameters, we obtain new data of the same size as the reference data. In addition, the simulated dataset will contain a single group of cells.
simulate_result <- simmethods::muscat_simulation(
parameters = estimate_result[["estimate_result"]],
other_prior = NULL,
return_format = "SCE",
seed = 111
)
# nCells: 160
# nGenes: 4000
# nGroups: 1
# de.group: 0.1
# fc.group: 2
SCE_result <- simulate_result[["simulate_result"]]
dim(SCE_result)
# [1] 4000 160
In muscat, we can set nCells and nGenes to specify the number of cells and genes.
Here, we simulate a new dataset with 1000 cells and 1000 genes:
simulate_result <- simmethods::muscat_simulation(
parameters = estimate_result[["estimate_result"]],
return_format = "list",
other_prior = list(nCells = 1000,
nGenes = 1000),
seed = 111
)
# nCells: 1000
# nGenes: 1000
# nGroups: 1
# de.group: 0.1
# fc.group: 2
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 1000 1000
In muscat, we can set nGroups directly to specify the number of simulated groups. muscat also provides other parameters related to DEGs, such as the proportion of DEGs (de.prob) and the fold change of DEGs (fc.group).
For demonstration, we will simulate two groups using the learned parameters.
simulate_result <- simmethods::muscat_simulation(
parameters = estimate_result[["estimate_result"]],
return_format = "list",
other_prior = list(nCells = 500,
nGenes = 1000,
nGroups = 2,
de.prob = 0.4,
fc.group = 4),
seed = 111
)
# nCells: 500
# nGenes: 1000
# nGroups: 2
# de.group: 0.4
# fc.group: 4
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 1000 500
## cell information
cell_info <- simulate_result[["simulate_result"]][["col_meta"]]
table(cell_info$group)
#
# Group1 Group2
# 240 260
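The two groups are close to, but not exactly, the same size (240 vs 260 cells). To express group sizes as fractions of all cells, prop.table() can be applied to the same table. The sketch below rebuilds a label vector with the counts printed above purely for illustration; with real output you would pass cell_info$group directly.

```r
# Illustrative labels reproducing the counts printed above (240 + 260 = 500 cells)
group <- rep(c("Group1", "Group2"), times = c(240, 260))

# Fraction of cells in each group
group_prop <- prop.table(table(group))
group_prop
#> group
#> Group1 Group2 
#>   0.48   0.52 
```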
## gene information
gene_info <- simulate_result[["simulate_result"]][["row_meta"]]
### the proportion of DEGs
table(gene_info$de_gene)["yes"]/nrow(result) ## de.prob = 0.4
# yes
# 0.384
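The observed DEG proportion (0.384) is close to, but not exactly, the requested de.prob = 0.4; with nGenes = 1000, roughly 0.4 × 1000 = 400 DEGs would be expected. A minimal sketch of comparing the observed fraction with the target is shown below. It uses a hypothetical gene metadata frame that mimics the de_gene column ("yes"/"no") of row_meta above, and computes the proportion with mean() on a logical comparison rather than by indexing into a table.

```r
# Hypothetical gene metadata mimicking row_meta: 1000 genes, 384 flagged as DEGs
gene_info <- data.frame(
  gene    = paste0("Gene", 1:1000),
  de_gene = rep(c("yes", "no"), times = c(384, 616))
)
de_prob <- 0.4  # proportion requested via other_prior

# Expected number of DEGs under the requested proportion
expected_n_de <- de_prob * nrow(gene_info)

# Observed proportion: the mean of a logical vector is the fraction of TRUE values
observed_prop <- mean(gene_info$de_gene == "yes")

# Absolute deviation from the requested proportion
deviation <- abs(observed_prop - de_prob)
cat("expected DEGs:", expected_n_de,
    "| observed proportion:", observed_prop,
    "| deviation:", deviation, "\n")
```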