muscat

Here muscat method will be demonstrated clearly and hope that this document can help you.

Estimating parameters from a real dataset

Before simulating datasets, it is important to estimate some essential parameters from a real dataset in order to make the simulated data more real.

library(simmethods)
library(SingleCellExperiment)
# Load data
ref_data <- simmethods::data

Default eatimation

estimate_result <- simmethods::muscat_estimation(
  ref_data = ref_data,
  other_prior = NULL,
  verbose = T,
  seed = 10
)
# Estimating parameters using muscat
# Filtering...
# - 4000/4000 genes and 160/160 cells retained.
# Estimating gene and cell parameters...

Estimation with cell groups

When you use muscat to estimate parameters from a real dataset, you can also input a numeric vector to specify the groups or plates that each cell comes from, like other_prior = list(group.condition = the numeric vector).

group_condition <- as.numeric(simmethods::group_condition)
estimate_result <- simmethods::muscat_estimation(
  ref_data = ref_data,
  other_prior = list(group.condition = group_condition),
  verbose = T,
  seed = 10
)
# Estimating parameters using muscat
# Filtering...
# - 4000/4000 genes and 160/160 cells retained.
# Estimating gene and cell parameters...

Simulating datasets using muscat

After estimating parameter from a real dataset, we will simulate a dataset based on the learned parameters with different scenarios.

  1. Datasets with default parameters
  2. Determin the number of cells and genes
  3. Simulate two or more groups

Datasets with default parameters

The reference data contains 160 cells and 4000 genes, if we simulate datasets with default parameters and then we will obtain a new data which has the same size as the reference data. In addtion, the simulated dataset will have one group of cells.

simulate_result <- simmethods::muscat_simulation(
  parameters = estimate_result[["estimate_result"]],
  other_prior = NULL,
  return_format = "SCE",
  seed = 111
)
# nCells: 160
# nGenes: 4000
# nGroups: 1
# de.group: 0.1
# fc.group: 2
SCE_result <- simulate_result[["simulate_result"]]
dim(SCE_result)
# [1] 4000  160

Determin the number of cells and genes

In muscat, we can set nCells and nGenes to specify the number of cells and genes.

Here, we simulate a new dataset with 1000 cells and 1000 genes:

simulate_result <- simmethods::muscat_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "list",
  other_prior = list(nCells = 1000,
                     nGenes = 1000),
  seed = 111
)
# nCells: 1000
# nGenes: 1000
# nGroups: 1
# de.group: 0.1
# fc.group: 2
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 1000 1000

Simulate two or more groups

In muscat, we can set nGroups directly to specify the number of simulated groups. muscat also provides other parameters related to DEGs such as the proportion of DEGs (de.prob) and the fold change of DGEs (fc.group).

For demonstration, we will simulate two groups using the learned parameters.

simulate_result <- simmethods::muscat_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "list",
  other_prior = list(nCells = 500,
                     nGenes = 1000,
                     nGroups = 2,
                     de.prob = 0.4,
                     fc.group = 4),
  seed = 111
)
# nCells: 500
# nGenes: 1000
# nGroups: 2
# de.group: 0.4
# fc.group: 4
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 1000  500
## cell information
cell_info <- simulate_result[["simulate_result"]][["col_meta"]]
table(cell_info$group)
# 
# Group1 Group2 
#    240    260
## gene information
gene_info <- simulate_result[["simulate_result"]][["row_meta"]]
### the proportion of DEGs
table(gene_info$de_gene)[2]/nrow(result) ## de.prob = 0.4
#   yes 
# 0.384