Classification analysis

Classification analysis models the essential features of samples from different treatments or disease states in order to predict the class of unknown samples, which facilitates the identification and diagnosis of disease subtypes or stages.

Some widely used and comprehensive classification tools are introduced in this section.


7.1.1 caret

Introduction: The caret package (short for Classification And REgression Training) is a set of functions that attempt to streamline the process of creating predictive models. The package contains tools for: 1) data splitting; 2) pre-processing; 3) feature selection; 4) model tuning using resampling; 5) variable importance estimation, as well as other functionality. There are many different modeling functions in R, and some have different syntax for model training and/or prediction. The package started off as a way to provide a uniform interface to the functions themselves, as well as a way to standardize common tasks (such as parameter tuning and variable importance).

Installation: The current release version can be found on CRAN (http://cran.r-project.org/web/packages/caret/) and the project is hosted on GitHub (https://github.com/topepo/caret). To install this package, start R (version “4.2”) and enter:

install.packages("caret")

Application: The caret vignette can be found at https://topepo.github.io/caret/index.html.
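
The tools listed in the introduction can be sketched in a short workflow. The example below is only an illustration, not taken from the caret documentation: it assumes the built-in iris data set, a k-nearest-neighbours model (method = "knn"), and an arbitrary tuning grid for k. The object names (train_set, test_set, fit) are chosen here for illustration.

```r
library(caret)

set.seed(42)  # make the split and the resampling reproducible

# 1) Data splitting: stratified 80/20 split on the class label
train_idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_set <- iris[train_idx, ]
test_set  <- iris[-train_idx, ]

# 2) + 4) Pre-processing and model tuning with 5-fold cross-validation.
# The candidate values of k are an assumption made for this sketch.
ctrl <- trainControl(method = "cv", number = 5)
fit <- train(
  Species ~ .,
  data       = train_set,
  method     = "knn",
  preProcess = c("center", "scale"),
  tuneGrid   = data.frame(k = c(3, 5, 7)),
  trControl  = ctrl
)

# Predict the class of held-out samples and compute the accuracy
pred     <- predict(fit, newdata = test_set)
accuracy <- mean(pred == test_set$Species)
print(fit$bestTune)
print(accuracy)

# 5) Variable importance estimation for the tuned model
print(varImp(fit))
```

Swapping the method string (for example to "rf" or "svmRadial") keeps the rest of the call unchanged, although those models need their own backend packages installed.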


7.1.2 Tidymodels

Introduction: The tidymodels framework is a collection of packages for modeling and machine learning using tidyverse principles. It supports a wide range of machine learning models, including linear regression, logistic regression, decision trees, random forests, and support vector machines. The complete list, with detailed explanations, can be found in the official tidymodels documentation: https://www.tidymodels.org/find/all/.

Installation: To install this package, start R (version “4.2”) and enter:

install.packages("tidymodels")

Application: The Tidymodels vignette can be found at https://www.tidymodels.org/packages/.
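
As a rough illustration of how the tidymodels packages fit together, the sketch below trains one of the model types mentioned in the introduction (a decision tree). It assumes the built-in iris data set and the rpart engine, which ships with standard R installations; the object names (rec, spec, wf) are chosen here for illustration.

```r
library(tidymodels)

set.seed(42)  # make the split reproducible

# Stratified 80/20 split (rsample)
split     <- initial_split(iris, prop = 0.8, strata = Species)
train_set <- training(split)
test_set  <- testing(split)

# Pre-processing recipe (recipes): normalize all numeric predictors.
# Not required for a tree; included to show where pre-processing goes.
rec <- recipe(Species ~ ., data = train_set) |>
  step_normalize(all_numeric_predictors())

# Model specification (parsnip): a classification tree fitted by rpart
spec <- decision_tree() |>
  set_engine("rpart") |>
  set_mode("classification")

# Bundle recipe and model (workflows), then fit on the training set
wf     <- workflow() |> add_recipe(rec) |> add_model(spec)
fit_wf <- fit(wf, data = train_set)

# Predict held-out samples and score them (yardstick)
pred <- predict(fit_wf, new_data = test_set) |> bind_cols(test_set)
acc  <- accuracy(pred, truth = Species, estimate = .pred_class)
print(acc)
```

Replacing decision_tree() with another specification such as rand_forest() leaves the recipe and workflow code untouched, provided the corresponding engine package is installed.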


7.1.3 mlr3

Introduction: mlr3 is the successor to the mlr package, rewritten from scratch. mlr was first released to CRAN in 2013, and its core design and architecture date back even further. Over time, the addition of many features led to feature creep that made mlr hard to maintain and hard to extend. According to its developers, mlr was nicely extensible in some parts (learners, measures, etc.), but other parts were less easy to extend from the outside. In addition, many helpful R libraries did not exist when mlr was created, and including them would have required non-trivial API changes.

Installation: Install the latest release from CRAN:

install.packages("mlr3")

Install the development version from GitHub:

remotes::install_github("mlr-org/mlr3")

If you want to get started with mlr3, we recommend installing the mlr3verse meta-package which installs mlr3 and some of the most important extension packages:

install.packages("mlr3verse")

Application: The mlr3 vignette can be found at https://github.com/mlr-org/mlr3.
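
For orientation, a minimal mlr3 classification run might look like the sketch below. It assumes the iris task and the rpart learner that are bundled with mlr3, plus an 80/20 split and 5-fold cross-validation chosen arbitrarily for this illustration.

```r
library(mlr3)

set.seed(42)  # make the split and the resampling reproducible

# A task wraps the data and the target; "iris" ships with mlr3
task <- tsk("iris")

# A learner wraps the algorithm; classif.rpart is bundled with mlr3
learner <- lrn("classif.rpart")

# Hold out 20% of the rows for testing
split <- partition(task, ratio = 0.8)

# Train on the training rows, predict the held-out rows
learner$train(task, row_ids = split$train)
prediction <- learner$predict(task, row_ids = split$test)

# Score the prediction with classification accuracy
acc <- prediction$score(msr("classif.acc"))
print(acc)

# Alternatively, estimate performance with 5-fold cross-validation
rr <- resample(task, learner, rsmp("cv", folds = 5))
print(rr$aggregate(msr("classif.acc")))
```

Tasks, learners, measures, and resampling strategies are all retrieved by key through the short constructor functions tsk(), lrn(), msr(), and rsmp(), so extension packages can add new entries without changing this pattern.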


7.1.4 MASS

Introduction: The MASS package includes many useful functions and example data sets, including functions for fitting linear models by generalized least squares (GLS), fitting negative binomial generalized linear models, robust fitting of linear models, and Kruskal’s non-metric multidimensional scaling.

Installation: Install the latest release from CRAN:

install.packages("MASS")

Application: The MASS reference manual can be found at https://cran.r-project.org/web/packages/MASS/MASS.pdf.
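
For classification specifically, MASS also provides linear discriminant analysis through lda() (and a quadratic variant, qda()), which the introduction above does not list. The sketch below is an illustration only: it assumes the built-in iris data set and a simple random 80/20 split, and the object names are chosen here.

```r
library(MASS)

set.seed(42)  # make the random split reproducible

# Simple random 80/20 split (not stratified)
train_idx <- sample(nrow(iris), size = 0.8 * nrow(iris))
train_set <- iris[train_idx, ]
test_set  <- iris[-train_idx, ]

# Fit a linear discriminant analysis model on the training set
fit <- lda(Species ~ ., data = train_set)

# predict() returns the predicted class and the posterior probabilities
pred <- predict(fit, newdata = test_set)

# Confusion table and accuracy on the held-out samples
print(table(Predicted = pred$class, Actual = test_set$Species))
acc <- mean(pred$class == test_set$Species)
print(acc)
```

The posterior probabilities in pred$posterior can be used instead of the hard class labels when a confidence threshold is needed.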