Unsupervised ML: Dimensionality Reduction

Unsupervised Machine Learning

Sciki-Learn: Dimensionality Reduction

Unsupervised machine learning dimensionality reduction is a technique that reduces the number of dimensions in a dataset without the need for labels, and can be useful for visualization, data compression, and machine learning.

Some of the most popular dimensionality reduction algorithms included in scikit-learn are:

Principal component analysis (PCA) is a linear dimensionality reduction algorithm that projects data points onto a lower-dimensional subspace that preserves as much of the variance of the data as possible.
Linear and Quadratic discriminant analysis (LDA) is a supervised dimensionality reduction algorithm that projects data points onto a lower-dimensional subspace that separates different classes of data as well as possible.
Kernel PCA (KPCA) is a nonlinear dimensionality reduction algorithm that projects data points onto a lower-dimensional subspace using a kernel function. Sparse principal component analysis (SPCA) is a dimensionality reduction algorithm that preserves the sparsity of the data while reducing its dimensionality.
Nonnegative matrix factorization (NMF) is a dimensionality reduction algorithm that decomposes a data matrix into a nonnegative basis matrix and a coefficient matrix.

The choice of which algorithm to use depends on the specific application. For example, PCA is a good choice for applications where the data is linearly correlated, while LDA is a good choice for applications where the data is not linearly correlated.

To use scikit-learn for dimensionality reduction, import libraries, load data, choose an algorithm, fit it to the data, transform the data, and evaluate the results.

For example:

import numpy as np
from sklearn.decomposition import PCA

# Load the data
data = np.loadtxt('data.csv', delimiter=',')

# Choose a dimensionality reduction algorithm
pca = PCA(n_components=2)

# Fit the algorithm to the data
pca.fit(data)

# Transform the data
reduced_data = pca.transform(data)

# Evaluate the results
print('The explained variance ratio is:', pca.explained_variance_ratio_)

References

Scikit-Learn Dimensionality Reduction Documentation
Introduction to Machine Learning with Scikit-Learn. Carpentries lesson.
6 Dimensionality Reduction Algorithms With Python. Jason Brownlee. Machine Learning Mastery.

Please see Jupyter Notebook Example

Created: 04/23/2023 (C. Lizárraga); Last update: 01/19/2024 (C. Lizárraga)

CC BY-NC-SA 4.0

UArizona DataLab, Data Science Institute, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unsupervised ML: Dimensionality Reduction

Unsupervised Machine Learning

Sciki-Learn: Dimensionality Reduction

References

Clone this wiki locally