Skip to content

Unsupervised ML: Dimensionality Reduction

Meghavarshini Krishnaswamy edited this page Feb 28, 2024 · 2 revisions

Unsupervised Machine Learning

Sciki-Learn: Dimensionality Reduction


Unsupervised machine learning dimensionality reduction is a technique that reduces the number of dimensions in a dataset without the need for labels, and can be useful for visualization, data compression, and machine learning.


Some of the most popular dimensionality reduction algorithms included in scikit-learn are:

  • Principal component analysis (PCA) is a linear dimensionality reduction algorithm that projects data points onto a lower-dimensional subspace that preserves as much of the variance of the data as possible.

  • Linear and Quadratic discriminant analysis (LDA) is a supervised dimensionality reduction algorithm that projects data points onto a lower-dimensional subspace that separates different classes of data as well as possible.

  • Kernel PCA (KPCA) is a nonlinear dimensionality reduction algorithm that projects data points onto a lower-dimensional subspace using a kernel function. Sparse principal component analysis (SPCA) is a dimensionality reduction algorithm that preserves the sparsity of the data while reducing its dimensionality.

  • Nonnegative matrix factorization (NMF) is a dimensionality reduction algorithm that decomposes a data matrix into a nonnegative basis matrix and a coefficient matrix.

The choice of which algorithm to use depends on the specific application. For example, PCA is a good choice for applications where the data is linearly correlated, while LDA is a good choice for applications where the data is not linearly correlated.


To use scikit-learn for dimensionality reduction, import libraries, load data, choose an algorithm, fit it to the data, transform the data, and evaluate the results.

For example:

import numpy as np
from sklearn.decomposition import PCA

# Load the data
data = np.loadtxt('data.csv', delimiter=',')

# Choose a dimensionality reduction algorithm
pca = PCA(n_components=2)

# Fit the algorithm to the data
pca.fit(data)

# Transform the data
reduced_data = pca.transform(data)

# Evaluate the results
print('The explained variance ratio is:', pca.explained_variance_ratio_)

References


Please see Jupyter Notebook Example


Created: 04/23/2023 (C. Lizárraga); Last update: 01/19/2024 (C. Lizárraga)

CC BY-NC-SA 4.0