Unsupervised ML: Dimensionality Reduction
Unsupervised machine learning dimensionality reduction is a technique that reduces the number of dimensions (features) in a dataset without the need for labels. It is useful for visualization, data compression, and as a preprocessing step for other machine learning tasks.
Some of the most popular dimensionality reduction algorithms included in scikit-learn are:
- Principal component analysis (PCA) is a linear dimensionality reduction algorithm that projects data points onto a lower-dimensional subspace that preserves as much of the variance of the data as possible.
- Linear discriminant analysis (LDA) is a supervised dimensionality reduction algorithm that projects data points onto a lower-dimensional subspace that separates the different classes as well as possible.
- Kernel PCA (KPCA) is a nonlinear dimensionality reduction algorithm that projects data points onto a lower-dimensional subspace using a kernel function.
- Sparse principal component analysis (SPCA) is a variant of PCA that finds components with sparse loadings while reducing dimensionality, which makes the components easier to interpret.
- Nonnegative matrix factorization (NMF) is a dimensionality reduction algorithm that decomposes a data matrix into a nonnegative basis matrix and a coefficient matrix (see the sketch after this list).
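All of these estimators share scikit-learn's fit/transform interface. As a minimal sketch (the synthetic dataset and the parameter values below are illustrative assumptions, not part of the lesson), Kernel PCA and NMF can be applied like this:
import numpy as np
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA, NMF
# Synthetic two-moons data: a simple example of nonlinear structure
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
# Kernel PCA with an RBF kernel projects the data nonlinearly
kpca = KernelPCA(n_components=2, kernel='rbf', gamma=15)
X_kpca = kpca.fit_transform(X)
# NMF requires nonnegative input, so shift the features to be >= 0
X_nonneg = X - X.min(axis=0)
nmf = NMF(n_components=2, init='nndsvda', random_state=0, max_iter=500)
W = nmf.fit_transform(X_nonneg)   # reduced representation of each sample
H = nmf.components_               # nonnegative components (basis vectors)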
The choice of algorithm depends on the specific application. For example, PCA is a good choice when the data is approximately linearly correlated and no labels are available, LDA is appropriate when class labels are available and the goal is to separate those classes, and Kernel PCA is useful when the structure in the data is nonlinear.
To use scikit-learn for dimensionality reduction, import libraries, load data, choose an algorithm, fit it to the data, transform the data, and evaluate the results.
For example:
import numpy as np
from sklearn.decomposition import PCA
# Load the data from a numeric CSV file (data.csv is a placeholder path)
data = np.loadtxt('data.csv', delimiter=',')
# Choose a dimensionality reduction algorithm
pca = PCA(n_components=2)
# Fit the algorithm to the data
pca.fit(data)
# Transform the data
reduced_data = pca.transform(data)
# Evaluate the results
print('The explained variance ratio is:', pca.explained_variance_ratio_)
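The explained variance ratio reports the fraction of the total variance captured by each retained component; values close to 1 mean little information was lost. Because LDA is supervised, class labels must also be passed when fitting. A brief sketch using scikit-learn's bundled Iris dataset (the dataset choice here is an illustrative assumption) shows the difference:
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
# LDA needs class labels in addition to the features
X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis(n_components=2)
reduced_X = lda.fit_transform(X, y)   # labels are passed to fit
print('The explained variance ratio is:', lda.explained_variance_ratio_)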
- Scikit-Learn Dimensionality Reduction Documentation
- Introduction to Machine Learning with Scikit-Learn. Carpentries lesson.
- 6 Dimensionality Reduction Algorithms With Python. Jason Brownlee. Machine Learning Mastery.
Please see the Jupyter Notebook Example.
Created: 04/23/2023 (C. Lizárraga); Last update: 01/19/2024 (C. Lizárraga)
UArizona DataLab, Data Science Institute, 2024