[Concept Entry] Sklearn: Linear Discriminant Analysis #5824

Merged
merged 4 commits into from
Dec 21, 2024
@@ -0,0 +1,100 @@
---
Title: 'Linear Discriminant Analysis'
Description: 'Linear Discriminant Analysis aims to project data onto a lower-dimensional space while preserving the information that discriminates between different classes.'
Subjects:
- 'Data Science'
- 'Machine Learning'
Tags:
- 'Machine Learning'
- 'Scikit-learn'
  - 'Supervised Learning'
CatalogContent:
- 'learn-python-3'
- 'paths/computer-science'
---

In Sklearn, **Linear Discriminant Analysis (LDA)** is a supervised algorithm that projects data onto a lower-dimensional space while preserving the information that discriminates between classes. LDA finds a set of directions in the original feature space, called discriminant directions, that maximize the separation between the classes. For a problem with `k` classes, there are at most `k - 1` such directions, and never more than the number of features. Projecting the data onto these directions reduces its dimensionality while retaining the information that is most relevant for classification.
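
The projection described above can be sketched with the estimator's `fit_transform()` method. This is a minimal illustration rather than part of the entry under review: the choice of the Iris dataset and `n_components=2` are assumptions made for the example.

```py
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Iris has 4 features and 3 classes, so at most 2 discriminant directions exist
X, y = load_iris(return_X_y=True)

# Keep both discriminant directions (assumed value, chosen for illustration)
lda = LinearDiscriminantAnalysis(n_components=2)

# Fit on the labeled data and project it onto the discriminant directions
X_projected = lda.fit_transform(X, y)

print("Original shape:", X.shape)
print("Projected shape:", X_projected.shape)

# Share of between-class variance captured by each direction (default "svd" solver)
print("Explained variance ratio:", lda.explained_variance_ratio_)
```

The projected array has one column per discriminant direction, so it can be passed to a plotting library or to another estimator in place of the original features.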

## Syntax

```pseudo
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Create an LDA model
model = LinearDiscriminantAnalysis()
# Review comment (Collaborator): Should we add the parameters of LinearDiscriminantAnalysis here to improve readability?
# Reference: https://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html
# Reply (Collaborator Author): Yup, we can certainly do that.

# Fit the model to the training data
model.fit(X_train, y_train)

# Make predictions on the new data
y_pred = model.predict(X_test)
```
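
One way the reviewer's request for parameters might be addressed is sketched below, with every constructor argument written out at its default value. The parameter names follow the scikit-learn reference linked in the review and assume a version that includes `covariance_estimator` (0.24 or later); the second model and the use of the Iris dataset are illustrative assumptions only.

```py
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Constructor with its parameters written out at their default values
model = LinearDiscriminantAnalysis(
    solver="svd",               # "svd", "lsqr", or "eigen"
    shrinkage=None,             # None, "auto", or a float in [0, 1]; needs "lsqr" or "eigen"
    priors=None,                # class prior probabilities; inferred from the data if None
    n_components=None,          # number of discriminant directions kept by transform()
    store_covariance=False,     # whether to compute covariance_ when solver="svd"
    tol=1e-4,                   # rank-estimation threshold used by the "svd" solver
    covariance_estimator=None,  # optional covariance estimator for "lsqr" or "eigen"
)

# Illustrative non-default configuration: shrinkage is not supported by "svd"
shrunk_model = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")

# Fit both models on an assumed example dataset to confirm the settings are valid
X, y = load_iris(return_X_y=True)
model.fit(X, y)
shrunk_model.fit(X, y)

print(model.get_params()["solver"], shrunk_model.get_params()["shrinkage"])
```

Shrinkage is mainly useful when there are few training samples relative to the number of features, which is why it is shown as a separate configuration rather than as a default.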

## Example

The following example trains an LDA classifier on the Iris dataset and reports its accuracy on a held-out test set:

```py
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Create training and testing sets by splitting the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create an LDA model
model = LinearDiscriminantAnalysis()

# Fit the model to the training data
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
```

The above code produces the following output:

```shell
Accuracy: 1.0
```
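
A single train/test split can give an optimistic score on a small dataset such as Iris. As a hedged complement to the example above, the sketch below estimates accuracy with 5-fold cross-validation; the choice of `cv=5` is an assumption made for illustration, and the printed values depend on the installed scikit-learn version.

```py
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Load the Iris dataset as feature matrix and label vector
X, y = load_iris(return_X_y=True)

# Evaluate LDA on 5 stratified folds (accuracy is the default score for classifiers)
scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```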

## Codebyte Example

The following codebyte example demonstrates the implementation of LDA:

```codebyte/python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Diabetes dataset
diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target

# Create training and testing sets by splitting the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=44)

# Create an LDA model
model = LinearDiscriminantAnalysis()

# Fit the model to the training data
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))

# Review comment (Collaborator): I guess we should use the Iris dataset here, because the
# Diabetes dataset from sklearn.datasets is a regression dataset, not a classification dataset.
# Linear Discriminant Analysis (LDA) is designed for classification tasks where the target
# variable (y) has discrete class labels, not continuous values as in the Diabetes dataset.
```
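
The reviewer's suggestion could be sketched as follows, swapping the Diabetes dataset for Iris so that the target holds discrete class labels. This is one possible revision, not the merged version of the entry; the split settings are simply carried over from the block under review.

```py
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load a classification dataset: the Iris target contains three discrete class labels
iris = load_iris()
X = iris.data
y = iris.target

# Create training and testing sets with the same split settings as the original block
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=44)

# Create an LDA model and fit it to the training data
model = LinearDiscriminantAnalysis()
model.fit(X_train, y_train)

# Make predictions on the test set and evaluate them
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```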