[Concept Entry] Sklearn multilabel-classification (#5817)
* New file has been added.
* Update user-input.md
* Update user-input.md
* File has been modified.
* Update multilabel-classification.md fixes
* Update multilabel-classification.md
1 parent 915e13d, commit a198871
Showing 1 changed file with 121 additions and 0 deletions.
content/sklearn/concepts/multilabel-classification/multilabel-classification.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,121 @@ | ||
--- | ||
Title: 'Multilabel Classification' | ||
Description: 'Multilabel classification is a machine learning task where each instance can be assigned multiple labels or categories simultaneously.' | ||
Subjects: | ||
- 'Computer Science' | ||
- 'Data Science' | ||
- 'Data Visualization' | ||
- 'Machine Learning' | ||
Tags: | ||
- 'AI' | ||
- 'Classification' | ||
- 'Natural Language Processing' | ||
- 'Scikit-learn' | ||
CatalogContent: | ||
- 'learn-python-3' | ||
- 'paths/intermediate-machine-learning-skill-path' | ||
--- | ||
|
||
In sklearn, **Multilabel Classification** assigns multiple labels to a single instance, so a model predicts several labels at once. This differs from traditional single-label classification, where each instance belongs to exactly one class. | ||
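To make the idea concrete, the sketch below encodes per-instance label sets as a binary indicator matrix, which is the target format scikit-learn's multilabel estimators work with. The genre labels are made up for illustration, and `MultiLabelBinarizer` is only one convenient way to build such a matrix.

```py
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical label sets: each instance (e.g. a movie) can carry several labels
label_sets = [["comedy"], ["action", "comedy"], ["action", "drama"]]

# Turn the label sets into one binary column per distinct label
binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(label_sets)

print(binarizer.classes_)  # column order: ['action' 'comedy' 'drama']
print(y)
# [[0 1 0]
#  [1 1 0]
#  [1 0 1]]
```

Each row is one instance and each column is one label, so a row with several `1` entries is an instance that belongs to several categories.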
|
||
Scikit-learn offers tools like `OneVsRestClassifier`, `ClassifierChain`, and `MultiOutputClassifier` to handle multilabel classification and enable efficient model training and evaluation. | ||
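As a rough illustration of how the three wrappers named above are used, the sketch below fits each of them on the same synthetic data. The choice of base estimator, the `make_base()` helper, and the parameter values are assumptions made for this example and are not part of the original entry.

```py
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

# Synthetic data: 100 samples, 10 features, 3 possible labels per sample
X, y = make_multilabel_classification(
    n_samples=100, n_features=10, n_classes=3, n_labels=2, random_state=42
)

def make_base():
    # Hypothetical helper: returns a fresh, seeded base classifier
    return RandomForestClassifier(n_estimators=20, random_state=0)

wrappers = {
    "OneVsRestClassifier": OneVsRestClassifier(make_base()),
    "ClassifierChain": ClassifierChain(make_base(), random_state=0),
    "MultiOutputClassifier": MultiOutputClassifier(make_base()),
}

# Fit each wrapper and record the shape of its predictions
prediction_shapes = {}
for name, wrapper in wrappers.items():
    wrapper.fit(X, y)
    prediction_shapes[name] = wrapper.predict(X).shape
    print(name, prediction_shapes[name])
```

All three produce one prediction column per label. `OneVsRestClassifier` and `MultiOutputClassifier` treat each label independently, while `ClassifierChain` feeds earlier labels' predictions into later classifiers, which may help when labels are correlated.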
|
||
## Syntax | ||
|
||
Here's the syntax for using multilabel classification in sklearn: | ||
|
||
```pseudo | ||
from sklearn.multioutput import MultiOutputClassifier | ||
from sklearn.ensemble import RandomForestClassifier | ||
from sklearn.model_selection import train_test_split | ||
# Step 1: Split the data (X and y are assumed to be an existing feature matrix and label matrix) | ||
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42) | ||
# Step 2: Initialize the base classifier | ||
base_model = RandomForestClassifier(random_state=42) | ||
# Step 3: Create a MultiOutputClassifier wrapper for multilabel classification | ||
multi_label_model = MultiOutputClassifier(base_model) | ||
# Step 4: Train the model using the training dataset | ||
multi_label_model.fit(X_train, y_train) | ||
# Step 5: Make predictions on the test dataset | ||
predicted_labels = multi_label_model.predict(X_test) | ||
# Step 6: Evaluate predictions or use the results | ||
print(predicted_labels) | ||
``` | ||
|
||
- `RandomForestClassifier`: The base classifier that is trained for each label. | ||
- `MultiOutputClassifier`: A wrapper that fits one copy of the base classifier per label, extending it to multilabel tasks. | ||
- `fit()` and `predict()`: The model is trained with `fit()` and predictions are made using `predict()`. | ||
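The wrapper's behavior can be inspected after fitting. The following sketch, which assumes synthetic data and an arbitrary `n_estimators` value, shows that `MultiOutputClassifier` keeps one fitted estimator per label in its `estimators_` attribute and that `predict_proba()` returns one array per label.

```py
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

# Synthetic data with 3 possible labels per sample
X, y = make_multilabel_classification(
    n_samples=100, n_features=10, n_classes=3, n_labels=2, random_state=42
)

model = MultiOutputClassifier(RandomForestClassifier(n_estimators=20, random_state=42))
model.fit(X, y)

# One fitted copy of the base classifier per label column
print(len(model.estimators_))  # 3

# predict_proba returns a list with one probability array per label
probabilities = model.predict_proba(X[:5])
print(len(probabilities))  # 3
```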
|
||
## Example | ||
|
||
This code demonstrates multilabel classification using scikit-learn by training a model to assign multiple labels: | ||
|
||
```py | ||
from sklearn.datasets import make_multilabel_classification | ||
from sklearn.ensemble import RandomForestClassifier | ||
from sklearn.multioutput import MultiOutputClassifier | ||
|
||
# Generate synthetic multilabel data | ||
X, y = make_multilabel_classification(n_samples=100, n_features=10, n_classes=3, n_labels=2, random_state=42) | ||
|
||
# Initialize a base classifier | ||
base_classifier = RandomForestClassifier() | ||
|
||
# Wrap the base classifier for multilabel classification | ||
model = MultiOutputClassifier(base_classifier) | ||
|
||
# Train the model | ||
model.fit(X, y) | ||
|
||
# Predict labels for the first 5 samples (these were also part of the training data) | ||
predictions = model.predict(X[:5]) | ||
|
||
# Display predictions | ||
print("Predicted Labels for First 5 Samples:") | ||
print(predictions) | ||
``` | ||
|
||
The code produces output similar to the following (exact values may vary between runs because the random forest is not seeded): | ||
|
||
```shell | ||
Predicted Labels for First 5 Samples: | ||
[[1 1 0] | ||
[1 1 0] | ||
[0 0 1] | ||
[1 1 1] | ||
[0 1 0]] | ||
``` | ||
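The example above predicts on samples the model was trained on, so it says little about generalization. A minimal evaluation sketch, assuming a held-out split and a few common multilabel metrics (the sample count, split size, and metric choices are illustrative, not from the original entry), might look like this:

```py
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier

# Generate synthetic multilabel data and hold out 30% for testing
X, y = make_multilabel_classification(
    n_samples=200, n_features=10, n_classes=3, n_labels=2, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train on the training split only
model = MultiOutputClassifier(RandomForestClassifier(random_state=42))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Per-label precision, recall, and F1 on the held-out data
report = classification_report(y_test, y_pred, zero_division=0)
print(report)

# Subset accuracy: fraction of samples where every label is predicted correctly
subset_accuracy = accuracy_score(y_test, y_pred)
# Hamming loss: fraction of individual label assignments that are wrong
label_error_rate = hamming_loss(y_test, y_pred)
print("Subset accuracy:", subset_accuracy)
print("Hamming loss:", label_error_rate)
```

Subset accuracy is strict because a sample counts as correct only when all of its labels match, whereas Hamming loss gives partial credit per label.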
|
||
## Codebyte Example | ||
|
||
The following codebyte example trains a Random Forest classifier for multilabel classification on a synthetic dataset and predicts multiple labels for the first four samples: | ||
|
||
```codebyte/python | ||
# This code demonstrates multilabel classification using scikit-learn. | ||
from sklearn.datasets import make_multilabel_classification | ||
from sklearn.ensemble import RandomForestClassifier | ||
from sklearn.multioutput import MultiOutputClassifier | ||
# Generate synthetic multilabel data | ||
X, y = make_multilabel_classification(n_samples=100, n_features=10, n_classes=3, n_labels=2, random_state=42) | ||
# Initialize a Random Forest classifier | ||
classifier = RandomForestClassifier() | ||
# Wrap the classifier for multilabel classification | ||
multi_label_model = MultiOutputClassifier(classifier) | ||
# Train the model on the dataset | ||
multi_label_model.fit(X, y) | ||
# Predict labels for the first 4 samples | ||
predictions = multi_label_model.predict(X[:4]) | ||
# Display the predictions | ||
print("Predicted labels for the first 4 samples:") | ||
print(predictions) | ||
``` |