Ensemble Methods: Bagging, Boosting, and Stacking.
(Image credit: Omar Flores, Unsplash.com)
Bagging, boosting, and stacking are ensemble learning techniques in machine learning that combine the predictions of multiple base models to improve the overall performance compared to a single model.
However, they achieve this goal through different approaches:
1️⃣ Bagging (Bootstrap Aggregating):
- Goal: Reduce the variance of the model.
- Method:
- Creates multiple subsets of the training data with replacement (bootstrap sampling).
- Trains multiple models (usually the same type, called "weak learners") on each subset independently.
- Combines the predictions of all models using simple averaging (regression) or majority vote (classification).
Several base algorithms are commonly used for bagging:
- Decision Trees: Bagging is most commonly used with decision trees because they are prone to high variance. By training multiple decision trees on different data subsets, bagging reduces the overall variance and improves the model's robustness. Combined with random feature selection at each split, this yields a Random Forest, a powerful and popular ensemble learning method.
- K-Nearest Neighbors (KNN): Bagging can also be beneficial for KNN as it can help reduce the sensitivity to noisy data points and improve the overall performance.
- Support Vector Machines (SVM): While less common, bagging can also be applied to SVMs, particularly for multi-class classification problems.
It's important to remember that bagging is not limited to these specific algorithms. Any machine learning algorithm can be used within the bagging framework, as long as it can be trained independently on different data subsets. However, the effectiveness of bagging will depend on the characteristics of the chosen algorithm and the specific problem being addressed.
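To make the framework concrete, here is a minimal from-scratch sketch of bagging with decision trees. The helper names and the assumption that X and y are NumPy arrays with non-negative integer class labels are illustrative choices, not part of the text above.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
def bagging_sketch(X, y, n_estimators=10, random_state=0):
    # Assumes X and y are NumPy arrays; class labels are non-negative integers
    rng = np.random.default_rng(random_state)
    n = len(y)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)  # bootstrap sample: draw n indices with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models
def bagging_predict(models, X):
    votes = np.stack([m.predict(X) for m in models])  # shape: (n_estimators, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])  # majority vote per sample
In practice, scikit-learn packages this logic for you: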
Scikit-learn example, using BaggingClassifier:
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
# Define base model
base_model = LogisticRegression()
# Create a bagging ensemble (recent scikit-learn uses the parameter `estimator` instead of the older `base_estimator`)
bagging_model = BaggingClassifier(estimator=base_model, n_estimators=10)
# Train the ensemble on your data
bagging_model.fit(X_train, y_train)
# Make predictions using the ensemble
predictions = bagging_model.predict(X_test)
Similarly, there is a BaggingRegressor, an ensemble meta-estimator that fits base regressors on random subsets of the original dataset and then aggregates their predictions (by averaging) to form a final prediction. (See example).
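A minimal usage sketch of BaggingRegressor; the choice of DecisionTreeRegressor as the base estimator is an illustrative assumption, and X_train, y_train, X_test are assumed to be defined as in the earlier example.
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
# Bagging ensemble of regression trees; individual predictions are averaged
bagging_reg = BaggingRegressor(estimator=DecisionTreeRegressor(), n_estimators=10)
bagging_reg.fit(X_train, y_train)
y_pred = bagging_reg.predict(X_test)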
2️⃣ Boosting:
- Goal: Reduce the bias of the model.
- Method:
- Trains models sequentially.
- Each subsequent model focuses on the data points the previous models misclassified.
- Assigns higher weights to misclassified points in the training data for the next model.
- Combines the predictions of all models using a weighted sum (regression) or weighted vote (classification).
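To illustrate the sequential reweighting, here is a simplified from-scratch sketch in the style of AdaBoost. It assumes binary labels encoded as -1/+1 and uses decision stumps as weak learners; the function names are illustrative, not part of the original text.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
def boosting_sketch(X, y, n_rounds=10):
    # Assumes y contains binary labels encoded as -1 and +1
    n = len(y)
    weights = np.full(n, 1.0 / n)  # start with uniform sample weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=weights)  # train on the currently weighted data
        pred = stump.predict(X)
        err = np.clip(np.sum(weights * (pred != y)), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)  # model weight: larger for smaller error
        weights *= np.exp(-alpha * y * pred)  # increase the weights of misclassified points
        weights /= weights.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas
def boosting_predict(learners, alphas, X):
    # Weighted vote over all sequentially trained learners
    return np.sign(sum(a * m.predict(X) for a, m in zip(alphas, learners)))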
Common Base Models in Boosting:
- Decision Trees: As with bagging, decision trees are popular choices for boosting, although here shallow trees (weak learners with high bias and low variance) are typically used. Commonly used algorithms include AdaBoost and Gradient Boosting.
- Linear Models: Boosting can also be applied to simple linear models such as linear regression; each successive model is fit to the residual errors of the previous ones, reducing bias while controlling overfitting through a learning rate.
- Support Vector Machines (SVM): Boosting can be used with SVMs, particularly for multi-class classification problems, to improve overall accuracy and robustness.
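Gradient Boosting, mentioned above, is available directly in scikit-learn. A minimal sketch, where the hyperparameter values are illustrative assumptions and X_train, y_train, X_test are assumed to be defined:
from sklearn.ensemble import GradientBoostingClassifier
# Gradient boosting with shallow trees as the sequential weak learners
gb_model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
gb_model.fit(X_train, y_train)
gb_pred = gb_model.predict(X_test)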
Scikit-learn example of AdaBoostClassifier:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
# Define base model
base_model = LogisticRegression()
# Create boosting ensemble (recent scikit-learn uses the parameter `estimator` instead of the older `base_estimator`)
boosting_model = AdaBoostClassifier(estimator=base_model, n_estimators=10)
# Train the ensemble on your data
boosting_model.fit(X_train, y_train)
# Make predictions using the ensemble
predictions = boosting_model.predict(X_test)
There is a corresponding AdaBoostRegressor, which begins by fitting a regressor on the original dataset and then fits additional copies of the regressor on the same dataset, with instance weights adjusted according to the error of the current prediction. (See example).
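A minimal usage sketch of AdaBoostRegressor; the base estimator choice and hyperparameter values are illustrative assumptions, and X_train, y_train, X_test are assumed to be defined.
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor
# Boosted ensemble of shallow regression trees
boosting_reg = AdaBoostRegressor(estimator=DecisionTreeRegressor(max_depth=3), n_estimators=50)
boosting_reg.fit(X_train, y_train)
y_pred = boosting_reg.predict(X_test)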
3️⃣ Stacking:
- Goal: Improve accuracy.
-
Method:
- Utilizes diverse base models for heterogeneity and potentially better generalization.
- Introduces an additional model to learn from the base model predictions.
Common base models for stacking:
- Decision Trees: Popular due to their low bias and ability to handle complex relationships.
- Support Vector Machines (SVMs): Effective for high-dimensional data and certain types of classification problems.
- K-Nearest Neighbors (KNN): This can be useful for capturing local patterns in the data.
- Neural Networks: Powerful models that can learn complex non-linear relationships.
Scikit-learn example of a VotingClassifier, which combines heterogeneous base models by majority vote:
from sklearn.ensemble import VotingClassifier, RandomForestClassifier, AdaBoostClassifier
# Define base models
model1 = RandomForestClassifier(n_estimators=100)
model2 = AdaBoostClassifier(n_estimators=100)
# Define the ensemble
ensemble = VotingClassifier(estimators=[("rf", model1), ("ada", model2)], voting="hard")
# Train the ensemble
ensemble.fit(X_train, y_train)
# Make predictions
y_pred = ensemble.predict(X_test)
Correspondingly, there is a Voting Regressor, which combines conceptually different machine learning regressors and returns the average predicted values.
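A minimal usage sketch of VotingRegressor; the choice of base regressors is an illustrative assumption, and X_train, y_train, X_test are assumed to be defined.
from sklearn.ensemble import VotingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
# Average the predictions of two conceptually different regressors
voting_reg = VotingRegressor(estimators=[("lr", LinearRegression()), ("rf", RandomForestRegressor(n_estimators=100))])
voting_reg.fit(X_train, y_train)
y_pred = voting_reg.predict(X_test)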
Stacked generalization is a method for combining estimators to reduce their biases. The StackingClassifier and StackingRegressor provide such strategies which can be applied to classification and regression problems.
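A minimal sketch of StackingClassifier with heterogeneous base models and a logistic-regression meta-model; the specific base models are illustrative assumptions, and X_train, y_train, X_test are assumed to be defined.
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
# Heterogeneous base models; a logistic regression meta-model learns from their predictions
stacking_model = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100)),
                ("svm", SVC(probability=True))],  # probability=True lets the meta-model use predicted probabilities
    final_estimator=LogisticRegression(),
)
stacking_model.fit(X_train, y_train)
y_pred = stacking_model.predict(X_test)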
Key differences:
Feature | Bagging | Boosting | Stacking |
---|---|---|---|
Learning Style | Parallel | Sequential | Staged (base models, then a meta-model) |
Base Model Diversity | Usually homogeneous (same algorithm) | Usually homogeneous | Heterogeneous |
Focus | Reduce variance | Reduce bias | Improve accuracy |
Strengths | Robust to overfitting, computationally efficient | Handles imbalanced data, improves underfitting | Leverages diverse models |
Weaknesses | May not significantly improve low-bias models | Can be sensitive to noisy data | More complex to implement |
Choosing between bagging, boosting, or stacking depends on the specific problem and the characteristics of the base learners. Generally, bagging is preferred for high-variance learners, boosting is better suited for high-bias learners, and stacking is useful when diverse models are available and maximum accuracy is the goal.
🔗 📔 See Jupyter Notebook Example for Decision Trees and Ensemble Learning
Created: 02/17/2024 (C. Lizárraga); Last update: 02/22/2024 (C. Lizárraga)
UArizona DataLab, Data Science Institute, 2024