ML Specialization

Combining models so that the combination performs better than any individual model (aka base learner). This improves the stability and accuracy of learning algorithms and reduces overfitting. Two common ways to build such an ensemble:

  1. Bootstrap the data - bagging: fit a model to many random samples of the data (drawn with replacement) and average the results
  2. Train learners sequentially - boosting: each weak learner focuses on correcting the errors of its predecessors

Bagging

bootstrap aggregation (bagging)

ML Spec, mlcourse.ai

bootstrap aggregation (or bagging) is an ensemble method that averages the results of individual models trained on random resamples of the data. A popular bagging-based algorithm is Random Forest.

Process

  1. Repeatedly draw random samples with replacement from the training set to create many bootstrap samples, one for each base learner (an individual model)
  2. Train the base learners on their corresponding samples. This can be done in parallel, rather than sequentially as in boosting.
  3. After training, aggregate the predictions from all base learners (see the sketch after this list):
    1. regression: average their results
    2. classification: take the majority vote
  4. Tune hyperparameters
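
A minimal hand-rolled sketch of steps 1-3 for a regression task; the synthetic data, number of learners, and tree depth are illustrative choices, not values from the course.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                 # illustrative data
y = X[:, 0] ** 2 + rng.normal(size=500)

base_learners = []
for _ in range(25):
    # step 1: bootstrap sample -- draw row indices with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # step 2: train one base learner per bootstrap sample (parallelizable)
    base_learners.append(DecisionTreeRegressor(max_depth=3).fit(X[idx], y[idx]))

# step 3: aggregate -- for regression, average the predictions
y_pred = np.mean([tree.predict(X) for tree in base_learners], axis=0)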

Usage

  • Reduces variance: Standalone models can have high variance. Aggregating base models’ predictions in an ensemble helps reduce it.
  • Fast: Training can happen in parallel across CPU cores and even across different servers.
  • Good for big data: Bagging doesn’t require the entire training dataset to be held in memory during model training. We can set the sample size for each bootstrap to a fraction of the overall data, train a base learner on it, and combine these base learners without ever reading the entire dataset at once (see the sketch below).
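
As a sketch of the last two points, scikit-learn's BaggingClassifier exposes both directly; the parameter values here are illustrative, and x_train / y_train are assumed to be defined elsewhere.

from sklearn.ensemble import BaggingClassifier

# max_samples=0.1: each bootstrap sample holds only 10% of the training rows
# n_jobs=-1: the 100 base learners (decision trees by default) train in parallel
model = BaggingClassifier(n_estimators=100, max_samples=0.1, n_jobs=-1)
model.fit(x_train, y_train)   # x_train, y_train assumed to be defined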

Boosting

boosting

(Algotech) ML Specialization

boosting is an ensemble method where weak learners are trained sequentially: each model learns from its predecessors and tries to correct their errors
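
A minimal hand-rolled sketch of this sequential error-correcting loop, using the residual-fitting (gradient boosting) flavour; the synthetic data and hyperparameters are illustrative choices, not values from the course.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))                  # illustrative data
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=400)

learning_rate = 0.3
prediction = np.zeros(len(y))
weak_learners = []
for _ in range(50):
    residual = y - prediction                          # errors left by the predecessors
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residual)
    weak_learners.append(stump)
    prediction += learning_rate * stump.predict(X)     # each learner updates the ensemble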

Usage

  • 👍 Reduces overfitting
  • 👍 Easy to understand
  • 👍 Requires minimal preprocessing (e.g., no feature scaling needed)
  • 👍 Handles both numerical and categorical features (if tree-based)
  • 🔻 Less scalable than bagging: each weak learner depends on its predecessor, so the learners cannot be trained in parallel across different servers.

Methods


adaptive boosting (AdaBoost)

A boosting technique, typically built on shallow decision trees, where each subsequent weak learner is tweaked to try harder to correct the points its predecessors misclassified.

To do so, the model gives wrongly predicted points more weight and correctly predicted points less weight.

How to assign weights

For example, in Iteration 1, three misclassified plus (+) signs are given more weight in Iteration 2, pushing that weak learner to label them correctly (at the expense of other points). After 3 iterations, all the learners are combined into the final ensemble.

Usage

  • 👍 Can handle numeric and categorical features
  • 👍 Functions well even when there is collinearity among features
  • 👍 Robust to outliers in the feature values (as are tree-based models in general), though the reweighting scheme can make AdaBoost sensitive to label noise

Weight assignment
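
For reference, the standard AdaBoost weight update for binary labels $y_i \in \{-1, +1\}$ (the usual textbook formulation, not notation specific to the source course) is roughly:

  1. Compute the weighted error of weak learner $m$: $\epsilon_m = \frac{\sum_i w_i \, \mathbb{1}[h_m(x_i) \neq y_i]}{\sum_i w_i}$
  2. Give the learner itself a weight: $\alpha_m = \frac{1}{2} \ln \frac{1 - \epsilon_m}{\epsilon_m}$ (more accurate learners get a larger $\alpha_m$)
  3. Update the sample weights: $w_i \leftarrow w_i \exp(-\alpha_m y_i h_m(x_i))$, then normalize; misclassified points grow, correctly classified points shrink
  4. Final prediction: $\operatorname{sign}\left(\sum_m \alpha_m h_m(x)\right)$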

Code

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# estimator: the weak learner (called base_estimator in scikit-learn < 1.2)
# n_estimators: max number of weak learners used
model = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=2),
                           n_estimators=4)

# x_train, y_train, x_test are assumed to be defined elsewhere
model.fit(x_train, y_train)
model.predict(x_test)