ML Specialization
Combining models so that the combination performs better than any individual model (aka base learner). The method improves the stability and accuracy of learning algorithms and reduces overfitting.
Bootstrap the data - bagging: fit models to repeated bootstrap samples of the training data (sampled with replacement) and average the results
Subset the features - random subspace method: use only a random subset of features (not all) in each base learner (see the brief sketch below)
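As a brief illustration, random forests combine both ideas: each tree sees a bootstrap sample of the rows, and each split considers only a random subset of the features via max_features. Here x_train and y_train are placeholder arrays, not names from these notes.

```python
from sklearn.ensemble import RandomForestClassifier

# max_features="sqrt" lets each split consider only sqrt(n_features) features
model = RandomForestClassifier(n_estimators=100, max_features="sqrt")
model.fit(x_train, y_train)
```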
Repeatedly select random samples with replacement from the training set to create many bootstrap samples, each used to train a base learner (an individual model)
Train base learners on their corresponding samples. This can be done in parallel, unlike boosting, which trains learners sequentially.
After training, aggregate predictions from all base learners (see the sketch after these steps):
regression: averaging their results
classification: taking the majority of votes
Tune hyperparameters
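Below is a minimal sketch of this procedure, assuming x_train, y_train, and x_test are NumPy arrays with non-negative integer class labels (the variable names are placeholders):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

n_learners = 10
rng = np.random.default_rng(0)
base_learners = []

for _ in range(n_learners):
    # Bootstrap sample: draw indices with replacement, same size as the training set
    idx = rng.integers(0, len(x_train), size=len(x_train))
    base_learners.append(DecisionTreeClassifier().fit(x_train[idx], y_train[idx]))

# Aggregate: each row holds one learner's predictions for the test set
all_preds = np.stack([m.predict(x_test) for m in base_learners]).astype(int)

# Classification: majority vote per test point (for regression, use all_preds.mean(axis=0))
y_pred = np.array([np.bincount(col).argmax() for col in all_preds.T])
```

The loop over base learners is embarrassingly parallel, which is what makes bagging easy to scale.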
Usage
Reduces variance: Standalone models can result in high variance. Aggregating base models’ predictions in an ensemble helps reduce it.
Fast: Training can happen in parallel across CPU cores and even across different servers.
Good for big data: Bagging doesn’t require an entire training dataset to be stored in memory during model training. We can set the sample size for each bootstrap to a fraction of the overall data, train a base learner, and combine these base learners without ever reading in the entire dataset all at once.
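A hedged sketch of this with scikit-learn's BaggingClassifier (which defaults to decision trees as base learners): max_samples=0.1 trains each learner on a bootstrap sample of only 10% of the data, and n_jobs=-1 fits the learners in parallel across CPU cores. x_train and y_train are placeholder names as before.

```python
from sklearn.ensemble import BaggingClassifier

model = BaggingClassifier(
    n_estimators=50,    # number of base learners
    max_samples=0.1,    # fraction of the data in each bootstrap sample
    n_jobs=-1,          # train base learners in parallel
)
model.fit(x_train, y_train)
```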
A tree-based boosting technique (e.g., AdaBoost), where subsequent weak learners are tweaked so that they try harder to correct points incorrectly predicted by their predecessors.
To do so, the model gives wrongly predicted points more weight, and correctly predicted points less weight.
How to assign weights
For example, in Iteration 1, three misclassified plus (+) signs are given more weight in Iteration 2, urging that individual model to label them correctly (at the expense of other points). After 3 iterations, all learners are combined into the ensemble.
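A rough sketch of this loop (not a production AdaBoost), assuming binary labels y_train in {-1, +1}, NumPy arrays for x_train / x_test, and a weighted error that stays strictly between 0 and 1:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

n_rounds = 3
w = np.full(len(x_train), 1 / len(x_train))       # start with uniform weights
learners, alphas = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(x_train, y_train, sample_weight=w)  # weak learner sees the current weights
    pred = stump.predict(x_train)

    err = np.sum(w * (pred != y_train))           # weighted error (weights sum to 1)
    alpha = 0.5 * np.log((1 - err) / err)         # this learner's say in the final vote

    # Misclassified points get more weight, correctly classified points get less
    w = w * np.exp(-alpha * y_train * pred)
    w = w / w.sum()

    learners.append(stump)
    alphas.append(alpha)

# Combine the weak learners with a weighted vote
y_pred = np.sign(sum(a * m.predict(x_test) for a, m in zip(alphas, learners)))
```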
Usage
👍 Can handle numeric and categorical features
Functions well even when there is collinearity among features
Robust against outliers (a property of all tree-based models)
Weight assignment
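A compact way to state the rule used above, assuming labels $y_i \in \{-1, +1\}$ and a weak learner $h_t$ with weighted error $\epsilon_t$:

$$
\alpha_t = \tfrac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}, \qquad
w_i \leftarrow \frac{w_i \, e^{-\alpha_t \, y_i \, h_t(x_i)}}{Z_t}
$$

where $Z_t$ normalizes the weights to sum to 1. Misclassified points (where $y_i h_t(x_i) = -1$) have their weight multiplied by $e^{\alpha_t} > 1$, and correctly classified points by $e^{-\alpha_t} < 1$. The final prediction is $H(x) = \operatorname{sign}\big(\sum_t \alpha_t h_t(x)\big)$.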
Code
```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# base_estimator = weak learner
# n_estimators = max number of weak learners used
model = AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=2), n_estimators=4)
model.fit(x_train, y_train)
model.predict(x_test)
```
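Note: newer scikit-learn releases (1.2+) renamed AdaBoostClassifier's base_estimator argument to estimator, so adjust the argument name to match your installed version.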