A boosting method in which each weak learner in the sequence is trained to predict the residual errors of the ensemble built so far
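The residual-fitting loop can be sketched in a few lines. This is a minimal illustration using NumPy and hypothetical helper names (`fit_stump`, `gradient_boost`), not the implementation of any particular library: each round fits a one-split regression stump to the current residuals and adds it to the ensemble with a shrinkage factor.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the single split on x that minimizes squared error of the residual."""
    best = None
    for t in np.unique(x):
        left, right = residual[x <= t], residual[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1], best[2], best[3]  # (threshold, left prediction, right prediction)

def predict_stump(stump, x):
    t, pl, pr = stump
    return np.where(x <= t, pl, pr)

def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Each round fits a stump to the residuals of the current ensemble."""
    pred = np.full_like(y, y.mean())   # start from a constant model
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred            # negative gradient of squared loss
        stump = fit_stump(x, residual)
        stumps.append(stump)
        pred = pred + learning_rate * predict_stump(stump, x)
    return y.mean(), stumps

def predict(model, x, learning_rate=0.1):
    base, stumps = model
    pred = np.full(len(x), base, dtype=float)
    for s in stumps:
        pred += learning_rate * predict_stump(s, x)
    return pred

# Toy usage: the ensemble's training error falls well below the constant baseline.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = np.sin(x) + rng.normal(0, 0.1, 200)
model = gradient_boost(x, y)
mse = np.mean((predict(model, x) - y) ** 2)
```

The learning rate shrinks each stump's contribution, which is what makes the sequence of weak learners converge gradually rather than letting any single tree dominate.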
Usage
- ❇️ High accuracy
- ❇️ Generally scalable
- ❇️ Handle missing data well, treating missingness itself as information to learn from
- ❇️ Require minimal pre-processing (e.g. no feature scaling), like other tree-based models
- 🔴 Many hyperparameters to tune, which can be time-consuming
- 🔴 Difficult to interpret: GBMs can only show how important each feature is relative to the others; they have no coefficients or directionality. This makes them poorly suited to domains like health care and finance, where predictions must be clearly explained. In those cases, consider linear models.
- 🔴 Have difficulty extrapolating, i.e. they are likely to mispredict targets outside the range of values seen in the training data
- 🔴 Prone to overfitting if hyperparameters (e.g. number of trees, tree depth, learning rate) are poorly tuned
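The extrapolation limitation above follows from how tree leaves predict: each leaf outputs a mean of training targets, so the model can never emit a value outside the training range. A minimal sketch with a single hypothetical regression stump (not any library's API) makes this concrete:

```python
import numpy as np

def fit_stump(x, y):
    # One-split regression tree: each leaf predicts the mean of its training
    # targets, so the output can never exceed the range of y seen in training.
    best = None
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, pl, pr = best
    return lambda q: np.where(q <= t, pl, pr)

x = np.linspace(0, 10, 100)
y = 2 * x                         # linear trend that keeps rising beyond x = 10
stump = fit_stump(x, y)
far_pred = stump(np.array([100.0]))[0]
# far_pred stays near the upper leaf's mean (~15), far below the true value 200
```

Deeper trees and full boosted ensembles refine this piecewise-constant shape inside the training range but share the same hard ceiling outside it, which is why a linear model often extrapolates trends better.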