A boosting method in which each weak learner in the sequence is trained to predict the residual errors of the ensemble built so far
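The residual-fitting loop can be sketched in a few lines. This is a minimal illustration using NumPy and hypothetical helper names (`fit_stump`, `gradient_boost`), not the implementation of any particular library: each round fits a one-split regression stump to the current residuals and adds it to the ensemble with a shrinkage factor.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the single split on x that minimizes squared error of the residual."""
    best = None
    for t in np.unique(x):
        left, right = residual[x <= t], residual[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1], best[2], best[3]  # (threshold, left prediction, right prediction)

def predict_stump(stump, x):
    t, pl, pr = stump
    return np.where(x <= t, pl, pr)

def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Each round fits a stump to the residuals of the current ensemble."""
    pred = np.full_like(y, y.mean())   # start from a constant model
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred            # negative gradient of squared loss
        stump = fit_stump(x, residual)
        stumps.append(stump)
        pred = pred + learning_rate * predict_stump(stump, x)
    return y.mean(), stumps

def predict(model, x, learning_rate=0.1):
    base, stumps = model
    pred = np.full(len(x), base, dtype=float)
    for s in stumps:
        pred += learning_rate * predict_stump(s, x)
    return pred

# Toy usage: the ensemble's training error falls well below the constant baseline.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = np.sin(x) + rng.normal(0, 0.1, 200)
model = gradient_boost(x, y)
mse = np.mean((predict(model, x) - y) ** 2)
```

The learning rate shrinks each stump's contribution, which is what makes the sequence of weak learners converge gradually rather than letting any single tree dominate.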
Usage
- ❇️ High accuracy
- ❇️ Generally scalable
- ❇️ Handle missing data well, treating missingness itself as information to learn from
- ❇️ Require minimal pre-processing (e.g. no feature scaling), like other tree-based models
- 🔴 Many hyperparameters to tune, which can be time-consuming
- 🔴 Difficult to interpret: GBMs can only show how important each feature is relative to the others; they have no coefficients or directionality. This makes them poorly suited to domains like health care and finance, where predictions must be clearly explained. In those cases, consider linear models.
- 🔴 Have difficulty extrapolating, i.e. they are likely to mispredict targets outside the range of values seen in the training data
- 🔴 Prone to overfitting if hyperparameters (e.g. number of trees, tree depth, learning rate) are poorly tuned
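The extrapolation limitation above follows from how tree leaves predict: each leaf outputs a mean of training targets, so the model can never emit a value outside the training range. A minimal sketch with a single hypothetical regression stump (not any library's API) makes this concrete:

```python
import numpy as np

def fit_stump(x, y):
    # One-split regression tree: each leaf predicts the mean of its training
    # targets, so the output can never exceed the range of y seen in training.
    best = None
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, pl, pr = best
    return lambda q: np.where(q <= t, pl, pr)

x = np.linspace(0, 10, 100)
y = 2 * x                         # linear trend that keeps rising beyond x = 10
stump = fit_stump(x, y)
far_pred = stump(np.array([100.0]))[0]
# far_pred stays near the upper leaf's mean (~15), far below the true value 200
```

Deeper trees and full boosted ensembles refine this piecewise-constant shape inside the training range but share the same hard ceiling outside it, which is why a linear model often extrapolates trends better.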