XGBoost is an optimized gradient boosting library that builds tree ensembles, featuring fast training, effective regularization, and tunable hyper-parameters.
Summary of innovations
XGBoost inherits the advantages of decision trees:
- Interpretable
- Non-parametric
Added properties compared to a single decision tree (see the sketch after this list):
- Tree ensemble
- Gradient boosting
- Regularization
- Missing-value handling
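A minimal sketch (assuming the scikit-learn style interface of the `xgboost` package) of how these properties surface as constructor parameters and native NaN handling; the data and parameter values below are invented for illustration.

```python
import numpy as np
import xgboost as xgb

# Made-up data; np.nan marks missing values, which XGBoost routes to a
# learned default direction at each split.
X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, 1.5],
              [4.0, 0.5],
              [3.0, 2.0],
              [0.5, np.nan]])
y = np.array([0, 1, 0, 1, 1, 0])

model = xgb.XGBClassifier(
    n_estimators=20,     # size of the tree ensemble, grown by gradient boosting
    max_depth=2,
    learning_rate=0.3,
    reg_lambda=1.0,      # L2 penalty on leaf weights (part of the complexity term)
    reg_alpha=0.0,       # L1 penalty on leaf weights
)
model.fit(X, y)
print(model.predict(X))
```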
Algorithm
Additive Training (From Gradient Boosting)
It is intractable to learn all the trees at once. Instead, we use an additive strategy: fix what we have learned, and add one new tree at a time. We write the prediction value at step $t$ as $\hat{y}_i^{(t)}$. Then we have

$$
\begin{aligned}
\hat{y}_i^{(0)} &= 0 \\
\hat{y}_i^{(1)} &= f_1(x_i) = \hat{y}_i^{(0)} + f_1(x_i) \\
\hat{y}_i^{(2)} &= f_1(x_i) + f_2(x_i) = \hat{y}_i^{(1)} + f_2(x_i) \\
&\;\;\vdots \\
\hat{y}_i^{(t)} &= \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i)
\end{aligned}
$$
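As a toy illustration (not XGBoost's internal code), the sketch below accumulates made-up per-round tree outputs $f_t(x_i)$ into the running prediction, mirroring the additive update above.

```python
import numpy as np

# Hypothetical outputs f_t(x_i) of three already-fitted trees on four samples
# (made-up numbers, just to show the additive update).
tree_outputs = [
    np.array([ 0.50,  0.20, -0.10,  0.40]),  # f_1(x_i)
    np.array([ 0.10,  0.00,  0.30, -0.20]),  # f_2(x_i)
    np.array([ 0.05, -0.10,  0.10,  0.00]),  # f_3(x_i)
]

y_hat = np.zeros(4)                  # \hat{y}_i^{(0)} = 0
for t, f_t in enumerate(tree_outputs, start=1):
    y_hat = y_hat + f_t              # \hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)
    print(f"step {t}: {y_hat}")
```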
Which tree do we want at each step? A natural choice is to add the one that optimizes our objective function, comprising the loss function $l$ and a regularization term $\Omega$ that defines the model complexity:

$$
\begin{aligned}
\text{obj}^{(t)} &= \sum_{i=1}^{n} l\big(y_i, \hat{y}_i^{(t)}\big) + \sum_{k=1}^{t} \Omega(f_k) \\
&= \sum_{i=1}^{n} l\big(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t) + \text{constant}
\end{aligned}
$$

For many losses of interest (mean squared error being the friendly exception), it is not so easy to expand this into a nice form for optimization. In the general case, we take the Taylor expansion of the loss function up to the second order:
$$
\text{obj}^{(t)} = \sum_{i=1}^{n} \Big[ l\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \Big] + \Omega(f_t) + \text{constant}
$$

where

$$
g_i = \partial_{\hat{y}_i^{(t-1)}}\, l\big(y_i, \hat{y}_i^{(t-1)}\big), \qquad
h_i = \partial^2_{\hat{y}_i^{(t-1)}}\, l\big(y_i, \hat{y}_i^{(t-1)}\big)
$$

Here $g_i$ is the first-order derivative (gradient) of the loss function and $h_i$ is its second-order derivative (Hessian), both taken with respect to the previous prediction. ⭐ After we remove all the constants, the specific objective at step $t$ becomes

$$
\sum_{i=1}^{n} \Big[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \Big] + \Omega(f_t)
$$

This becomes our optimization goal for the new tree.
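To see the approximation at work, here is a small numerical check (an illustrative sketch, not library code) for the logistic loss: it computes $g_i$ and $h_i$ at the previous score and compares the exact loss at $\hat{y}_i^{(t-1)} + f_t(x_i)$ with its second-order Taylor estimate; the label, score, and tree output are arbitrary example values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logloss(y, score):
    # Logistic loss l(y, s) = log(1 + e^s) - y*s, with s the raw score before the sigmoid.
    return np.log(1.0 + np.exp(score)) - y * score

y, prev_score = 1.0, 0.3                              # arbitrary label and \hat{y}^{(t-1)}
g = sigmoid(prev_score) - y                           # g_i: first-order derivative
h = sigmoid(prev_score) * (1 - sigmoid(prev_score))   # h_i: second-order derivative

f_t = 0.2                                             # arbitrary new-tree output f_t(x_i)
exact = logloss(y, prev_score + f_t)
taylor = logloss(y, prev_score) + g * f_t + 0.5 * h * f_t**2
print(exact, taylor)   # the two values should be close for small f_t
```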
One important advantage is that the value of the objective function depends only on $g_i$ and $h_i$. This is how XGBoost supports every loss function, including logistic regression and pairwise ranking, by using exactly the same solver that takes $g_i$ and $h_i$ as input!
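As an illustration of this point, the sketch below plugs a user-defined logistic objective into xgboost's low-level training API via the custom-objective hook, which expects a callback returning per-sample gradients and Hessians. The toy data and parameter values are made up, and the callback signature shown follows the classic `(preds, dtrain)` convention, which may vary slightly across xgboost versions.

```python
import numpy as np
import xgboost as xgb

def logistic_obj(preds, dtrain):
    """Return per-sample g_i and h_i for the logistic loss."""
    labels = dtrain.get_label()
    p = 1.0 / (1.0 + np.exp(-preds))   # sigmoid of the raw scores
    grad = p - labels                  # g_i: first-order derivative
    hess = p * (1.0 - p)               # h_i: second-order derivative
    return grad, hess

# Made-up toy data, just to exercise the API.
rng = np.random.default_rng(0)
X = rng.random((50, 3))
y = (X[:, 0] > 0.5).astype(float)
dtrain = xgb.DMatrix(X, label=y)

# The solver only ever sees the grad and hess returned by logistic_obj.
booster = xgb.train({"max_depth": 2, "eta": 0.3}, dtrain,
                    num_boost_round=5, obj=logistic_obj)
print(booster.predict(dtrain)[:5])   # raw scores (margins) for the first samples
```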