How Gradient Boosting Works

Develop an intuitive understanding of how gradient boosting works.

Gradient boosting vs. random forests

Gradient boosting is a machine learning technique that employs an ensemble of weak learners, usually small decision trees. In gradient boosting, weak learners (models) are added to the ensemble iteratively. Each model is trained to address the errors that remain in the predictions of the previously added model. Gradient boosting ensembles make predictions by applying a weight to each weak learner’s prediction and then aggregating the results.
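The iterative process described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes a squared-error loss (so the "errors" are simply the residuals of the ensemble built so far), uses one-split decision stumps as the weak learners, and applies the same weight (a learning rate) to every learner. The helper names `fit_stump`, `fit_boosting`, and `predict` are hypothetical and chosen only for this example.

```python
import math
import random


def fit_stump(xs, targets):
    """Return the one-split stump (threshold, left_mean, right_mean) with the lowest squared error."""
    best = None
    # Skip the largest value so the right-hand side of a split is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((t - left_mean) ** 2 for t in left) + sum((t - right_mean) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosting(xs, ys, n_learners=30, learning_rate=0.3):
    """Add stumps one at a time, each fitted to the errors left by the ensemble so far."""
    baseline = sum(ys) / len(ys)  # the ensemble starts as a constant prediction
    predictions = [baseline] * len(ys)
    stumps = []
    for _ in range(n_learners):
        residuals = [y - p for y, p in zip(ys, predictions)]  # remaining errors
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Weight the new learner's prediction and add it to the running total.
        predictions = [p + learning_rate * stump_predict(stump, x) for p, x in zip(predictions, xs)]
    return baseline, stumps


def predict(baseline, stumps, x, learning_rate=0.3):
    """Aggregate the weighted predictions of all weak learners."""
    return baseline + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Synthetic data, used only for illustration.
random.seed(0)
xs = [random.uniform(-3, 3) for _ in range(100)]
ys = [math.sin(x) + random.gauss(0, 0.1) for x in xs]

baseline, stumps = fit_boosting(xs, ys)
mse_baseline = mse(ys, [baseline] * len(ys))
mse_boosted = mse(ys, [predict(baseline, stumps, x) for x in xs])
print(f"baseline MSE: {mse_baseline:.4f}, boosted MSE: {mse_boosted:.4f}")
```

Each stump on its own is a poor model, but because every new stump is fitted to what the current ensemble still gets wrong, the training error shrinks as learners are added. Production libraries generalize this idea to other loss functions and deeper trees.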

Although gradient boosting and random forests are both ensembles of decision trees, the algorithms are quite different. The following table summarizes the significant similarities and differences between them:

Gradient Boosting vs. Random Forests

| Gradient Boosting | Random Forests |
|---|---|
| Can be used for classification and regression. | Can be used for classification and regression. |
| Models in the ensemble are added iteratively, where each model added is dependent on the previous model. | Models in the ensemble can be added in parallel, as each model is independent of all the others. |
| Models added to the ensemble are trained to address the errors of the previously added model. | Models added to the ensemble are trained on a random subset of the training data. |
| Individual model predictions are weighted and then aggregated to produce the ensemble’s predictions. | Individual model predictions are aggregated without weighting to produce the ensemble’s predictions. |
| The algorithm easily overfits the training data, so tuning is required to optimize the bias-variance tradeoff. | The algorithm is designed to directly address the bias-variance tradeoff, so little tuning is typically required. |
| The algorithm has many hyperparameters that are usually tuned via cross-validation. | The algorithm has a few hyperparameters that can be tuned via cross-validation. |
| Models are added to the ensemble until the specified limit is reached or ensemble improvements are very small. | Models are added to the ensemble until the specified limit is reached. |
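Several of the differences above show up directly in how the two algorithms are configured. The sketch below uses scikit-learn as one possible library (an assumption, since the text does not name an implementation) and a synthetic dataset. The hyperparameter values are arbitrary illustrations, not recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic data, used only for illustration.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Gradient boosting: trees are added one after another, and training can stop
# early once the held-out validation score stops improving.
gb = GradientBoostingRegressor(
    n_estimators=500,         # upper limit on the number of trees
    learning_rate=0.1,        # weight applied to each tree's contribution
    max_depth=3,              # keeps each tree a weak learner
    n_iter_no_change=10,      # stop after 10 rounds without sufficient improvement
    validation_fraction=0.1,  # share of training data held out for that check
    tol=1e-4,                 # minimum improvement that counts as progress
    random_state=0,
)
gb.fit(X_train, y_train)

# Random forest: trees are independent, so they can be built in parallel,
# and the ensemble always grows to the specified limit.
rf = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0)
rf.fit(X_train, y_train)

print("boosting trees fitted:", gb.n_estimators_)
print("forest trees fitted:", len(rf.estimators_))
print("boosting R^2 on test data:", gb.score(X_test, y_test))
print("forest R^2 on test data:", rf.score(X_test, y_test))
```

The boosting model exposes more settings that interact with each other (number of trees, learning rate, tree depth, stopping criteria), which is why it usually needs more tuning than the forest.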

Gradient boosting has become a go-to algorithm for production systems because, with proper tuning, gradient boosting often has better predictive performance than random forests. However, we should keep the following in mind:

  • Gradient boosting’s predictive performance gains compared to random forests are typically small.

  • Tuning the gradient boosting algorithms typically takes far more time and ...
