XGBoost Hyperparameters: Tuning the Learning Rate
Learn how the learning rate can be adjusted to improve the performance of the gradient boosted tree model trained with XGBoost.
Impact of learning rate on model performance
The learning rate is also referred to as eta in the XGBoost documentation, as well as step size shrinkage. This hyperparameter controls how much of a contribution each new estimator will make to the ensemble prediction. If you increase the learning rate, you may reach the optimal model, defined as having the highest performance on the validation set, in fewer boosting rounds. However, there is the danger that setting it too high will result in boosting steps that are too large. In this case, the gradient boosting procedure may not converge on the optimal model, due to issues similar to those discussed in the exercise "Using Gradient Descent to Minimize a Cost Function" regarding large learning rates in gradient descent. Let’s explore how the learning rate affects model performance on our synthetic data.
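To make the idea of step size shrinkage concrete, here is a minimal sketch of boosting with squared error in which each new "estimator" is just a constant equal to the mean residual. This is a simplified illustration, not XGBoost's actual algorithm; the function name boost_constant and the toy target values are made up for this example.

```python
def boost_constant(targets, learning_rate, n_rounds):
    """Toy gradient boosting for squared error with constant estimators."""
    prediction = 0.0  # initial ensemble prediction
    history = []
    for _ in range(n_rounds):
        # Each new "estimator" predicts the mean of the current residuals
        residual_mean = sum(t - prediction for t in targets) / len(targets)
        # The learning rate scales the new estimator's contribution
        prediction += learning_rate * residual_mean
        history.append(prediction)
    return history

targets = [2.0, 4.0, 6.0]  # toy data (hypothetical); the mean is 4.0
slow = boost_constant(targets, learning_rate=0.1, n_rounds=5)
fast = boost_constant(targets, learning_rate=0.5, n_rounds=5)
print(slow[-1], fast[-1])
```

In this toy setting, the gap between the prediction and the target mean shrinks by a factor of (1 - learning_rate) each round, so the larger learning rate gets closer to the mean in the same number of rounds.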
The learning rate is a number between zero and one (inclusive of endpoints, although a learning rate of zero is not useful). We make an array of 25 evenly spaced numbers between 0.01 and 1 for the learning rates we’ll test:
learning_rates = np.linspace(start=0.01, stop=1, num=25)
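As a rough illustration of what this call produces, the following pure-Python sketch builds the same kind of grid by hand. The helper linspace_sketch is hypothetical and not part of NumPy; it assumes both endpoints are included, which is the default behavior of np.linspace.

```python
def linspace_sketch(start, stop, num):
    """Evenly spaced values from start to stop, with both endpoints included."""
    step = (stop - start) / (num - 1)  # num points means num - 1 gaps
    return [start + i * step for i in range(num)]

rates = linspace_sketch(0.01, 1, 25)
# Number of values, first value, and spacing between neighbors
print(len(rates), rates[0], rates[1] - rates[0])
```

With 25 points between 0.01 and 1, neighboring learning rates are 0.99 / 24 = 0.04125 apart.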
Now we set up a for loop to train a model for each learning rate and save the validation scores in an array. We’ll also track the number of boosting rounds that it takes to reach the best iteration. The next several code blocks should be run together as one cell in a Jupyter Notebook. We start by measuring how long this will take, creating empty lists to store results, and opening the for loop:
%%time
val_aucs = []
best_iters = []
for learning_rate in learning_rates:
At each loop iteration, the learning_rate variable will hold successive elements of the learning_rates array. Once inside the loop, the first step is to update the hyperparameters of the model object with the new learning rate. This is accomplished using the set_params method, which we supply with a double asterisk ** and a dictionary mapping hyperparameter names to values. The ** function call syntax in Python allows us to supply an arbitrary number of keyword arguments, also called kwargs, as a dictionary. In this case, we are only changing one keyword argument, so the dictionary only has one item:
    xgb_model_1.set_params(**{'learning_rate': learning_rate})
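The ** unpacking can be tried in isolation with a small stand-in class. ToyModel below is a hypothetical example, not part of XGBoost or scikit-learn; it only mimics the idea of a set_params method that accepts keyword arguments.

```python
class ToyModel:
    """Stand-in for an estimator with a set_params method (hypothetical)."""

    def __init__(self, learning_rate=0.3, max_depth=3):
        self.learning_rate = learning_rate
        self.max_depth = max_depth

    def set_params(self, **params):
        # params arrives as a dictionary of keyword names mapped to values
        for name, value in params.items():
            setattr(self, name, value)
        return self

model = ToyModel()
model.set_params(**{'learning_rate': 0.1})       # dictionary unpacked into kwargs
same = ToyModel().set_params(learning_rate=0.1)  # equivalent direct keyword call
print(model.learning_rate == same.learning_rate)
```

The dictionary form is convenient when hyperparameter names are stored as strings or when several values are changed at once.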
Now that we’ve set the new learning rate on the model object, we train the model using early stopping as before:
    xgb_model_1.fit(X_train, y_train, eval_set=eval_set, eval_metric='auc',
                    verbose=False, early_stopping_rounds=30)
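For intuition about how early stopping determines a best iteration, here is a simplified sketch of the stopping rule applied to a list of per-round validation scores. The function name and the scores are made up for illustration, and XGBoost's own bookkeeping may differ in its details.

```python
def best_iteration_with_early_stopping(val_scores, early_stopping_rounds):
    """Return (best_iteration, rounds_run) for validation scores where
    higher is better, as with ROC AUC."""
    best_score = float('-inf')
    best_iteration = 0
    for iteration, score in enumerate(val_scores):
        if score > best_score:
            # New best score: remember where it happened
            best_score = score
            best_iteration = iteration
        elif iteration - best_iteration >= early_stopping_rounds:
            # No improvement for the allowed number of rounds: stop
            return best_iteration, iteration + 1
    return best_iteration, len(val_scores)

# Hypothetical validation scores, one per boosting round
scores = [0.70, 0.74, 0.77, 0.76, 0.75, 0.78, 0.77, 0.76, 0.75]
print(best_iteration_with_early_stopping(scores, early_stopping_rounds=2))
print(best_iteration_with_early_stopping(scores, early_stopping_rounds=3))
```

With a patience of 2 rounds, the sketch stops before the later, higher score is reached, while a patience of 3 finds it. This is one reason the number of early stopping rounds should not be set too low.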
After fitting, we obtain the predicted probabilities for the validation set and then use them to calculate the validation ROC AUC. This is added to our list of results using the append method:
    val_set_pred_proba_2 = xgb_model_1.predict_proba(X_val)[:, 1]
    val_aucs.append(roc_auc_score(y_val, val_set_pred_proba_2))
Finally, we also capture the number of rounds required for each learning rate:
    best_iters.append(int(xgb_model_1.get_booster().attributes()['best_iteration']))
The previous five code snippets should all be run together in one cell. The output should be similar to this:
CPU times: user 1min 23s, sys: 526 ms, total: 1min 24s
Wall time: 22.2 s
Now that we have our results from this hyperparameter search, we can visualize validation set performance and the number of iterations. Because these two metrics are on different scales, we’ll want to create a dual y-axis plot. pandas makes this easy, so first we’ll put all the data into a DataFrame:
learning_rate_df = pd.DataFrame({'Learning rate': learning_rates,
                                 'Validation AUC': val_aucs,
                                 'Best iteration': best_iters})
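Besides plotting, the same results can be used to pick out the learning rate with the highest validation AUC. The sketch below uses plain lists with made-up placeholder numbers (the example_ variables are hypothetical and are not results from this lesson) so that it runs on its own.

```python
# Made-up placeholder values for illustration only, not results from the text
example_rates = [0.01, 0.25, 0.50, 0.75, 1.00]
example_aucs = [0.80, 0.82, 0.81, 0.79, 0.78]
example_iters = [120, 40, 20, 12, 8]

# Position of the highest validation AUC
best_index = max(range(len(example_aucs)), key=lambda i: example_aucs[i])
print(example_rates[best_index], example_aucs[best_index], example_iters[best_index])
```

The same lookup could be applied to the learning_rates, val_aucs, and best_iters collected in the loop.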
Now we can visualize performance and the number of iterations for different learning rates like this, noting that:
- We set the index (set_index) so that the learning rate is plotted on the x axis, and the other columns on the y axis.
- The secondary_y keyword argument indicates which column to plot on the right-hand y axis.
- The style argument allows us to specify different line styles for each column plotted. -o is a solid line with dots, while --o is a dashed line with dots:
mpl.rcParams['figure.dpi'] = 400
learning_rate_df.set_index('Learning rate').plot(secondary_y='Best iteration', style=['-o', '--o'])
The resulting plot should look like this: