Another Way of Growing Trees: XGBoost's grow_policy

Learn about XGBoost's lossguide grow policy and how to set the tree_method hyperparameter.

Controlling tree growth in XGBoost

In addition to limiting the maximum depth of trees using a max_depth hyperparameter, there is another paradigm for controlling tree growth: finding the node where a split would result in the greatest reduction in the loss function, and splitting this node, regardless of how deep it will make the tree. This may result in a tree with one or two very deep branches, while the other branches may not have grown very far. XGBoost offers a hyperparameter called grow_policy, and setting this to lossguide results in this kind of tree growth, while the depthwise option is the default and grows trees to an indicated max_depth, as we’ve done in the chapter “Decision Trees and Random Forests,” and so far in this chapter. The lossguide grow policy is a newer option in XGBoost and mimics the behavior of LightGBM, another popular gradient boosting package.
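To make the contrast concrete, here is a minimal sketch of the two policies side by side (the model names and hyperparameter values are illustrative only, not part of the exercise):

import xgboost as xgb

# Default behavior: grow every branch level by level, down to max_depth
depthwise_model = xgb.XGBClassifier(grow_policy='depthwise', max_depth=6)

# Loss-guided behavior: always split the leaf offering the largest loss reduction,
# capping the total number of leaves instead of the depth
lossguide_model = xgb.XGBClassifier(grow_policy='lossguide',
                                    tree_method='hist',
                                    max_leaves=32)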

To use the lossguide policy, it is necessary to set another hyperparameter we haven’t discussed yet, tree_method, which must be set to hist or gpu_hist. Without going into too much detail, the hist method uses a faster way of searching for splits. Instead of looking between every sequential pair of sorted feature values for the training samples in a node, the hist method builds a histogram of the feature values and only considers splits at the bin edges of that histogram. So, for example, if there are 100 samples in a node, their feature values may be binned into 10 groups, meaning there are only 9 possible splits to consider instead of 99.
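As a rough illustration of the idea (this is only a sketch of how binning shrinks the number of candidate splits, not XGBoost's actual implementation), we can compare the two counts directly:

import numpy as np

rng = np.random.default_rng(0)
feature_values = rng.normal(size=100)  # one feature's values for 100 samples

# Exact split finding: one candidate between every adjacent pair of sorted values
exact_candidates = len(np.unique(feature_values)) - 1  # 99 here

# Histogram-based split finding: bin the values and only consider interior bin edges
counts, bin_edges = np.histogram(feature_values, bins=10)
hist_candidates = len(bin_edges) - 2  # 9 interior edges

print(exact_candidates, hist_candidates)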

Using the lossguide grow policy

We can instantiate an XGBoost model for the lossguide grow policy as follows, using a learning rate of 0.1 based on intuition from our hyperparameter exploration in the previous exercise:

xgb_model_3 = xgb.XGBClassifier(
    n_estimators=1000,
    max_depth=0,
    learning_rate=0.1,
    verbosity=1,
    objective='binary:logistic',
    use_label_encoder=False,
    n_jobs=-1,
    tree_method='hist',
    grow_policy='lossguide')

Notice here that we’ve set max_depth=0, because this hyperparameter is not relevant for the lossguide policy. Instead, we will set a hyperparameter called max_leaves, which controls the maximum number of leaves in each tree that is grown. We’ll search over values ranging from 5 to 100 leaves:

max_leaves_values = list(range(5,105,5)) 
print(max_leaves_values[:5]) 
print(max_leaves_values[-5:])

This should output the following:

[5, 10, 15, 20, 25] 
[80, 85, 90, 95, 100]

Now we are ready to repeatedly fit and validate the model across this range of hyperparameter values, similar to what we’ve done previously:

%%time
val_aucs = []
for max_leaves in max_leaves_values:
    # Set parameter and fit model
    xgb_model_3.set_params(**{'max_leaves': max_leaves})
    xgb_model_3.fit(X_train, y_train, eval_set=eval_set,
                    eval_metric='auc', verbose=False,
                    early_stopping_rounds=30)
    # Get validation score
    val_set_pred_proba = xgb_model_3.predict_proba(X_val)[:, 1]
    val_aucs.append(roc_auc_score(y_val, val_set_pred_proba))

The output will include the wall time for all of these fits, which was about 24 seconds in testing. Now let’s put the results in a data frame:

max_leaves_df = pd.DataFrame({'Max leaves':max_leaves_values, 'Validation AUC':val_aucs})
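If you’d like to pull the single best result out of this table programmatically (an optional step, assuming the max_leaves_df data frame just created), pandas’ idxmax gives the row with the highest validation AUC:

# Row of max_leaves_df with the highest validation AUC
best_row = max_leaves_df.loc[max_leaves_df['Validation AUC'].idxmax()]
print(best_row)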

We can visualize how the validation AUC changes with the maximum number of leaves, similar to our visualization of the learning rate:

mpl.rcParams['figure.dpi'] = 400 
max_leaves_df.set_index('Max leaves').plot()

This will result in a plot of the validation AUC as a function of the maximum number of leaves.
