Hyperparameter Tuning
Apply grid search cross-validation to XGBoost models.
Chapter Goals:
- Apply grid search cross-validation to an XGBoost model
A. Using scikit-learn’s GridSearchCV
One of the benefits of XGBoost's scikit-learn style models is that they work directly with the scikit-learn API. A common scikit-learn object used with XGBoost models is the GridSearchCV wrapper. For more on GridSearchCV, see the Data Modeling section.
The code below applies grid search cross-validation to a binary classification XGBoost model.
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

model = xgb.XGBClassifier(objective='binary:logistic',
                          eval_metric='logloss',
                          use_label_encoder=False)
params = {'max_depth': range(2, 5)}
cv_model = GridSearchCV(model, params, cv=4)
# predefined data and labels
cv_model.fit(data, labels)
print('Best max_depth: {}\n'.format(cv_model.best_params_['max_depth']))
# new_data contains 2 new data observations
print('Predictions:\n{}'.format(repr(cv_model.predict(new_data))))
In the code above, we applied grid search cross-validation to a binary classification XGBoost model to find the optimal 'max_depth' parameter (in the range from 2 to 4, inclusive). The K-fold cross-validation (the default for grid search) uses 4 folds. Note that the cross-validation process works the same for an XGBRegressor object.
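As a quick illustration, here is a minimal sketch of the same workflow with a regressor (reg_data and reg_labels are assumed placeholder arrays, not defined in this chapter):

import xgboost as xgb
from sklearn.model_selection import GridSearchCV

reg_model = xgb.XGBRegressor(objective='reg:squarederror')
reg_params = {'max_depth': range(2, 5)}
cv_reg = GridSearchCV(reg_model, reg_params, cv=4)
# reg_data and reg_labels are assumed placeholders for a regression dataset
cv_reg.fit(reg_data, reg_labels)
print('Best max_depth: {}'.format(cv_reg.best_params_['max_depth']))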
After calling fit on data and labels, cv_model represents the cross-validated classification model trained on the dataset. The grid search cross-validation automatically chose the best performing 'max_depth' parameter, which in this case was 4. The best_params_ attribute contains the best performing hyperparameters after cross-validation.
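GridSearchCV also exposes the best_score_ attribute (the mean cross-validated score of the best parameter setting) and the best_estimator_ attribute (the model refit on the full dataset with those parameters). A short sketch, assuming cv_model has been fit as above:

# mean cross-validated accuracy of the best 'max_depth' setting
print('Best CV score: {}'.format(cv_model.best_score_))
# the refit XGBClassifier using the best hyperparameters
best_model = cv_model.best_estimator_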
The official XGBoost documentation provides a list of the parameters we can tune in a model. Two commonly tuned parameters are 'max_depth' and 'eta' (the learning rate of the boosting algorithm).
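In the scikit-learn style models, 'eta' is exposed through the learning_rate keyword, so a grid over both parameters might look like the following sketch (the candidate learning rates are illustrative, not values from this chapter):

import xgboost as xgb
from sklearn.model_selection import GridSearchCV

model = xgb.XGBClassifier(objective='binary:logistic',
                          eval_metric='logloss',
                          use_label_encoder=False)
# search over tree depth and learning rate together
params = {'max_depth': range(2, 5),
          'learning_rate': [0.05, 0.1, 0.3]}
cv_model = GridSearchCV(model, params, cv=4)
# data and labels are the predefined dataset from above
cv_model.fit(data, labels)
print('Best parameters: {}'.format(cv_model.best_params_))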