Tuning XGBoost Models


Last updated 12 months ago


`sigopt.xgboost.experiment`

The sigopt.xgboost.experiment function simplifies hyperparameter tuning for an XGBoost model by automatically creating and running a SigOpt AI Experiment. It also extends the automatic parameter, metric, and metadata logging of our sigopt.xgboost.run API to the SigOpt experimentation platform.

However, automatic logging is only one of its features. When tuning an XGBoost model, sigopt.xgboost.experiment offers the following key improvements over the existing SigOpt AI Experiment API:

A simplified, streamlined API that knows the exact problem it is tuning (an XGBoost model) and makes intelligent decisions accordingly.
Automatic selection of the parameter search space, optimization metric, and the tuning budget.
A preset list of standard optimization metrics to choose from.
An improved hyperparameter optimization routine that leverages advanced methods in metalearning and multi-fidelity optimization to learn a more performant model in less time.
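Multi-fidelity optimization, mentioned in the last item above, scores many candidates cheaply at low fidelity (here, few boosting rounds) and spends more rounds only on the promising ones. The sketch below shows one classic multi-fidelity technique, successive halving, on a toy loss. It is not SigOpt's routine, which this page does not describe; `successive_halving`, `toy_loss`, and the candidate values are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Config = Dict[str, Any]


def successive_halving(
    configs: List[Config],
    evaluate: Callable[[Config, int], float],
    min_rounds: int = 10,
    reduction_factor: int = 3,
) -> Config:
    """Successive halving: a classic multi-fidelity search (not SigOpt's own routine).

    Every surviving config is scored with `rounds` boosting rounds; only the
    best 1/reduction_factor survive, and the next rung gives them more rounds.
    """
    survivors = list(configs)
    rounds = min_rounds
    while len(survivors) > 1:
        # Lower loss is better; the index breaks ties deterministically.
        scored = sorted((evaluate(cfg, rounds), i) for i, cfg in enumerate(survivors))
        keep = max(1, len(survivors) // reduction_factor)
        survivors = [survivors[i] for _, i in scored[:keep]]
        rounds *= reduction_factor  # promote survivors to a higher fidelity
    return survivors[0]


def toy_loss(config: Config, num_rounds: int) -> float:
    """Hypothetical stand-in for training XGBoost and reading a validation loss."""
    return (config["eta"] - 0.1) ** 2 + 1.0 / num_rounds


candidates = [{"eta": e} for e in (0.01, 0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6)]
best = successive_halving(candidates, toy_loss)
print(best)  # {'eta': 0.1}
```

With nine candidates and a reduction factor of 3, only three configs reach the 30-round rung, so most of the training effort goes to the strongest candidates.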

This API has been designed with ease-of-use in mind, so that you may run an XGBoost Experiment as effortlessly as possible.

Examples

To give you an initial feel for how you might use the sigopt.xgboost.experiment API, we provide multiple examples showcasing its simplicity and flexibility. Our API aims to reduce the overall complexity of intelligent experimentation and hyperparameter optimization by automatically selecting parameters, metrics, and even the budget where needed.

The examples below are ordered by increasing complexity, from a fully automatic configuration to a fully specified one.

Automatic Experiment Configuration

The parameter search space, metric, and budget are determined by SigOpt based on the training data provided.

from sklearn import datasets
from sklearn.model_selection import train_test_split
import xgboost
import sigopt.xgboost

bc = datasets.load_breast_cancer()
(Xtrain, Xtest, ytrain, ytest) = train_test_split(bc.data, bc.target, test_size=0.5, random_state=42)
dtrain = xgboost.DMatrix(data=Xtrain, label=ytrain, feature_names=bc.feature_names)
dtest = xgboost.DMatrix(data=Xtest, label=ytest, feature_names=bc.feature_names)

my_config = dict(
  name="My XGBoost Experiment",  # Let SigOpt set the tuning parameters, metrics, and budget.
)
experiment = sigopt.xgboost.experiment(
  experiment_config=my_config,
  dtrain=dtrain,
  evals=[(dtest, "val_set")],
  params={"objective": "binary:logistic"},  # XGB parameters to be fixed for all runs
)

Auto Parameter and Metric Selection

The parameter search bounds are determined by SigOpt based on the training data provided. Select the optimized metric with a single string.

from sklearn import datasets
from sklearn.model_selection import train_test_split
import xgboost
import sigopt.xgboost

bc = datasets.load_breast_cancer()
(Xtrain, Xtest, ytrain, ytest) = train_test_split(bc.data, bc.target, test_size=0.5, random_state=42)
dtrain = xgboost.DMatrix(data=Xtrain, label=ytrain, feature_names=bc.feature_names)
dtest = xgboost.DMatrix(data=Xtest, label=ytest, feature_names=bc.feature_names)

my_config = dict(
  name="My XGBoost Experiment",
  parameters=[
    dict(name="max_depth"),  # Let SigOpt set the appropriate bounds
    dict(name="eta"),
  ],
  metrics="accuracy",  # Only use the metric name
  budget=20,
)
experiment = sigopt.xgboost.experiment(
  experiment_config=my_config,
  dtrain=dtrain,
  evals=[(dtest, "val_set")],
  params={"objective": "binary:logistic"},  # XGB parameters to be fixed for all runs
)

Auto Parameter Selection

The parameter search bounds are determined by SigOpt based on the training data provided. The optimized metric is fully specified with a dict.

from sklearn import datasets
from sklearn.model_selection import train_test_split
import xgboost
import sigopt.xgboost

bc = datasets.load_breast_cancer()
(Xtrain, Xtest, ytrain, ytest) = train_test_split(bc.data, bc.target, test_size=0.5, random_state=42)
dtrain = xgboost.DMatrix(data=Xtrain, label=ytrain, feature_names=bc.feature_names)
dtest = xgboost.DMatrix(data=Xtest, label=ytest, feature_names=bc.feature_names)

my_config = dict(
  name="My XGBoost Experiment",
  parameters=[
    dict(name="max_depth"),  # Let SigOpt set the appropriate bounds
    dict(name="eta"),
  ],
  metrics=[
    dict(name="accuracy", strategy="optimize", objective="maximize"),
  ],
  budget=20,
)
experiment = sigopt.xgboost.experiment(
  experiment_config=my_config,
  dtrain=dtrain,
  evals=[(dtest, "val_set")],
  params={"objective": "binary:logistic"},  # XGB parameters to be fixed for all runs
)

Full Experiment Configuration

A fully enumerated experiment config: every parameter in the search space has an explicit type and bounds, the metric list asks SigOpt to maximize classification accuracy, and the budget is set explicitly.

from sklearn import datasets
from sklearn.model_selection import train_test_split
import xgboost
import sigopt.xgboost

bc = datasets.load_breast_cancer()
(Xtrain, Xtest, ytrain, ytest) = train_test_split(bc.data, bc.target, test_size=0.5, random_state=42)
dtrain = xgboost.DMatrix(data=Xtrain, label=ytrain, feature_names=bc.feature_names)
dtest = xgboost.DMatrix(data=Xtest, label=ytest, feature_names=bc.feature_names)
    
my_config = dict(
  name="My XGBoost Experiment",
  parameters=[
    dict(name="max_depth", type="int", bounds=dict(min=3, max=12)),
    dict(name="eta", type="double", bounds=dict(min=0.05, max=0.5), transformation="log"),
  ],
  metrics=[
    dict(name="accuracy", strategy="optimize", objective="maximize"),
  ],
  budget=20,
)
experiment = sigopt.xgboost.experiment(
  experiment_config=my_config,
  dtrain=dtrain,
  evals=[(dtest, "val_set")],
  params={"objective": "binary:logistic"},  # XGB parameters to be fixed for all runs
)

The simpler examples are made possible by key advances from SigOpt research. Note that simplicity comes at the cost of flexibility: if you omit the metric, for example, SigOpt optimizes the metric it selects for you.

Input Arguments for `sigopt.xgboost.experiment`

sigopt.xgboost.experiment accepts the following arguments:

experiment_config (dict): The configuration of the Experiment. See the following section for more information.
dtrain (xgboost.DMatrix): The training dataset.
evals: The validation set(s). If a list is given, the first dataset is used to compute the optimization metric.
params (dict): XGBoost parameters to be fixed for all Runs.
num_boost_round (int): Optional. The number of boosting rounds. Leave this argument unset if num_boost_round is included in the parameters field of experiment_config.
early_stopping_rounds (int): Optional. XGBoost stops training when the validation metric has not improved for early_stopping_rounds consecutive rounds. NOTE: SigOpt sets early_stopping_rounds to 10 by default. To turn off early stopping, explicitly set it to None.
run_options (dict)
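The early stopping rule described for early_stopping_rounds can be sketched in plain Python. The helper `rounds_trained` is hypothetical and assumes a validation loss where lower is better; it illustrates the default of 10 and how None disables the rule, not how XGBoost or SigOpt implement early stopping internally.

```python
from typing import Optional, Sequence


def rounds_trained(val_losses: Sequence[float], early_stopping_rounds: Optional[int] = 10) -> int:
    """Number of boosting rounds trained before early stopping halts (sketch).

    `val_losses[i]` is the validation loss after round i + 1 (lower is better).
    Training stops once the loss has not improved for `early_stopping_rounds`
    consecutive rounds; None disables early stopping.
    """
    if early_stopping_rounds is None:
        return len(val_losses)  # early stopping off: every round is trained
    best = float("inf")
    rounds_since_best = 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best, rounds_since_best = loss, 0
        else:
            rounds_since_best += 1
            if rounds_since_best >= early_stopping_rounds:
                return i + 1  # stop after this round
    return len(val_losses)


losses = [0.5, 0.4, 0.3] + [0.31] * 20  # improvement stops after round 3
print(rounds_trained(losses))        # 13: round 3 plus 10 rounds without improvement
print(rounds_trained(losses, None))  # 23: every round is trained
```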

experiment_config is the most important argument to understand: it not only determines how your Experiment executes, but is also the most flexible and extensible of all the XGBoost Experiment API arguments. We explain it next.

Specifying `sigopt.xgboost.experiment` through `experiment_config`

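An experiment config has the following keys:

name (string): Name of the Experiment.
parameters (list of dicts): Optional. The parameter search space; see Parameter Space below. If omitted, SigOpt selects the parameters.
metrics (string or list of dicts): Optional. The metric to optimize; see Metric Space below. If omitted, SigOpt selects the metric.
budget (int): Optional. An integer defining the minimum number of SigOpt Runs in a given SigOpt Experiment.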
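A pre-flight check for an experiment_config, written against the keys listed above, might look like the following. `check_config` is hypothetical and illustrative only: it accepts just these four keys, treats name as required (an assumption, since the page only marks budget as optional), and enforces the rule that num_boost_round should not be passed as an argument when it is already a tuned parameter.

```python
from typing import Any, Dict, Optional

# Hypothetical: the four keys documented for experiment_config.
KNOWN_KEYS = {"name", "parameters", "metrics", "budget"}


def check_config(experiment_config: Dict[str, Any], num_boost_round: Optional[int] = None) -> None:
    """Raise ValueError if the config breaks one of the documented rules (sketch only)."""
    unknown = set(experiment_config) - KNOWN_KEYS
    if unknown:
        raise ValueError(f"unknown experiment_config keys: {sorted(unknown)}")
    if not isinstance(experiment_config.get("name"), str):
        raise ValueError("'name' must be a string")  # assumption: name is required
    budget = experiment_config.get("budget")
    if budget is not None and (not isinstance(budget, int) or budget < 1):
        raise ValueError("'budget' must be a positive integer")
    tuned = {p["name"] for p in experiment_config.get("parameters", [])}
    if num_boost_round is not None and "num_boost_round" in tuned:
        raise ValueError("num_boost_round is tuned in 'parameters'; leave the argument unset")


check_config({"name": "My XGBoost Experiment", "metrics": "accuracy", "budget": 20})  # passes
```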

Parameter Space

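The following illustrative example shows how to set the parameter space inside experiment_config.

my_config = dict(
  name="My XGBoost Experiment",
  parameters=[
    {"name": "eta"},  # name only
    {"name": "num_boost_round", "bounds": {"min": 10, "max": 100}},  # name and bounds only
    {"name": "tree_method", "type": "categorical", "categorical_values": ["exact", "hist"]},  # fully specified
  ],
)
experiment = sigopt.xgboost.experiment(
  experiment_config=my_config, dtrain=dtrain, evals=[(dtest, "val_set")],
  params={"objective": "binary:logistic"},
)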

There are three different ways of specifying the Experiment parameters:

name only: SigOpt autoselects the bounds and type.
name and type: SigOpt autoselects the bounds.
name, type, and bounds/categorical_values: Explicit parameter specification.

These specifications may be mixed as in the example above. Currently, SigOpt only autoselects the bounds for the following parameters:

eta, max_delta_step, alpha, gamma, lambda, 
max_depth, min_child_weight, num_boost_round, 
colsample_bylevel, colsample_bynode, colsample_bytree

Any parameter that is not on this list must have its bounds or categorical_values explicitly stated.
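The specification styles and the autoselection rule above can be checked mechanically. This sketch uses hypothetical helpers `spec_style` and `missing_bounds` and copies the autoselect list from this page; a spec that gives bounds without a type (like num_boost_round in the example) is reported as "bounds given". The extra `subsample` entry is an illustrative parameter that is not on the list.

```python
from typing import Any, Dict, List

# Parameters whose bounds SigOpt autoselects, copied from the list above.
AUTO_BOUNDS = {
    "eta", "max_delta_step", "alpha", "gamma", "lambda",
    "max_depth", "min_child_weight", "num_boost_round",
    "colsample_bylevel", "colsample_bynode", "colsample_bytree",
}


def spec_style(param: Dict[str, Any]) -> str:
    """Classify a parameter dict by how much of it is spelled out (hypothetical helper)."""
    if "bounds" in param or "categorical_values" in param:
        return "bounds given"
    if "type" in param:
        return "name and type"
    return "name only"


def missing_bounds(parameters: List[Dict[str, Any]]) -> List[str]:
    """Names that leave bounds to SigOpt but are not on the autoselect list."""
    return [
        p["name"]
        for p in parameters
        if spec_style(p) != "bounds given" and p["name"] not in AUTO_BOUNDS
    ]


parameters = [
    {"name": "eta"},
    {"name": "num_boost_round", "bounds": {"min": 10, "max": 100}},
    {"name": "tree_method", "type": "categorical", "categorical_values": ["exact", "hist"]},
    {"name": "subsample", "type": "double"},  # not on the list: needs explicit bounds
]
print([spec_style(p) for p in parameters])
print(missing_bounds(parameters))  # ['subsample']
```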

Metric Space

The metric space of an Experiment is defined by both the metrics argument of the experiment_config and the datasets listed in the evals argument.

There are two ways of specifying the metric space.

# Option 1: Fully specified like a SigOpt Experiment
metrics = [{"name": "accuracy", "strategy": "optimize", "objective": "maximize"}]

# Option 2: Only the metric name, as a string
metrics = "F1"
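Option 2 is shorthand for Option 1. The hypothetical helper `normalize_metrics` below expands a metric name into the full list form. The objective for each metric is an assumption based on the metric itself (higher accuracy, F1, precision, and recall are better; lower error is better), and the exact strings SigOpt accepts for the regression metrics may differ from the names used in this sketch.

```python
from typing import Dict, List, Union

# Assumed optimization direction for each natively supported metric.
OBJECTIVES = {
    "accuracy": "maximize",
    "F1": "maximize",
    "precision": "maximize",
    "recall": "maximize",
    "mean absolute error": "minimize",
    "mean squared error": "minimize",
}


def normalize_metrics(metrics: Union[str, List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Expand the string shorthand (Option 2) into the full list form (Option 1)."""
    if isinstance(metrics, str):
        if metrics not in OBJECTIVES:
            raise ValueError(f"unsupported metric: {metrics!r}")
        return [{"name": metrics, "strategy": "optimize", "objective": OBJECTIVES[metrics]}]
    return list(metrics)  # already fully specified


print(normalize_metrics("F1"))
# [{'name': 'F1', 'strategy': 'optimize', 'objective': 'maximize'}]
```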

Below are the metrics we natively support for classification and regression, with the default for each task:

Classification: accuracy, F1, precision, recall (default: accuracy)
Regression: mean absolute error, mean squared error (default: mean squared error)
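The natively supported metrics have standard definitions, sketched below in plain Python for binary labels (positive class 1). The helpers `classification_metrics` and `regression_metrics` are hypothetical and are not SigOpt's implementation; per the description of the validation set argument, SigOpt computes the optimization metric on the first validation set. Thresholding binary:logistic probabilities at 0.5 to obtain class labels is an assumption of this sketch.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (positive class = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision, "recall": recall, "F1": f1}


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Mean absolute error and mean squared error."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    return {
        "mean absolute error": sum(abs(e) for e in errors) / len(errors),
        "mean squared error": sum(e * e for e in errors) / len(errors),
    }


# binary:logistic outputs probabilities; threshold at 0.5 to get class labels (assumption).
probabilities = [0.9, 0.2, 0.4, 0.7, 0.6]
labels = [int(p >= 0.5) for p in probabilities]
print(classification_metrics([1, 0, 1, 1, 0], labels))
```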
