Tracking XGBoost Training
sigopt.xgboost.run
This function provides a wrapper around the xgboost.train API; it automatically tracks and records all aspects of the model training process to SigOpt. The sigopt.xgboost.run API closely matches the xgboost.train API to minimize any disruption to your workflow, while allowing you to enjoy all the features provided by the SigOpt experiment management platform.
Before executing the example code, make sure you have set up your API token and project name as environment variables. You can follow the setup instructions or set them in Python like the example below.
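For example, a minimal sketch of setting them in Python (assuming the standard SIGOPT_API_TOKEN and SIGOPT_PROJECT environment variable names; the token and project values here are placeholders):

```python
import os

# Placeholder values; substitute your own SigOpt API token and project name.
os.environ["SIGOPT_API_TOKEN"] = "YOUR_API_TOKEN"
os.environ["SIGOPT_PROJECT"] = "my-xgboost-project"
```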
Now we demonstrate how to modify an existing XGBoost training script that uses the xgboost.train API to call sigopt.xgboost.run instead, enabling automatic tracking with SigOpt.
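The sketch below contrasts the two calls. It assumes placeholder training data and hyperparameter values; only the training call changes between the two APIs.

```python
import xgboost as xgb
import sigopt.xgboost

# Placeholder datasets; substitute your own DMatrix objects.
dtrain = xgb.DMatrix("train.svm.txt")
dtest = xgb.DMatrix("test.svm.txt")
params = {"objective": "binary:logistic", "eta": 0.3, "max_depth": 6, "eval_metric": "logloss"}

# xgboost.train API: returns a Booster; nothing is tracked.
bst = xgb.train(params, dtrain, num_boost_round=50, evals=[(dtest, "test")])

# sigopt.xgboost.run API: same arguments, but training is tracked as a SigOpt Run.
xgb_run = sigopt.xgboost.run(params, dtrain, num_boost_round=50, evals=[(dtest, "test")])
bst = xgb_run.model  # the trained Booster
```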
All relevant information from the XGBoost model training process is automatically recorded in a SigOpt Run and can be viewed on the Run page in the web app.
sigopt.xgboost.run automatically logs the relevant information and metadata of the XGBoost training process and the resulting model. We list all the automated logging functionalities below. For supported learning tasks, the following additional metrics are computed from the evaluation datasets; the task is inferred from the prefix of the objective parameter.
| Metric | Objective prefix | Description |
| --- | --- | --- |
| accuracy | binary:, multi: | Percentage of predicted labels that match true labels. |
| precision | binary:, multi: | Ratio of true positive predictions to all positive predictions. |
| recall | binary:, multi: | Ratio of true positive predictions to all positive labels. |
| F1 | binary:, multi: | The harmonic mean of precision and recall. |
| mean squared error | reg: | Squared error between each predicted label and true label. |
| mean absolute error | reg: | Absolute error between each predicted label and true label. |
| Training Time | All objectives | Number of seconds to execute xgboost.train. |
Caveat: SigOpt only supports up to 200 total checkpoints per Run. sigopt.xgboost.run automatically adjusts the checkpoint logging frequency to guarantee that the total number of checkpoints does not exceed this limit.
sigopt.xgboost.run
automatically tracks additional metadata associated with the model training process such as dataset dimensions, dataset names, XGBoost version, and more.
run_options
You can control which aspects of the model training are logged by the sigopt.xgboost.run call with the run_options argument. Below, we provide an example of using the run_options argument to create a Run with an assigned name and selectively turn off checkpoint logging and stdout logging.
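A sketch of such a call is shown below (params, dtrain, and dtest are assumed to be defined as in the earlier sketch; the Run name is a placeholder):

```python
import sigopt.xgboost

run_options = {
    "name": "my-xgboost-run",      # assign a name to the new TrainingRun
    "autolog_checkpoints": False,  # turn off checkpoint logging
    "autolog_stdout": False,       # turn off stdout logging
}
xgb_run = sigopt.xgboost.run(
    params, dtrain, evals=[(dtest, "test")], run_options=run_options
)
```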
The following table lists the automated logging capabilities you can control using the run_options
dict.
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| autolog_checkpoints | Boolean | True | Automatically log the iterated evaluation metrics at every specified interval. |
| autolog_feature_importances | Boolean | True | Automatically log the dataset feature importance scores. |
| autolog_metrics | Boolean | True | Automatically log additional evaluation metrics. |
| autolog_xgboost_defaults | Boolean | True | Automatically log XGBoost default values for unspecified parameters. |
| autolog_stderr | Boolean | True | Automatically log stderr from the xgboost.train call. |
| autolog_stdout | Boolean | True | Automatically log stdout from the xgboost.train call. |
| autolog_sys_info | Boolean | True | Automatically log the Python version and XGBoost version. |
| name | string | None | Assign a name to a newly created TrainingRun. |
| run | RunContext | None | Pass an existing RunContext for sigopt.xgboost.run to log to, instead of creating a new Run. |
sigopt.xgboost.run automatically logs all hyperparameters that you specify via the params argument. These values are displayed under the Parameter Values section on the Run page. In addition, sigopt.xgboost.run also records the XGBoost default values for all other parameters.
If evaluation metrics are specified in the params argument via eval_metric, sigopt.xgboost.run automatically logs every evaluation metric value for each evaluation dataset after the training has completed. The metric values are listed under Metrics on the Run page.
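For instance, the sketch below (placeholder values, reusing dtrain and dtest from the earlier sketch) specifies two evaluation metrics via eval_metric, so each metric is logged for both evaluation datasets once training completes:

```python
import sigopt.xgboost

params = {
    "objective": "binary:logistic",
    "eval_metric": ["logloss", "auc"],  # logged for every dataset listed in evals
}
xgb_run = sigopt.xgboost.run(
    params, dtrain, num_boost_round=50, evals=[(dtrain, "train"), (dtest, "test")]
)
```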
For certain supported learning tasks, sigopt.xgboost.run logs additional metrics computed using the evaluation datasets. The supported tasks are inferred from the prefix of the objective parameter. The objective prefixes and the corresponding metrics are listed in the table above.
If evaluation metrics are specified in the params, sigopt.xgboost.run also automatically tracks the evaluation metrics over boosting iterations using checkpoints. You can control the frequency of checkpoint logging with the verbose_eval argument. If verbose_eval is an integer, SigOpt logs the evaluation metrics every verbose_eval boosting rounds. If verbose_eval is True, SigOpt logs the evaluation metrics every boosting round. If it is False, SigOpt logs the evaluation metrics every 5 boosting rounds.
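For example, a sketch of recording a checkpoint every 10 boosting rounds (reusing params, dtrain, and dtest from the earlier sketches):

```python
# verbose_eval=10 logs the checkpointed evaluation metrics every 10 boosting rounds.
xgb_run = sigopt.xgboost.run(
    params, dtrain, num_boost_round=200, evals=[(dtest, "test")], verbose_eval=10
)
```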
sigopt.xgboost.run automatically computes and tracks the feature importances of the training dataset. Generally, feature importances indicate how useful each feature is in determining boosted decision tree predictions. Feature importances are computed using XGBoost's get_score method with importance_type=weight. The top 50 features are tracked and stored on the Run page.
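For reference, the sketch below shows the corresponding direct XGBoost call on the trained Booster; the integration performs this computation for you automatically.

```python
# Weight-based feature importances from the trained Booster.
importances = xgb_run.model.get_score(importance_type="weight")
# Top 50 features by importance score.
top_50 = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)[:50]
```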
All arguments supported by xgboost.train are supported by sigopt.xgboost.run. You can preserve your existing XGBoost code and realize the benefits of tracking the model training as a SigOpt TrainingRun with minimal code changes. To better understand the XGBoost Learning API that this integration is based on, refer to the relevant pages of the XGBoost documentation.
You can also pass an existing RunContext to sigopt.xgboost.run through the run entry of run_options. This is useful when you want to create and manage the Run yourself as part of a custom workflow.
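The sketch below illustrates this pattern. It assumes sigopt.create_run is used to create the RunContext and that run_options accepts it under the run key; adjust to your own setup.

```python
import sigopt
import sigopt.xgboost

# Create and manage the Run yourself, then hand it to the integration.
run = sigopt.create_run(name="my-custom-run")
run.log_metadata("data_version", "2024-01")  # example of logging before training
xgb_run = sigopt.xgboost.run(
    params, dtrain, evals=[(dtest, "test")], run_options={"run": run}
)
```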
The sigopt.xgboost.run function returns an XGBRun object, which allows you to access the resulting XGBoost Booster object via .model and the SigOpt RunContext object via .run. In the following example, we demonstrate how to continue logging information with the XGBRun object after model training.
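A sketch of this pattern is below; the metric name and value are placeholders, and it assumes the RunContext methods log_metadata, log_metric, and end.

```python
import sigopt.xgboost

xgb_run = sigopt.xgboost.run(params, dtrain, evals=[(dtest, "test")])

# Use the trained Booster for additional evaluation.
preds = xgb_run.model.predict(dtest)

# Continue logging to the same SigOpt Run, then close it.
xgb_run.run.log_metadata("num_test_predictions", len(preds))
xgb_run.run.log_metric("custom_metric", 0.87)  # placeholder value
xgb_run.run.end()
```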