Tracking XGBoost Training
sigopt.xgboost.run
This function provides a wrapper around the xgboost.train API; it automatically tracks and records all aspects of the model training process to SigOpt. The sigopt.xgboost.run API closely matches the xgboost.train API to minimize any disruption to your workflow, while allowing you to enjoy all the features provided by the SigOpt experiment management platform.
Example
Before executing the example code, make sure you have set up your API token and project name as environment variables. You can follow the SigOpt setup instructions or set them directly in Python.
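As a minimal sketch of the in-Python option: the variable names SIGOPT_API_TOKEN and SIGOPT_PROJECT are, to our knowledge, the ones read by the SigOpt Python client, but confirm them against the setup instructions. The values are placeholders, not real credentials.

```python
import os

# Placeholder values: substitute your own API token and project ID.
# Assumption: the SigOpt client reads these two environment variables.
os.environ["SIGOPT_API_TOKEN"] = "YOUR_API_TOKEN"
os.environ["SIGOPT_PROJECT"] = "my-xgboost-project"
```

Set these before the first SigOpt call in the script so the client can pick them up.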
Now we demonstrate how to modify an existing XGBoost training script that uses the xgboost.train API so that it calls sigopt.xgboost.run instead, which enables automatic tracking with SigOpt.
Existing code with the xgboost.train API
Updated with the sigopt.xgboost.run API
All relevant information from the XGBoost model training process is automatically recorded in a SigOpt Run and can be viewed on the Run page in the web app.
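The switch from xgboost.train to sigopt.xgboost.run could look like the sketch below. The parameter values, the "validation" dataset label, and the function names train_plain and train_tracked are illustrative assumptions; the keyword arguments follow xgboost.train, which this page says sigopt.xgboost.run mirrors. Imports sit inside the functions so the definitions load even where the libraries are not installed.

```python
# Illustrative parameters; any valid XGBoost params dict works here.
params = {"objective": "binary:logistic", "eval_metric": "logloss", "max_depth": 3}


def train_plain(params, dtrain, dvalid):
    """Existing code: train with xgboost.train, no tracking."""
    import xgboost as xgb

    return xgb.train(
        params=params,
        dtrain=dtrain,
        num_boost_round=20,
        evals=[(dvalid, "validation")],
    )


def train_tracked(params, dtrain, dvalid):
    """Updated code: same arguments, routed through sigopt.xgboost.run."""
    import sigopt.xgboost

    xgb_run = sigopt.xgboost.run(
        params=params,
        dtrain=dtrain,
        num_boost_round=20,
        evals=[(dvalid, "validation")],
    )
    # The trained Booster is available on the returned object via .model.
    return xgb_run.model
```

Both functions take xgboost.DMatrix objects for dtrain and dvalid; only the call that performs the training differs.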
Automated Logging with sigopt.xgboost.run
sigopt.xgboost.run automatically logs the relevant information and metadata of the XGBoost training process and the resulting model. We list all the automated logging functionalities below.
Logging Parameters
sigopt.xgboost.run automatically logs all XGBoost parameters that you specify via the params argument. These values are displayed under the Parameter Values section on the Run page. In addition, sigopt.xgboost.run records the XGBoost default values for all other parameters.
Logging Metrics
If the evaluation metrics are specified in the params argument via eval_metric, sigopt.xgboost.run automatically logs every evaluation metric value for each evaluation dataset after the training has completed. The metric values are listed under Metrics on the Run page.
For certain learning tasks, sigopt.xgboost.run logs additional metrics computed using the evaluation datasets. The supported tasks are inferred from the objective parameter prefix. We list the objective prefixes and the corresponding metrics below.
Metric | Objective prefix | Description
accuracy | binary:, multi: | Percentage of predicted labels that match true labels.
precision | binary:, multi: | Ratio of true positive predictions to all positive predictions.
recall | binary:, multi: | Ratio of true positive predictions to all positive labels.
F1 | binary:, multi: | The harmonic mean of precision and recall.
mean squared error | reg: | Squared error between each predicted label and true label.
mean absolute error | reg: | Absolute error between each predicted label and true label.
Training Time | All objectives | Number of seconds to execute xgboost.train.
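To make the definitions in the table concrete, here is a small pure-Python sketch of the classification and regression metrics. It is not SigOpt's implementation: it handles only the binary case, returns accuracy as a fraction rather than a percentage, and the function names are hypothetical. How SigOpt averages these metrics for multi: objectives is not stated here.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "F1": f1}


def regression_metrics(y_true, y_pred):
    """Mean squared error and mean absolute error."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return {"mean squared error": mse, "mean absolute error": mae}


clf = classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
reg = regression_metrics([3.0, 5.0], [2.0, 7.0])
```

In the classification example there are two true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 2/3 while accuracy is 3/5.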
Logging Checkpoints
If the evaluation metrics are specified in the params argument, sigopt.xgboost.run also automatically tracks the iterative evaluation metric using checkpoints. You can control the frequency of the checkpoint logging using the verbose_eval argument. If verbose_eval is an integer, SigOpt logs the evaluation metric every verbose_eval boosting rounds. If verbose_eval is True, SigOpt logs the evaluation metric every boosting round. If it is False, SigOpt logs the evaluation metric every 5 boosting rounds.
Caveat: SigOpt only supports up to 200 total checkpoints per Run. sigopt.xgboost.run will automatically adjust the checkpoint logging frequency accordingly to guarantee that the total number of checkpoints does not exceed this limit.
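The interplay between verbose_eval and the 200-checkpoint limit can be sketched as follows. The rules for True, False, and integer values come from the text above, but the adjustment rule (stretching the interval to the smallest value that fits the limit) is an assumption for illustration; the exact rule SigOpt applies is not documented here. The function name checkpoint_interval is hypothetical.

```python
import math

MAX_CHECKPOINTS = 200  # per-Run limit stated above
DEFAULT_INTERVAL = 5   # interval used when verbose_eval is False


def checkpoint_interval(verbose_eval, num_boost_round):
    """Return how many boosting rounds pass between logged checkpoints."""
    # Check the booleans first: in Python, bool is a subclass of int.
    if verbose_eval is True:
        requested = 1
    elif verbose_eval is False:
        requested = DEFAULT_INTERVAL
    else:
        requested = max(1, int(verbose_eval))

    # Assumed adjustment: never log more often than the limit allows.
    minimum = math.ceil(num_boost_round / MAX_CHECKPOINTS)
    return max(requested, minimum)


every_round = checkpoint_interval(True, 100)      # 100 checkpoints fit the limit
stretched = checkpoint_interval(True, 1000)       # 1000 rounds must be thinned
custom_stretched = checkpoint_interval(10, 5000)  # 500 checkpoints would be too many
```

Under this assumed rule, a 1000-round training with verbose_eval=True is logged every 5 rounds, which yields exactly 200 checkpoints.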
Logging Feature Importances
sigopt.xgboost.run automatically computes and tracks the feature importances of the training dataset. Generally, feature importances indicate how useful each feature is in determining boosted decision tree predictions. Feature importances are computed using the XGBoost get_score function with importance_type=weight. The top 50 features are tracked and stored on the Run page.
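As an illustration of the top-50 selection, the sketch below ranks a score dictionary of the kind returned by booster.get_score(importance_type="weight"), which maps feature names to scores. The dictionary here is hand-written stand-in data, top_features is a hypothetical helper, and how SigOpt orders ties is not stated, so treat the ordering as an assumption.

```python
def top_features(scores, limit=50):
    """Return the highest-scoring (feature, score) pairs, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Stand-in for booster.get_score(importance_type="weight").
example_scores = {"age": 12.0, "income": 30.0, "zip_code": 3.0}
top = top_features(example_scores, limit=2)
```

With a real Booster, passing the get_score result to the helper with the default limit keeps at most 50 features.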
Logging Metadata
sigopt.xgboost.run automatically tracks additional metadata associated with the model training process such as dataset dimensions, dataset names, XGBoost version, and more.
Input Arguments for sigopt.xgboost.run
All arguments supported by xgboost.train are supported by sigopt.xgboost.run. You can preserve your existing XGBoost code and realize the benefits of tracking the model training as a SigOpt TrainingRun with minimal code changes. To better understand the XGBoost Learning API that this integration is based on, refer to the XGBoost documentation pages on the Training API and XGBoost Parameters.
Optional Argument - run_options
You can control which aspects of the model training are logged by the sigopt.xgboost.run call with the run_options argument. For example, run_options can create a Run with an assigned name and selectively turn off checkpoint logging and stdout logging.
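A sketch of such a run_options dict follows. The keys are taken from the options documented on this page; the Run name, the "validation" label, the round count, and the function name train_with_options are illustrative assumptions. The sigopt import sits inside the function so the dict can be inspected without the library installed.

```python
# Name the Run and switch off checkpoint and stdout logging.
run_options = {
    "name": "My XGBoost Run",
    "autolog_checkpoints": False,
    "autolog_stdout": False,
}


def train_with_options(params, dtrain, dvalid):
    """Train through sigopt.xgboost.run with the options above."""
    import sigopt.xgboost

    return sigopt.xgboost.run(
        params=params,
        dtrain=dtrain,
        num_boost_round=20,
        evals=[(dvalid, "validation")],
        run_options=run_options,
    )
```

Options left out of the dict keep their defaults, so the other automated logging stays on.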
The following table lists the automated logging capabilities you can control using the run_options dict.
Key | Type | Default | Description
autolog_checkpoints | Boolean | True | Automatically log the iterated evaluation metrics for every specified interval.
autolog_feature_importances | Boolean | True | Automatically log the dataset feature importance scores.
autolog_metrics | Boolean | True | Automatically log additional evaluation metrics.
autolog_xgboost_defaults | Boolean | True | Automatically log XGBoost default values for unspecified parameters.
autolog_stderr | Boolean | True | Automatically log stderr from the xgboost.train call.
autolog_stdout | Boolean | True | Automatically log stdout from the xgboost.train call.
autolog_sys_info | Boolean | True | Automatically log the Python version and XGBoost version.
name | string | None | Assign a name to a newly created TrainingRun.
run | | None |
Output of sigopt.xgboost.run
The sigopt.xgboost.run function returns an XGBRun object, which gives you access to the resulting XGBoost Booster object via .model and the SigOpt RunContext object via .run. You can use the XGBRun object to continue logging information after model training.
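The sketch below shows one way to keep logging after training. The .model and .run attributes are described above; the log_metric and end calls on the RunContext are assumptions based on the SigOpt Python client and should be checked against its documentation. The metric name, the helper max_abs_error, and the function train_and_log_extra are hypothetical.

```python
def max_abs_error(y_true, y_pred):
    """Largest absolute difference between a label and its prediction."""
    return max(abs(float(t) - float(p)) for t, p in zip(y_true, y_pred))


def train_and_log_extra(params, dtrain, dtest, y_test):
    """Train with tracking, then log one extra metric to the same Run."""
    import sigopt.xgboost

    xgb_run = sigopt.xgboost.run(
        params=params,
        dtrain=dtrain,
        num_boost_round=20,
        evals=[(dtest, "test")],
    )
    booster = xgb_run.model    # trained XGBoost Booster
    run_context = xgb_run.run  # SigOpt RunContext for further logging

    predictions = booster.predict(dtest)
    # Assumption: RunContext exposes log_metric(name, value) and end().
    run_context.log_metric("max_abs_error", max_abs_error(y_test, predictions))
    run_context.end()
    return booster
```

The helper is plain Python, so it can be checked on its own: max_abs_error([1.0, 2.0], [1.5, 4.0]) evaluates to 2.0.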