Tracking XGBoost Training
sigopt.xgboost.run
This function provides a wrapper around the xgboost.train API; it automatically tracks and records all aspects of the model training process to SigOpt. The sigopt.xgboost.run API closely matches the xgboost.train API to minimize any disruption to your workflow, while allowing you to enjoy all the features provided by the SigOpt experiment management platform.
Before executing the example code, make sure you have set up your API token and project name as environment variables. You can follow the setup instructions or set them in Python like the example below.
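For example, a minimal sketch of setting them in Python (assuming the standard SIGOPT_API_TOKEN and SIGOPT_PROJECT environment variable names; the token and project values here are placeholders):

```python
import os

# Placeholder values; substitute your own SigOpt API token and project name.
os.environ["SIGOPT_API_TOKEN"] = "YOUR_API_TOKEN"
os.environ["SIGOPT_PROJECT"] = "my-xgboost-project"
```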
Now we demonstrate how to modify an existing XGBoost training script that uses the xgboost.train API to call sigopt.xgboost.run instead, enabling automatic tracking with SigOpt.
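The sketch below contrasts the two calls. It assumes placeholder training data and hyperparameter values; only the training call changes between the two APIs.

```python
import xgboost as xgb
import sigopt.xgboost

# Placeholder datasets; substitute your own DMatrix objects.
dtrain = xgb.DMatrix("train.svm.txt")
dtest = xgb.DMatrix("test.svm.txt")
params = {"objective": "binary:logistic", "eta": 0.3, "max_depth": 6, "eval_metric": "logloss"}

# xgboost.train API: returns a Booster; nothing is tracked.
bst = xgb.train(params, dtrain, num_boost_round=50, evals=[(dtest, "test")])

# sigopt.xgboost.run API: same arguments, but training is tracked as a SigOpt Run.
xgb_run = sigopt.xgboost.run(params, dtrain, num_boost_round=50, evals=[(dtest, "test")])
bst = xgb_run.model  # the trained Booster
```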
All relevant information from the XGBoost model training process is automatically recorded in a SigOpt Run and can be viewed on the Run page in the web app.
sigopt.xgboost.run automatically logs the relevant information and metadata of the XGBoost training process and the resulting model. We list all the automated logging functionalities below. For supported learning tasks, the following additional metrics are computed from the evaluation datasets; the task is inferred from the prefix of the objective parameter.
| Metric | Objective prefix | Description |
| --- | --- | --- |
| accuracy | binary:, multi: | Percentage of predicted labels that match true labels. |
| precision | binary:, multi: | Ratio of true positive predictions to all positive predictions. |
| recall | binary:, multi: | Ratio of true positive predictions to all positive labels. |
| F1 | binary:, multi: | The harmonic mean of precision and recall. |
| mean squared error | reg: | Squared error between each predicted label and true label. |
| mean absolute error | reg: | Absolute error between each predicted label and true label. |
| Training Time | All objectives | Number of seconds to execute xgboost.train. |
Caveat: SigOpt only supports up to 200 total checkpoints per Run. sigopt.xgboost.run automatically adjusts the checkpoint logging frequency to guarantee that the total number of checkpoints does not exceed this limit.
sigopt.xgboost.run
automatically tracks additional metadata associated with the model training process such as dataset dimensions, dataset names, XGBoost version, and more.
run_options
You can control which aspects of the model training are logged by the sigopt.xgboost.run call with the run_options argument. Below, we provide an example of using the run_options argument to create a Run with an assigned name and selectively turn off checkpoint logging and stdout logging.
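A sketch of such a call is shown below (params, dtrain, and dtest are assumed to be defined as in the earlier sketch; the Run name is a placeholder):

```python
import sigopt.xgboost

run_options = {
    "name": "my-xgboost-run",      # assign a name to the new TrainingRun
    "autolog_checkpoints": False,  # turn off checkpoint logging
    "autolog_stdout": False,       # turn off stdout logging
}
xgb_run = sigopt.xgboost.run(
    params, dtrain, evals=[(dtest, "test")], run_options=run_options
)
```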
The following table lists the automated logging capabilities you can control using the run_options
dict.
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| autolog_checkpoints | Boolean | True | Automatically log the iterated evaluation metrics at every specified interval. |
| autolog_feature_importances | Boolean | True | Automatically log the dataset feature importance scores. |
| autolog_metrics | Boolean | True | Automatically log additional evaluation metrics. |
| autolog_xgboost_defaults | Boolean | True | Automatically log XGBoost default values for unspecified parameters. |
| autolog_stderr | Boolean | True | Automatically log stderr from the xgboost.train call. |
| autolog_stdout | Boolean | True | Automatically log stdout from the xgboost.train call. |
| autolog_sys_info | Boolean | True | Automatically log the Python version and XGBoost version. |
| name | string | None | Assign a name to a newly created TrainingRun. |
| run | RunContext | None | Pass an existing RunContext for sigopt.xgboost.run to log to, instead of creating a new Run. |
sigopt.xgboost.run automatically logs all hyperparameters that you specify via the params argument. These values are displayed under the Parameter Values section on the Run page. In addition, sigopt.xgboost.run also records the XGBoost default values for all other parameters.
If evaluation metrics are specified in the params argument via eval_metric, sigopt.xgboost.run automatically logs every evaluation metric value for each evaluation dataset after the training has completed. The metric values are listed under Metrics on the Run page.
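For instance, the sketch below (placeholder values, reusing dtrain and dtest from the earlier sketch) specifies two evaluation metrics via eval_metric, so each metric is logged for both evaluation datasets once training completes:

```python
import sigopt.xgboost

params = {
    "objective": "binary:logistic",
    "eval_metric": ["logloss", "auc"],  # logged for every dataset listed in evals
}
xgb_run = sigopt.xgboost.run(
    params, dtrain, num_boost_round=50, evals=[(dtrain, "train"), (dtest, "test")]
)
```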
For certain supported learning tasks, sigopt.xgboost.run logs additional metrics computed using the evaluation datasets. The supported tasks are inferred from the prefix of the objective parameter. The objective prefixes and the corresponding metrics are listed in the table above.
If evaluation metrics are specified in the params, sigopt.xgboost.run also automatically tracks the evaluation metrics over boosting iterations using checkpoints. You can control the frequency of checkpoint logging with the verbose_eval argument. If verbose_eval is an integer, SigOpt logs the evaluation metrics every verbose_eval boosting rounds. If verbose_eval is True, SigOpt logs the evaluation metrics every boosting round. If it is False, SigOpt logs the evaluation metrics every 5 boosting rounds.
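For example, a sketch of recording a checkpoint every 10 boosting rounds (reusing params, dtrain, and dtest from the earlier sketches):

```python
# verbose_eval=10 logs the checkpointed evaluation metrics every 10 boosting rounds.
xgb_run = sigopt.xgboost.run(
    params, dtrain, num_boost_round=200, evals=[(dtest, "test")], verbose_eval=10
)
```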
sigopt.xgboost.run automatically computes and tracks the feature importances of the training dataset. Generally, feature importances indicate how useful each feature is in determining boosted decision tree predictions. Feature importances are computed using XGBoost's get_score method with importance_type=weight. The top 50 features are tracked and stored on the Run page.
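For reference, the sketch below shows the corresponding direct XGBoost call on the trained Booster; the integration performs this computation for you automatically.

```python
# Weight-based feature importances from the trained Booster.
importances = xgb_run.model.get_score(importance_type="weight")
# Top 50 features by importance score.
top_50 = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)[:50]
```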
All arguments supported by xgboost.train are supported by sigopt.xgboost.run. You can preserve your existing XGBoost code and realize the benefits of tracking the model training as a SigOpt TrainingRun with minimal code changes. To better understand the XGBoost Learning API that this integration is based on, refer to the relevant pages of the XGBoost documentation.
You can also pass an existing RunContext to sigopt.xgboost.run through the run entry of run_options. This is useful when you want to create and manage the Run yourself as part of a custom workflow.
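The sketch below illustrates this pattern. It assumes sigopt.create_run is used to create the RunContext and that run_options accepts it under the run key; adjust to your own setup.

```python
import sigopt
import sigopt.xgboost

# Create and manage the Run yourself, then hand it to the integration.
run = sigopt.create_run(name="my-custom-run")
run.log_metadata("data_version", "2024-01")  # example of logging before training
xgb_run = sigopt.xgboost.run(
    params, dtrain, evals=[(dtest, "test")], run_options={"run": run}
)
```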
The sigopt.xgboost.run function returns an XGBRun object, which allows you to access the resulting XGBoost Booster object via .model and the SigOpt RunContext object via .run. In the following example, we demonstrate how to continue logging information with the XGBRun object after model training.
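A sketch of this pattern is below; the metric name and value are placeholders, and it assumes the RunContext methods log_metadata, log_metric, and end.

```python
import sigopt.xgboost

xgb_run = sigopt.xgboost.run(params, dtrain, evals=[(dtest, "test")])

# Use the trained Booster for additional evaluation.
preds = xgb_run.model.predict(dtest)

# Continue logging to the same SigOpt Run, then close it.
xgb_run.run.log_metadata("num_test_predictions", len(preds))
xgb_run.run.log_metric("custom_metric", 0.87)  # placeholder value
xgb_run.run.end()
```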