Reproducibility in SigOpt
Reproducing a hyperparameter tuning experiment
The optimization engine that powers SigOpt learns the relationship between a user-defined parameter space and a metric space, often referred to as a response surface. The process by which the SigOpt optimizer generates suggested parameter configurations is inherently distributed and stochastic, and by construction there is no mechanism for a user to “seed” the stochastic elements of the SigOpt optimizer. However, reproducibility is a common requirement for modeling pipelines, and SigOpt supports it in two ways. As a warning, both of these methods will generally underperform the default SigOpt optimizer and should only be considered when reproducibility matters more than pure optimization performance.
Replay an experiment - In the machine learning context, the motivation for non-stochastic reproducibility is often testing a broader modeling pipeline of which the HPO process is merely a subroutine. The goal is to remove the “randomness” of the optimizer in order to test other elements of the pipeline. Any SigOpt experiment, regardless of the optimization method originally used, can be replayed, allowing a modeling pipeline to traverse the parameter space identically as many times as desired.
Use a deterministic optimizer - In some contexts it may be important for the optimizer to be deterministic, yet still take a different path through the parameter space when given a new metric response surface. This can be accomplished by running a grid search, or by Bringing Your Own Optimizer and setting its random seed locally.
See the Jupyter Notebook containing code demonstrating the concepts below.
Strategy 1: Replay a SigOpt Experiment
We will first execute a SigOpt experiment in the normal fashion and call it Experiment-0. Then we will show how to replay that experiment's history in Experiment-1 by iterating over its previous Runs, producing the identical path through the parameter space just as if you had set a seed and made SigOpt deterministic. To do this, we use the sigopt.Connection object to connect to the SigOpt API and access data from the original experiment whose path we wish to follow. Observe that in the execute_run function we inject random noise into the metric evaluation process, simulating variation in other aspects of the model training pipeline while keeping the hyperparameter search path fixed.
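The following is a minimal sketch of this replay loop under a few assumptions: it uses the Core API client (sigopt.Connection) with the API token read from the environment, the identifiers EXPERIMENT_0_ID and EXPERIMENT_1_ID are hypothetical placeholders, and execute_run is a simplified stand-in for the notebook's noisy objective over a single hypothetical parameter x. Exact attribute and method names may vary with your client version.

```python
import random

from sigopt import Connection

# Reads the client token from the SIGOPT_API_TOKEN environment variable.
conn = Connection()

EXPERIMENT_0_ID = "..."  # original experiment whose path we want to reproduce
EXPERIMENT_1_ID = "..."  # new experiment that will replay that path

def execute_run(assignments):
    # Simplified stand-in for the notebook's objective: evaluate at the fixed
    # assignments and inject noise to simulate variation elsewhere in the pipeline.
    return -(assignments["x"] ** 2) + random.gauss(0, 0.1)

# Fetch Experiment-0's observation history and sort it back into the order in
# which it was generated (the API may return the newest observations first).
history = sorted(
    conn.experiments(EXPERIMENT_0_ID).observations().fetch().iterate_pages(),
    key=lambda obs: int(obs.id),
)

# Replay the identical parameter path against Experiment-1: re-evaluate the
# objective at each fixed configuration instead of asking for new suggestions.
for obs in history:
    assignments = obs.assignments.to_json()  # plain dict of parameter values
    value = execute_run(assignments)
    conn.experiments(EXPERIMENT_1_ID).observations().create(
        assignments=assignments,
        value=value,
    )
```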
Strategy 2: Use a deterministic optimizer
If you want to fix a seed so that the optimization path is identical given the same metric responses, but can differ when the responses differ, SigOpt allows modelers to bring their own optimizer that supports local random seeding - such as Optuna or Hyperopt - while still benefiting from SigOpt’s Experiment Management capabilities.
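Below is a minimal sketch of this pattern, assuming Hyperopt's TPE optimizer seeded via its rstate argument and SigOpt's run-based Experiment Management API (sigopt.create_run); the objective, the parameters x and y, and the run name are hypothetical, and the parameter-logging calls may need adjusting for your SigOpt client version.

```python
import numpy as np
import sigopt
from hyperopt import Trials, fmin, hp, tpe

def objective(params):
    # Hypothetical noisy objective, mirroring execute_run: the noise term simulates
    # measurement variation elsewhere in the training pipeline.
    loss = (params["x"] - 1.0) ** 2 + (params["y"] + 2.0) ** 2 + np.random.normal(scale=0.1)
    # Track each Hyperopt evaluation as a SigOpt Run for experiment management.
    with sigopt.create_run(name="hyperopt-seeded") as run:
        run.params.x = params["x"]  # assumed parameter-logging interface; adjust to your client version
        run.params.y = params["y"]
        run.log_metric("loss", loss)
    return loss

space = {"x": hp.uniform("x", -5, 5), "y": hp.uniform("y", -5, 5)}

best = fmin(
    fn=objective,
    space=space,
    algo=tpe.suggest,
    max_evals=20,
    trials=Trials(),
    # A fixed seed makes the suggestion path deterministic for identical metric
    # responses; older Hyperopt releases expect np.random.RandomState(42) here.
    rstate=np.random.default_rng(42),
)
print(best)
```

Optuna can be seeded analogously by passing a seeded sampler, for example optuna.samplers.TPESampler(seed=42), to optuna.create_study.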
Notice that if you run the cell twice, Hyperopt will generate the exact same path of suggestions when the random seed is fixed, similar to the replay strategy. However, if the objective function contains any noise in the measurement process - a common feature of real-world modeling contexts - then the path will vary, because the seeded optimizer still adapts to the metric values it observes.