Setting an experiment budget
Setting an experiment budget is essential to the efficiency and efficacy of your SigOpt experiment.
- Setting a budget informs the optimizer about how to trade off between exploring the hyperparameter space and exploiting the information it already has to resolve the global optimum.
- SigOpt’s optimizer expects the budget you set to be the minimum number of observations you will report.
- You may continue to report runs beyond the original budget, and SigOpt will continue to provide run suggestions. However, once the budget is exhausted, SigOpt assumes that each additional run is the last one, so its suggestions will be exploitative in nature.
Choosing the correct budget can be both model and use-case dependent. We provide guidelines below that give you a feel for how the choice of observation budget is affected by other SigOpt features. These guidelines are based on empirical experiments and years of providing guidance to customers. However, these should be viewed as a starting point from which you can deviate as you become more comfortable with SigOpt. More importantly, if the guidelines below lead you to setting an observation budget of 100 but you only have enough resources or time for 50, then do 50.
Let’s define a simple SigOpt experiment as follows:
Assume that you want to set up a simple SigOpt experiment with D dimensions (that is, D parameters). We recommend setting the budget to a number between 10D and 20D and adjusting it up or down in future experiments on the same problem as you see fit.
Example: If you have a 10-dimensional hyperparameter space where all parameters are continuous (real-valued numbers/floats) or integer-valued, we recommend creating your experiment with a budget of at least 100 (10 x #parameters). This can be adjusted in future experiments when the modeler has built up an intuition about roughly how many iterations SigOpt will take to find a sufficiently high-performing hyperparameter configuration for the given problem.
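The 10D to 20D guideline above can be sketched as a small helper. This is only an illustration: the function name `baseline_budget_range` is hypothetical and is not part of the SigOpt client.

```python
from typing import Tuple


def baseline_budget_range(num_parameters: int) -> Tuple[int, int]:
    """Return the (low, high) budget range given by the 10D to 20D guideline.

    Hypothetical helper for illustration; not a SigOpt API call.
    """
    if num_parameters < 1:
        raise ValueError("an experiment needs at least one parameter")
    return 10 * num_parameters, 20 * num_parameters


# The 10-dimensional baseline from the example above.
low, high = baseline_budget_range(10)
print(low, high)  # 100 200
```

The lower end of the range (100) is the baseline budget used in the remaining examples.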
Because elements of categorical parameter spaces lack cardinal and ordinal relationships, the combinatorial nature of the problem leads to a recommendation of a larger budget. To better understand how to set the budget in this context, we first need to define the categorical breadth of an experiment. The categorical breadth of your experiment is the total number of categorical values you defined across all your categorical parameters. For example, if your experiment has two categorical parameters, the first with 4 possible values and the second with 5 possible values, your experiment's categorical breadth is 9 (4 + 5).
Example: If you start from the baseline experiment and substitute two of the baseline parameters with categorical parameters that have a combined categorical breadth of 9, we recommend you add 135 runs to your budget (15 x categorical breadth) for a total of 235 observations.
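As a rough sketch of the arithmetic above, the following computes the categorical breadth and the extra runs it implies. The helper names and the parameter names (`optimizer`, `activation`) are made up for illustration and do not come from the SigOpt client.

```python
from typing import Dict, List


def categorical_breadth(categorical_values: Dict[str, List[str]]) -> int:
    """Total number of categorical values across all categorical parameters."""
    return sum(len(values) for values in categorical_values.values())


def categorical_extra_budget(breadth: int, runs_per_value: int = 15) -> int:
    """Extra runs suggested by the 15 x categorical breadth guideline."""
    return runs_per_value * breadth


# Hypothetical categorical parameters with 4 and 5 possible values.
params = {
    "optimizer": ["sgd", "adam", "rmsprop", "adagrad"],
    "activation": ["relu", "tanh", "sigmoid", "elu", "gelu"],
}

breadth = categorical_breadth(params)
extra_runs = categorical_extra_budget(breadth)
baseline_budget = 100  # the 10-dimensional baseline budget
print(breadth, extra_runs, baseline_budget + extra_runs)  # 9 135 235
```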
When you run a multimetric experiment, SigOpt generates an efficient Pareto frontier that trades off between the two metrics. Effectively resolving this frontier requires the algorithm underlying the optimizer to do more function evaluations than it does to optimize over one of the functions by itself.
Example: If you start from the baseline and add a second metric over which you wish to optimize via SigOpt’s multimetric feature, we then recommend that you increase your budget by at least 3x to 300 runs.
Example: If you choose to find two solutions and the rest of the experiment is defined as in the baseline, we recommend increasing the budget to 150. If you choose to find three solutions and the rest of the experiment is defined as in the baseline, we recommend increasing the budget to 200. We arrive at these budgets by multiplying the baseline budget (100) by (1 + #solutions) / 2, where #solutions is the number of solutions you selected when creating the experiment.
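The two budgets above follow from applying the (1 + #solutions) / 2 factor to the baseline budget of 100. A minimal sketch, assuming a hypothetical helper name and rounding up when the result is not a whole number (the rounding choice is ours, not something the guideline specifies):

```python
import math


def multisolution_budget(baseline_budget: int, num_solutions: int) -> int:
    """Scale the baseline budget by (1 + num_solutions) / 2.

    Hypothetical helper; rounds up to a whole number of runs.
    """
    if num_solutions < 1:
        raise ValueError("num_solutions must be at least 1")
    return math.ceil(baseline_budget * (1 + num_solutions) / 2)


print(multisolution_budget(100, 2))  # 150
print(multisolution_budget(100, 3))  # 200
```

With a single solution the factor is 1, so the baseline budget is unchanged.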
Conditional parameters are useful when different branches of the conditional have different corresponding hyperparameters. A good example would be the various Stochastic Gradient Descent optimizer variants available in most deep learning frameworks, each of which may take a different set of hyperparameters. Effectively resolving a conditional experiment requires an increase in budget because the optimizer is now exploring a hyperparameter space where the dimensions along which the conditional decision takes place are disjoint, and each branch may interact somewhat differently with other parameters.
Example: If you choose to update your baseline experiment and add a condition with two branches, we recommend increasing the budget to 200 (2x #conditions).
- If your goal is to use the extra compute to achieve maximum gains, multiply the baseline budget by the number of workers.
- Example: if you parallelize the baseline across 10 workers, set a budget of 1000 (budget * num_workers).
- If your goal is to use the extra compute to get the same result in less wall-clock time, use the following formula: budget * (1 + log2(num_workers / 2)).
- Example: for that goal, still parallelizing the baseline across 10 workers, the recommended budget becomes 332 (i.e. 100 * (1 + log2(10/2))).
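Both parallelism rules can be sketched in a few lines. The function names are hypothetical. For the wall-clock rule, this sketch follows the worked example above, i.e. `log2(num_workers / 2)`, which reproduces the budget of 332. The guard for fewer than two workers is our own assumption, added so the budget never falls below the baseline.

```python
import math


def parallel_budget_max_gains(baseline_budget: int, num_workers: int) -> int:
    """Budget when the extra compute is used to achieve maximum gains."""
    return baseline_budget * num_workers


def parallel_budget_less_wall_clock(baseline_budget: int, num_workers: int) -> int:
    """Budget when the goal is the same result in less wall-clock time.

    Uses baseline_budget * (1 + log2(num_workers / 2)), as in the worked
    example. Assumption: with fewer than two workers, keep the baseline.
    """
    if num_workers < 2:
        return baseline_budget
    return round(baseline_budget * (1 + math.log2(num_workers / 2)))


print(parallel_budget_max_gains(100, 10))        # 1000
print(parallel_budget_less_wall_clock(100, 10))  # 332
```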
Example 1: If you choose to set one metric threshold and the rest of the experiment is defined as in the baseline, we recommend increasing the budget to 125 observations ((1 + 0.25 x #metric thresholds) x #runs). If the thresholds are set too tight, more runs are needed.
Example 2: If you choose to set three metric constraints and the rest of the experiment is defined as in the baseline, we recommend increasing the budget to 175 runs ((1 + 0.25 x #metric constraints) x #runs).
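The two examples above (125 runs for one threshold, 175 runs for three constraints) are both reproduced by adding 25% of the baseline budget per threshold or constraint. The sketch below uses that additive form, which is inferred from the two worked numbers and should be treated as an assumption. The helper name is hypothetical.

```python
import math


def metric_constraint_budget(baseline_budget: int, num_constraints: int) -> int:
    """Add 25% of the baseline budget per metric threshold or constraint.

    Assumption: the additive form is inferred from the two worked examples
    (1 constraint -> 125 runs, 3 constraints -> 175 runs on a baseline of 100).
    """
    if num_constraints < 0:
        raise ValueError("num_constraints cannot be negative")
    return math.ceil(baseline_budget * (1 + 0.25 * num_constraints))


print(metric_constraint_budget(100, 1))  # 125
print(metric_constraint_budget(100, 3))  # 175
```

Tight thresholds may still call for more runs than this estimate.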
Users often have prior beliefs on how metric values might behave for certain parameters; this knowledge could be derived from domain expertise, similar models trained on different datasets, or certain known structure of the problem itself. SigOpt can take advantage of these prior beliefs on parameters to make the optimization process more efficient.