Multitask Experiments
Multitask Experiments are only supported in the Core Module.
SigOpt supports multitask experiments through the API. Multitask experiments are useful when faster approximations to the true metric under consideration are available. We refer to these approximations, along with the full cost task associated with the true metric that SigOpt is trying to maximize, as tasks. The goal of this experiment type is to use data that is available at lower cost to acquire and interpret data about the true objective more efficiently.
The chosen task names, along with an estimated cost for each, should be provided at experiment creation. Each subsequent suggestion pairs a set of parameter assignments with the task on which to evaluate them. If the faster tasks provide useful insight into the true metric, SigOpt should be able to search the parameter space in less wall-clock time.
Example - Partial Number of Epochs
One example where faster tasks naturally arise is maximizing the accuracy produced by a gradient descent algorithm. If we plan to run at most 10 epochs, we might define four tasks: a cheapest task consisting of 1 epoch at cost 0.1, a cheaper task consisting of 2 epochs at cost 0.2, a cheap task consisting of 5 epochs at cost 0.5, and the true task consisting of all 10 epochs at cost 1.0.
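The epoch example can be written down as task definitions. The sketch below assumes that each task's cost is its epoch count divided by the 10 epochs of the true task. The dictionary `EPOCHS_PER_TASK` and the helper `epochs_for_task` are hypothetical and are not part of the SigOpt API.

```python
# Hypothetical task definitions for the epoch example.
# Assumption: cost = epochs / MAX_EPOCHS, so the true task costs 1.0.
MAX_EPOCHS = 10
EPOCHS_PER_TASK = {"cheapest": 1, "cheaper": 2, "cheap": 5, "true": 10}

# Each task is an object with a name and a cost, as described above.
tasks = [
    {"name": name, "cost": epochs / MAX_EPOCHS}
    for name, epochs in EPOCHS_PER_TASK.items()
]


def epochs_for_task(task_name):
    """Look up how many epochs to train for a suggested task (hypothetical helper)."""
    return EPOCHS_PER_TASK[task_name]
```

When a suggestion arrives, your training code can call `epochs_for_task` on the suggested task's name to decide how long to train.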
Example - Subset of Data
Another way to define faster tasks in a supervised machine learning setting is to train on subsets of the data. Suppose 10000 labeled examples are available and the goal of the SigOpt experiment is to maximize the training accuracy. One might define two tasks: a true task that builds a model on all the data at cost 1.0, and a cheap task that uses a balanced subsample of only 1000 labeled examples at cost 0.1.
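The cheap task above relies on a balanced subsample. The sketch below shows one way to draw it: it takes an equal number of examples from each class, using a seeded random generator so the subsample is reproducible. The functions `balanced_subsample` and `indices_for_task` are hypothetical illustrations, not SigOpt functionality.

```python
import random
from collections import defaultdict


def balanced_subsample(labels, n_total, seed=0):
    """Return sorted indices of a class-balanced subsample.

    Draws n_total // n_classes examples per class, so the result has
    exactly n_total items only when n_total divides evenly.
    Assumes every class has at least that many examples.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    per_class = n_total // len(by_class)
    rng = random.Random(seed)
    indices = []
    for label in sorted(by_class):
        indices.extend(rng.sample(by_class[label], per_class))
    return sorted(indices)


def indices_for_task(task_name, labels):
    """Map a hypothetical task name to the training indices it uses."""
    if task_name == "true":      # cost 1.0: all labeled examples
        return list(range(len(labels)))
    if task_name == "cheap":     # cost 0.1: balanced 1000-example subsample
        return balanced_subsample(labels, 1000)
    raise ValueError(f"unknown task: {task_name}")
```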
Creating the Experiment
A multitask experiment must have its type set to offline and must have a tasks field. This field should contain a list of objects that state the name and cost of each task, including the full cost task for which SigOpt is trying to find the optimum. Multitask experiments also require observation_budget to be set. In this setting, observation_budget represents the cumulative cost to be spent over the entire course of the experiment (discussed further below).
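The sketch below assembles a creation request body using the fields named above: type, tasks and observation_budget. The experiment name and the parameter and metric definitions are illustrative assumptions. Check their exact shape against the SigOpt API reference before use.

```python
# Hypothetical request body for creating a multitask experiment.
# Only type, tasks and observation_budget come from the prose above;
# the name, parameter and metric definitions are illustrative assumptions.
experiment_body = {
    "name": "Multitask epochs example",
    "type": "offline",
    "parameters": [
        {"name": "learning_rate", "type": "double",
         "bounds": {"min": 1e-4, "max": 1e-1}},
    ],
    "metrics": [{"name": "accuracy", "objective": "maximize"}],
    # Every task, including the full cost (cost 1.0) task.
    "tasks": [
        {"name": "cheapest", "cost": 0.1},
        {"name": "cheaper", "cost": 0.2},
        {"name": "cheap", "cost": 0.5},
        {"name": "true", "cost": 1.0},
    ],
    # Cumulative cost to spend, not a count of observations.
    "observation_budget": 30,
}
```

With the SigOpt Python client, a body like this might be passed as keyword arguments to an experiment-creation call. Treat that mapping as an assumption to confirm in the client documentation.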
Interpreting Suggestions and Reporting Observations
Suggestions in multitask experiments have one new field, task, which contains the name and cost of the task on which to evaluate the given assignments. Observations must be reported using the suggestion id provided by SigOpt. Manually reporting data without a corresponding suggestion is not currently supported.
SigOpt will make many partial cost suggestions, especially early in the experiment. It also guarantees at least one full cost observation, and often several, by the end of the experiment, once the observation_budget is spent.
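The sketch below shows how a suggestion's task field can drive the work done and how the resulting observation references the suggestion id. The suggestion is a plain dictionary shaped like the description above rather than a real API object. `train_and_score` is a hypothetical placeholder for your own training code.

```python
# Hypothetical mapping from task name to epochs (see the epoch example).
EPOCHS_PER_TASK = {"cheapest": 1, "cheaper": 2, "cheap": 5, "true": 10}


def train_and_score(assignments, epochs):
    """Placeholder for real training code; assignments are ignored here."""
    return 0.5 + epochs / 20  # fake accuracy that grows with epochs


def build_observation(suggestion):
    """Evaluate a suggestion on its task and build an observation body.

    The body references the suggestion id rather than raw assignments,
    because multitask observations must be tied to a suggestion.
    """
    task_name = suggestion["task"]["name"]
    epochs = EPOCHS_PER_TASK[task_name]
    value = train_and_score(suggestion["assignments"], epochs)
    return {"suggestion": suggestion["id"], "value": value}


# A hypothetical suggestion shaped like the description above.
suggestion = {
    "id": "12345",
    "assignments": {"learning_rate": 0.01},
    "task": {"name": "cheap", "cost": 0.5},
}
observation = build_observation(suggestion)
print(observation)  # {'suggestion': '12345', 'value': 0.75}
```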
Setting Observation Budget
Traditionally, observation_budget corresponds to the expected number of observations created during an experiment. For a multitask experiment, observation_budget instead represents the expected cost accumulated over all observations created. You should set observation_budget based on how much time or compute you want to allocate to your experiment. If you previously used standard (non-multitask) SigOpt to tune a model, you may be able to decrease observation_budget when running multitask experiments.
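Setting the budget "based on how much time or compute you want to allocate" can be made concrete with a small calculation. The sketch below assumes that a task's cost is proportional to its run time. Under that assumption, one unit of budget equals the compute of one full cost evaluation. The function name and the numbers are hypothetical.

```python
import math


def budget_from_compute(total_hours, hours_per_full_task):
    """Express a compute allowance in units of full cost evaluations.

    Assumption: task cost scales linearly with run time, so a cost 1.0
    evaluation consumes one unit. The result is floored because
    observation_budget must be an integer.
    """
    return math.floor(total_hours / hours_per_full_task)


# Hypothetical numbers: 60 GPU-hours available, 2 GPU-hours per full run.
budget = budget_from_compute(60, 2)
print(budget)  # 30
```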
When defining an observation_budget for a multitask experiment, consider that each of the following scenarios yields a cumulative cost of 100:
90 observations with cost 1.0 and 40 with cost 0.25
50 observations with cost 1.0 and 200 with cost 0.25
10 observations with cost 1.0 and 360 with cost 0.25
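The arithmetic behind these scenarios is the sum of cost times count over all observations. A minimal check in Python, using the three mixes listed above (the function name is illustrative):

```python
def cumulative_cost(counts_by_cost):
    """Total cost of a mix of observations, given a {cost: count} mapping."""
    return sum(cost * count for cost, count in counts_by_cost.items())


# The three scenarios above: {cost: number of observations}.
scenarios = [
    {1.0: 90, 0.25: 40},   # 90 + 10
    {1.0: 50, 0.25: 200},  # 50 + 50
    {1.0: 10, 0.25: 360},  # 10 + 90
]
for mix in scenarios:
    print(cumulative_cost(mix))  # 100.0 for every scenario
```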
SigOpt balances exploration and exploitation over the course of the experiment to determine which of these scenarios is best for the given problem. Because this cannot be known a priori, observation_budget is best set as a rough guideline for the total cost SigOpt can work with. As with all SigOpt experiments, the budget serves only as guidance to SigOpt, not as a promise.
Limitations
Multitask experiments have some limitations that accommodate their more complex functionality.
Suggestions cannot be enqueued, updated or deleted.
Observations must be created with a suggestion field, i.e., without an assignments field.
Only one metric is permitted, and the number of solutions must be one.
Despite the fractional nature of the cumulative cost, observation_budget is still required to be an integer.
Tasks must all have unique names and costs.
The true task under consideration in the experiment must have cost 1.0, and all other costs should be less than that.
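The rules above about tasks, metrics, solutions and the budget can be checked locally before an experiment is created. The validator below is a hypothetical helper that is not part of any SigOpt client. It also rejects non-positive costs, which is an assumption: the text above only states that non-true tasks cost less than 1.0.

```python
def validate_multitask_config(tasks, metrics, observation_budget, num_solutions=1):
    """Check a hypothetical experiment config against the limitations above.

    Raises ValueError on the first violated rule; returns None otherwise.
    """
    names = [t["name"] for t in tasks]
    costs = [t["cost"] for t in tasks]
    if len(set(names)) != len(names):
        raise ValueError("task names must be unique")
    if len(set(costs)) != len(costs):
        raise ValueError("task costs must be unique")
    if 1.0 not in costs:
        raise ValueError("the true task must have cost 1.0")
    # Assumption: costs must also be positive (not stated in the docs).
    if any(not 0 < c <= 1.0 for c in costs):
        raise ValueError("task costs must lie in (0, 1.0]")
    if len(metrics) != 1:
        raise ValueError("only one metric is permitted")
    if num_solutions != 1:
        raise ValueError("the number of solutions must be one")
    if isinstance(observation_budget, bool) or not isinstance(observation_budget, int):
        raise ValueError("observation_budget must be an integer")
```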
Recommendations
During development of this feature, we produced some guidance on using multiple tasks successfully to accelerate optimization of the true, full cost task.
Limiting the number of tasks can help SigOpt develop a more consistent understanding of the relationship between less expensive and more expensive tasks. We therefore recommend using fewer than 5 tasks in initial experiments, and then gauging whether more tasks would benefit your setting.
Limiting the cost gap between the cheapest and true tasks can also help SigOpt understand the relationship across tasks. In practice, we have seen that keeping the cheapest task's cost at or above 0.03 can help SigOpt balance sampling between cheaper and more expensive tasks.
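Unlike the hard limitations, these recommendations are soft guidance. The hypothetical helper below therefore returns warnings instead of raising errors. Its thresholds are taken from the two recommendations: fewer than 5 tasks, and a cheapest cost of at least 0.03.

```python
def recommendation_warnings(tasks, max_tasks=4, min_cheapest_cost=0.03):
    """Return soft warnings based on the recommendations above (hypothetical helper)."""
    warnings = []
    if len(tasks) > max_tasks:
        warnings.append(f"{len(tasks)} tasks; fewer than 5 is recommended at first")
    cheapest = min(t["cost"] for t in tasks)
    if cheapest < min_cheapest_cost:
        warnings.append(f"cheapest cost {cheapest} is below {min_cheapest_cost}")
    return warnings


# The four epoch-based tasks raise no warnings.
epoch_tasks = [
    {"name": "cheapest", "cost": 0.1},
    {"name": "cheaper", "cost": 0.2},
    {"name": "cheap", "cost": 0.5},
    {"name": "true", "cost": 1.0},
]
print(recommendation_warnings(epoch_tasks))  # []
```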