Parallelism

Running an Experiment on several machines at once is easy and natural with the SigOpt API. Before you start running experiments in parallel, make sure you know how to create and run an experiment.

Create an Experiment

Create your experiment on a controller machine; you only need to perform this step once. Be sure to make a note of your experiment's id because you'll need it in subsequent steps.
Then set the parallel_bandwidth field to the number of parallel workers you are using in your experiment. This field lets SigOpt take your level of parallelism into account and provide better suggestions.
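As a concrete sketch, the arguments for the create call can be assembled up front. The experiment_definition helper below is hypothetical (not part of the SigOpt client); it only builds the payload, so it can be inspected before any API call is made:

```python
def experiment_definition(num_workers):
    # Hypothetical helper: assembles the keyword arguments for
    # conn.experiments().create(), with parallel_bandwidth set to
    # the number of workers that will run in parallel.
    return dict(
        name="Classifier Accuracy",
        parameters=[
            dict(name="gamma", type="double", bounds=dict(min=0.001, max=1.0)),
        ],
        metrics=[dict(name="Accuracy", objective="maximize")],
        observation_budget=20,
        parallel_bandwidth=num_workers,
    )
```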

Initialize the Workers

Initialize each of your workers with the EXPERIMENT_ID from the experiment that you just created. All workers, whether individual threads or machines, will receive the same experiment ID.
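One simple way to share the id is through the environment of each worker process. The SIGOPT_EXPERIMENT_ID variable name below is our own convention for illustration, not something the SigOpt client looks for:

```python
import os

def get_experiment_id(default=None):
    # Read the experiment id exported by the controller.
    # SIGOPT_EXPERIMENT_ID is an illustrative variable name,
    # not one the SigOpt client reads automatically.
    return os.environ.get("SIGOPT_EXPERIMENT_ID", default)
```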

Run SigOpt Optimization Experiments in Parallel

Now, start the optimization loop on each worker machine. Each worker communicates independently with SigOpt's API, creating Suggestions and evaluating your metric.

Why This Works

A large benefit of SigOpt's parallelization is that each worker communicates asynchronously with the SigOpt API, so you do not need to worry about task management.
SigOpt acts as a distributed scheduler for your SigOpt suggestions, ensuring that each worker machine receives parameter assignments at the moment it asks for a new parameter configuration. SigOpt tracks which Suggestions are currently open, so machines independently creating Suggestions will not receive duplicates.
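The de-duplication guarantee can be pictured with a toy model (this is an illustration of the idea, not SigOpt internals): a thread-safe queue plays the role of the scheduler, and concurrent workers drawing from it never receive the same configuration twice.

```python
import threading
import queue

def run_workers(configurations, num_workers):
    # The queue stands in for the scheduler: each configuration is
    # handed out exactly once, no matter how many workers ask.
    open_suggestions = queue.Queue()
    for config in configurations:
        open_suggestions.put(config)
    received = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                config = open_suggestions.get_nowait()
            except queue.Empty:
                return  # No open suggestions left
            with lock:
                received.append(config)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Every configuration comes back exactly once, regardless of how the workers interleave.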

Example Setups

In these examples, each machine needs to be configured with your API token.

Core Module

These code snippets illustrate a suggested controller/worker division of labor, and incorporate metadata to track which machines have reported Observations.

Controller: Create Experiment, Spin Up Workers

from sigopt import Connection

def controller(api_token, num_workers=1):
    # Create the SigOpt connection
    conn = Connection(client_token=api_token)
    # Create the experiment on the controller
    experiment = conn.experiments().create(
        name="Classifier Accuracy",
        parameters=[
            {
                'bounds': {
                    'max': 1.0,
                    'min': 0.001
                },
                'name': 'gamma',
                'type': 'double'
            }
        ],
        metrics=[dict(name='Accuracy', objective='maximize')],
        observation_budget=20,
        parallel_bandwidth=num_workers,
    )
    for _ in range(num_workers):
        # Launch a worker and run the run_worker
        # function (below) on the worker machine
        # You implement this function
        run_worker(
            api_token=api_token,
            experiment_id=experiment.id,
        )

Worker: Run Optimization Loop with Metadata

import socket
from sigopt import Connection

# Each worker runs the same optimization loop
# for the experiment created on the controller
def run_worker(api_token, experiment_id):
    # Create the SigOpt connection
    conn = Connection(client_token=api_token)
    # Keep track of the hostname for logging purposes
    hostname = socket.gethostname()
    experiment = conn.experiments(experiment_id).fetch()
    while experiment.progress.observation_count < experiment.observation_budget:
        # Receive a Suggestion
        suggestion = conn.experiments(experiment.id).suggestions().create()
        # Evaluate Your Metric
        # You implement this function
        value = evaluate_metric(suggestion.assignments)
        # Report an Observation
        # Include the hostname so that you can track
        # progress on the web interface
        conn.experiments(experiment.id).observations().create(
            suggestion=suggestion.id,
            value=value,
            metadata=dict(hostname=hostname),
        )
        # Refresh the experiment so the loop sees updated progress
        experiment = conn.experiments(experiment.id).fetch()

Recovering Open Suggestions

In the event that one or more of your machines fail, you may have a Suggestion or two in an open state. You can list open Suggestions and continue to work on them:
suggestions = conn.experiments(experiment_id).suggestions().fetch(state="open")
for suggestion in suggestions.iterate_pages():
    # You implement this function
    value = evaluate_metric(suggestion.assignments)
    conn.experiments(experiment_id).observations().create(
        suggestion=suggestion.id,
        value=value,
    )
Or you can simply delete open Suggestions:
conn.experiments(experiment_id).suggestions().delete(state="open")

AI Module

Scenario 1:

The user has a code repository, a local computer (e.g. a MacBook), and a group of remote machines with copies of the code repository.
On your local computer, create an experiment.yml file with the following contents:
name: sigopt parallel example
parameters:
  - name: hidden_layer_size
    type: int
    bounds:
      min: 32
      max: 512
  - name: activation_function
    type: categorical
    categorical_values: ['relu', 'tanh']
metrics:
  - name: holdout_accuracy
    strategy: optimize
    objective: maximize
    threshold: 0.1
parallel_bandwidth: 2
budget: 30
Create an Experiment using the CLI command:
$ sigopt create experiment
Remotely connect to each of the remote machines (e.g. via ssh) and start parallel workers with the CLI command, substituting your experiment's id for 1234:
$ sigopt start-worker 1234 python run-model.py

Scenario 2:

The user has a code repository, a coordination host (e.g. a local or remote machine), and a group of remote machines with copies of the code repository.
On the coordination host, create an Experiment:
import sigopt

experiment = sigopt.create_experiment(
    name="sigopt parallel example",
    parameters=[
        dict(name="hidden_layer_size", type="int", bounds=dict(min=32, max=512)),
        dict(name="activation_fn", type="categorical", categorical_values=["relu", "tanh"]),
    ],
    metrics=[
        dict(name="holdout_accuracy", strategy="optimize", objective="maximize"),
        dict(name="inference_time", strategy="constraint", objective="minimize", threshold=0.1),
    ],
    project="sigopt-examples",
    parallel_bandwidth=5,
    budget=30,
)
Start parallel workers:
for machine_number in range(experiment.parallel_bandwidth):
    # You implement this function
    run_command_on_machine(
        machine_number,
        f"sigopt start-worker {experiment.id} python run-model.py",
    )