Parallelism
Running an Experiment on several machines at once is easy and natural with the SigOpt API. Before you start running experiments in parallel, make sure you know how to create and run an experiment.
Create your experiment on a controller machine; you only need to perform this step once. Be sure to make a note of your experiment's id because you'll need it in subsequent steps.
Then set the parallel_bandwidth field to the number of parallel workers you are using in your experiment. This field lets SigOpt take your level of parallelism into account and provide better suggestions.
Initialize each of your workers with the EXPERIMENT_ID from the experiment that you just created. All workers, whether individual threads or machines, receive the same experiment ID.
Now, start the optimization loop on each worker machine. Workers communicate with SigOpt's API individually, creating Suggestions and evaluating your metric.
A major benefit of SigOpt's parallelization is that each worker communicates asynchronously with the SigOpt API, so you do not need to worry about task management.
SigOpt acts as a distributed scheduler for your Suggestions, ensuring that each worker machine receives parameter assignments the moment it asks for a new parameter configuration. SigOpt tracks which Suggestions are currently open, so machines independently creating Suggestions will not receive duplicates.
In these examples, each machine needs to be configured with your API token.
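One simple way to distribute the token is sketched below; it assumes you have exported the token on every machine as an environment variable named SIGOPT_API_TOKEN, which is an assumption about your setup rather than a requirement of the API:
import os
from sigopt import Connection

# Assumes SIGOPT_API_TOKEN was exported on this machine, so no token
# needs to be hard-coded in the code repository
conn = Connection(client_token=os.environ["SIGOPT_API_TOKEN"])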
The code snippets below provide an example of the suggested controller/worker division of labor, and they incorporate metadata to track which machines have reported Observations.
from sigopt import Connection

def controller(api_token, num_workers=1):
    # Create the SigOpt connection
    conn = Connection(client_token=api_token)
    # Create the experiment from the controller machine
    experiment = conn.experiments().create(
        name="Classifier Accuracy",
        parameters=[
            {
                'bounds': {
                    'max': 1.0,
                    'min': 0.001
                },
                'name': 'gamma',
                'type': 'double'
            }
        ],
        metrics=[dict(name='Accuracy', objective='maximize')],
        observation_budget=20,
        parallel_bandwidth=num_workers,
    )
    for _ in range(num_workers):
        # Launch a worker machine and run the run_worker
        # function (below) on it
        # You implement this function
        run_worker(
            api_token=api_token,
            experiment_id=experiment.id,
        )
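For example, the controller could be launched as in the sketch below; the token is a placeholder, and you can also read it from the environment as shown above:
if __name__ == "__main__":
    # Launch the controller with four parallel workers
    controller(api_token="YOUR_API_TOKEN", num_workers=4)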
import socket
from sigopt import Connection

# Each worker runs the same optimization loop
# for the experiment created on the controller
def run_worker(api_token, experiment_id):
    # Create the SigOpt connection
    conn = Connection(client_token=api_token)
    # Keep track of the hostname for logging purposes
    hostname = socket.gethostname()
    experiment = conn.experiments(experiment_id).fetch()
    while experiment.progress.observation_count < experiment.observation_budget:
        # Receive a Suggestion
        suggestion = conn.experiments(experiment.id).suggestions().create()
        # Evaluate your metric
        # You implement this function
        value = evaluate_metric(suggestion.assignments)
        # Report an Observation
        # Include the hostname so that you can track
        # progress on the web interface
        conn.experiments(experiment.id).observations().create(
            suggestion=suggestion.id,
            value=value,
            metadata=dict(hostname=hostname),
        )
        # Refresh the experiment so the observation count stays current
        experiment = conn.experiments(experiment.id).fetch()
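evaluate_metric is the function where you train and evaluate your own model. A minimal, hypothetical sketch, assuming a scikit-learn SVM whose gamma corresponds to the parameter defined above (none of this is part of the SigOpt API):
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

def evaluate_metric(assignments):
    # Hypothetical example: train an SVM classifier with the suggested
    # gamma and return mean cross-validated accuracy
    data, target = datasets.load_iris(return_X_y=True)
    classifier = svm.SVC(gamma=assignments['gamma'])
    scores = cross_val_score(classifier, data, target, cv=5)
    return float(scores.mean())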
In the event that one or more of your machines fails, you may be left with a Suggestion or two in an open state. You can list the open Suggestions and continue working on them, or delete them so that replacement workers receive fresh Suggestions:
suggestions = conn.experiments(experiment_id).suggestions().fetch(state="open")
for suggestion in suggestions.iterate_pages():
    # Evaluate your metric (you implement this function)
    value = evaluate_metric(suggestion.assignments)
    conn.experiments(experiment_id).observations().create(
        suggestion=suggestion.id,
        value=value,
    )

# Alternatively, delete all open Suggestions so workers start fresh
conn.experiments(experiment_id).suggestions().delete(state="open")
You can also coordinate parallel workers with the SigOpt CLI. In this scenario, you have a code repository, a local computer (e.g. a MacBook), and a group of remote machines, each with a copy of the code repository.
On your local computer, create an experiment.yml file with the following contents:
name: sigopt parallel example
parameters:
  - name: hidden_layer_size
    type: int
    bounds:
      min: 32
      max: 512
  - name: activation_function
    type: categorical
    categorical_values: ['relu', 'tanh']
metrics:
  - name: holdout_accuracy
    strategy: optimize
    objective: maximize
    threshold: 0.1
parallel_bandwidth: 2
budget: 30
From the directory containing experiment.yml, create the Experiment using the CLI command below, and make a note of the new Experiment's ID; you will need it in the next step:
$ sigopt create experiment
Connect to each of the remote machines (e.g. via ssh) and start a parallel worker with the CLI command below, replacing 1234 with your Experiment ID:
$ sigopt start-worker 1234 python run-model.py
Alternatively, you can create the Experiment and launch the workers from a single coordination host. In this scenario, you have a code repository, a coordination host (e.g. a local or remote machine), and a group of remote machines, each with a copy of the code repository.
On the coordination host, create an Experiment:
import sigopt

experiment = sigopt.create_experiment(
    name="sigopt parallel example",
    parameters=[
        dict(name="hidden_layer_size", type="int", bounds=dict(min=32, max=512)),
        dict(name="activation_fn", type="categorical", categorical_values=["relu", "tanh"]),
    ],
    metrics=[
        dict(name="holdout_accuracy", strategy="optimize", objective="maximize"),
        dict(name="inference_time", strategy="constraint", objective="minimize", threshold=0.1),
    ],
    parallel_bandwidth=5,
    budget=30,
)
Start parallel workers:
for machine_number in range(experiment.parallel_bandwidth):
    # run_command_on_machine is your own dispatch mechanism
    # (e.g. running the command over ssh); you implement it
    run_command_on_machine(
        machine_number,
        f"sigopt start-worker {experiment.id} python run-model.py",
    )
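run-model.py is your own training script. A minimal, hypothetical sketch, assuming the sigopt Python module's run API (sigopt.params and sigopt.log_metric) and a scikit-learn model chosen purely for illustration:
# run-model.py (hypothetical sketch)
import time
import sigopt
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Read the parameter values suggested for this run
hidden_layer_size = sigopt.params.hidden_layer_size
activation_fn = sigopt.params.activation_fn

# Train a model with the suggested values
data, target = load_digits(return_X_y=True)
X_train, X_holdout, y_train, y_holdout = train_test_split(data, target, random_state=0)
model = MLPClassifier(hidden_layer_sizes=(hidden_layer_size,), activation=activation_fn)
model.fit(X_train, y_train)

# Measure both metrics defined on the experiment
start = time.time()
predictions = model.predict(X_holdout)
inference_time = time.time() - start
sigopt.log_metric("holdout_accuracy", accuracy_score(y_holdout, predictions))
sigopt.log_metric("inference_time", inference_time)
When launched through sigopt start-worker, each execution of the script reports its parameters and metrics back to the shared Experiment.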