CLI Reference
SigOpt is a command-line tool for managing training clusters and running optimization experiments.

Cluster Configuration File

The cluster configuration file is commonly referred to as cluster.yml, but you can name yours anything you like. The file is used when you create a SigOpt cluster with sigopt cluster create -f cluster.yml. After the cluster has been created, you can update the configuration file to change the number of nodes or the instance types in your cluster, then apply the changes by running sigopt cluster update -f cluster.yml. Some updates might not be supported, for example introducing GPU nodes to your cluster in some regions; if an update is not supported, you will need to destroy the cluster and create it again.
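Putting those steps together, a typical lifecycle might look like the following sketch. The create and update commands are taken from the text above; the destroy subcommand name is an assumption based on the destroy-and-recreate step, so check sigopt cluster --help for the exact spelling.

```shell
# Create the cluster described in cluster.yml
sigopt cluster create -f cluster.yml

# Edit cluster.yml (e.g. change max_nodes or instance types), then apply
sigopt cluster update -f cluster.yml

# If an update is not supported, tear the cluster down and recreate it
# (subcommand name assumed here)
sigopt cluster destroy
sigopt cluster create -f cluster.yml
```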
The available fields are:

  • cpu, gpu (at least one required): Define the compute that your cluster will need in terms of instance_type, max_nodes, and min_nodes. You must provide at least one of cpu or gpu. We recommend setting min_nodes to 0 so the autoscaler can remove all of your expensive compute nodes when they aren't in use. It's fine for max_nodes and min_nodes to be the same value, as long as max_nodes is not 0.
  • cluster_name (required): A name for your cluster. You will share this name with anyone else who wants to connect to your cluster.
  • aws (optional): Override environment-provided values for aws_access_key_id or aws_secret_access_key.
  • kubernetes_version (optional): The version of Kubernetes to use for your cluster. Currently supports Kubernetes 1.16, 1.17, 1.18, and 1.19. Defaults to the latest stable version supported by SigOpt, which is currently 1.18.
  • provider (optional): Currently, AWS is the only supported provider for creating clusters. You can, however, use a custom provider to connect to your own Kubernetes cluster with the sigopt cluster connect command. See the page on bringing your own Kubernetes cluster.
  • system (optional): System nodes are required to run the autoscaler. You can specify the number and type of system nodes with min_nodes, max_nodes, and instance_type. The value of min_nodes must be at least 1 so that you always have at least one system node. The defaults for system are:
    + min_nodes: 1
    + max_nodes: 2
    + instance_type: "t3.large"

Example

The example YAML file below defines a CPU cluster named tiny-cluster with up to two t2.small AWS instances.

```yaml
# cluster.yml

# AWS is currently our only supported provider for cluster create
# You can connect to custom clusters via `sigopt cluster connect`
provider: aws

# We have provided a name that is short and descriptive
cluster_name: tiny-cluster

# Your cluster config can have CPU nodes, GPU nodes, or both.
# The configuration of your nodes is defined in the sections below.

# (Optional) Define CPU compute here
cpu:
  # AWS instance type
  instance_type: t2.small
  max_nodes: 2
  min_nodes: 0

# (Optional) Define GPU compute here
# gpu:
#   # AWS GPU-enabled instance type
#   # This can be any p* instance type
#   instance_type: p2.xlarge
#   max_nodes: 2
#   min_nodes: 0

kubernetes_version: '1.19'
```

Configure training orchestration

The SigOpt configuration file tells SigOpt how to set up and run the model, which metrics to track, and which hyperparameters to tune.
You can use a SigOpt Run config YAML file you've already created, or SigOpt will auto-generate run.yml and cluster.yml template files for you if you run the following:
```shell
$ sigopt init
```
The available fields for run.yml are:

  • image (required): Name of the Docker image that SigOpt builds for you. You can also point this at an existing Docker image to use for SigOpt.
  • name (required): A name for your run.
  • aws (optional): AWS access credentials to use with the run; these will be used to access S3 during model execution.
  • resources (optional): Resources to allocate to each run. You can specify limits and requests for cpu, memory, and ephemeral-storage, and you can also specify GPUs.
  • run (optional): The command that executes your model file.
To orchestrate an optimization Experiment, you will also need to specify an experiment.yml file.
The available fields are:

  • budget (required): Number of runs for a SigOpt Experiment.
  • metrics (required): Evaluation and storage metrics for a SigOpt Experiment.
  • name (required): A name for your experiment.
  • parameters (required): Parameters and ranges specified for a SigOpt Experiment.
  • type (required): Type of Experiment to execute:
    + offline — for SigOpt Optimization and All Constraint Experiments
    + random — for Random Search
    + grid — for Grid Search
  • parallel_bandwidth (optional): Number of parallel workers.
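As an illustration of the type field, a hypothetical random-search variant of an experiment file could look like the sketch below. The field shapes follow the table above; the experiment name, parameter, and budget values are placeholders.

```yaml
# experiment.yml (hypothetical random-search sketch)
name: SGD Classifier Random Search
type: random
budget: 30
metrics:
  - name: accuracy
parameters:
  - name: l1_ratio
    type: double
    bounds:
      min: 0
      max: 1.0
```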

Considerations for resources

When specifying CPUs, valid amounts are whole numbers (1, 2), fractional numbers, or millis (1.5 and 1500m both represent 1.5 CPUs). When specifying memory, valid amounts are shown in the Kubernetes documentation for memory resources; some examples are 1e9, 1Gi, and 500M. For GPUs, only whole numbers are valid.
When choosing the resources for a single model training run, keep in mind that some resources on each node are auto-reserved for Kubernetes processes. For this reason, you must specify fewer resources for your model than are available on each node. A good rule of thumb is to assume that your node will have 0.5 CPU less than its total available to run your model.
For example, if your nodes have 8 CPUs, then you must specify fewer than 8 CPUs in the requests section of your resources in order for your model to run. Keep in mind that you can specify fractional amounts of CPU, e.g. 7.5 or 7500m.
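To make the unit formats above concrete, here is a small stand-alone sketch that converts Kubernetes-style quantities into plain numbers. The helper names are hypothetical, not part of the SigOpt CLI, and only the suffixes mentioned in this section plus the common decimal/binary variants are handled.

```python
def parse_cpu(quantity):
    """Parse a Kubernetes-style CPU quantity into a float number of cores.

    "1500m" (millicores) -> 1.5; "2" -> 2.0; "1.5" -> 1.5
    """
    s = str(quantity)
    if s.endswith("m"):
        return int(s[:-1]) / 1000.0
    return float(s)


def parse_memory(quantity):
    """Parse a Kubernetes-style memory quantity into bytes.

    Supports plain numbers and scientific notation ("1e9"), decimal
    suffixes ("500M"), and binary suffixes ("1Gi").
    """
    s = str(quantity)
    suffixes = {
        "K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
        "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    }
    # Check two-character suffixes ("Gi") before one-character ones ("G")
    for suffix in sorted(suffixes, key=len, reverse=True):
        if s.endswith(suffix):
            return int(float(s[: -len(suffix)]) * suffixes[suffix])
    return int(float(s))


print(parse_cpu("1500m"))    # -> 1.5
print(parse_memory("1Gi"))   # -> 1073741824
print(parse_memory("500M"))  # -> 500000000
```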

Example

Here's an example of SigOpt Run and Experiment YAML files:
```yaml
# run.yml
name: My Run
run: python mymodel.py
resources:
  requests:
    cpu: 0.5
    memory: 512Mi
  limits:
    cpu: 2
    memory: 4Gi
  gpus: 1
image: my-run
```

```yaml
# experiment.yml
name: SGD Classifier HPO

metrics:
  - name: accuracy
parameters:
  - name: l1_ratio
    type: double
    bounds:
      min: 0
      max: 1.0
  - name: log_alpha
    type: double
    bounds:
      min: -5
      max: 2

parallel_bandwidth: 2
budget: 60
```

SigOpt Commands

The best way to learn the most up-to-date information about cluster commands is from the command-line interface (CLI) itself. Append --help to any command to learn about its subcommands, arguments, and flags.
For example, to learn more about all SigOpt commands, run:
```shell
$ sigopt --help
```
To learn more about the specific sigopt cluster optimize command, run:
```shell
$ sigopt cluster optimize --help
```
For a cheat sheet of all SigOpt CLI commands, see our API Reference.

Adding AWS Policies

Users creating AWS clusters with SigOpt can easily interface with different AWS services. To grant your cluster permission to access additional AWS services, list the extra AWS policies in the aws.additional_policies section of the cluster configuration file:
```yaml
cluster_name: cluster-with-s3-access
provider: aws
cpu:
  instance_type: t2.small
  min_nodes: 0
  max_nodes: 2
aws:
  additional_policies:
    - arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
```

SigOpt Logging

The SigOpt CLI integrates seamlessly with the SigOpt API to optimize the hyperparameters of your model. It handles communication with the SigOpt API under the hood, so you only need to focus on your model, some lightweight installation requirements, and your experiment configuration file.
As you write your model, use a few lines of code from the sigopt package to read hyperparameters and write your model's metric(s).

Logging Example

Below is a comparison of two nearly identical multilayer perceptron models: the first does not use SigOpt, while the second does. The model with SigOpt uses sigopt.params to read hyperparameter assignments from SigOpt, as well as sigopt.log_metric to send its metric values back to SigOpt.
Without SigOpt:

```python
import numpy
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD


x_train = numpy.random.random((1000, 20))
y_train = keras.utils.to_categorical(
    numpy.random.randint(10, size=(1000, 1)),
    num_classes=10,
)
x_test = numpy.random.random((100, 20))
y_test = keras.utils.to_categorical(
    numpy.random.randint(10, size=(100, 1)),
    num_classes=10,
)

dropout_rate = 0.5
model = Sequential()
model.add(Dense(
    units=64,
    activation='relu',
    input_dim=20,
))
model.add(Dropout(dropout_rate))
model.add(Dense(
    units=64,
    activation='relu',
))
model.add(Dropout(dropout_rate))
model.add(Dense(10, activation='softmax'))

sgd = SGD(
    lr=0.01,
    decay=1e-6,
    momentum=0.9,
    nesterov=True,
)
model.compile(
    loss='categorical_crossentropy',
    optimizer=sgd,
    metrics=['accuracy'],
)

model.fit(
    x=x_train,
    y=y_train,
    epochs=20,
    batch_size=128,
)
evaluation_loss, accuracy = model.evaluate(
    x=x_test,
    y=y_test,
    batch_size=128,
)
print('evaluation_loss:', evaluation_loss)
print('accuracy:', accuracy)
```
With SigOpt:

```python
import numpy
from numpy import log10
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD
import sigopt

x_train = numpy.random.random((1000, 20))
y_train = keras.utils.to_categorical(
    numpy.random.randint(10, size=(1000, 1)),
    num_classes=10,
)
x_test = numpy.random.random((100, 20))
y_test = keras.utils.to_categorical(
    numpy.random.randint(10, size=(100, 1)),
    num_classes=10,
)
sigopt.params.setdefaults(
    dropout_rate=0.5,
    hidden_1=64,
    activation_1="relu",
    hidden_2=64,
    activation_2="relu",
    # log10 so that 10 ** log_lr recovers the baseline lr of 0.01
    log_lr=log10(0.01),
    log_decay=-6,
    momentum=0.9,
    batch_size=128,
)
model = Sequential()
model.add(
    Dense(
        units=sigopt.params.hidden_1,
        activation=sigopt.params.activation_1,
        input_dim=20,
    )
)
model.add(Dropout(sigopt.params.dropout_rate))
model.add(
    Dense(
        units=sigopt.params.hidden_2,
        activation=sigopt.params.activation_2,
    )
)
model.add(Dropout(sigopt.params.dropout_rate))
model.add(Dense(10, activation="softmax"))
sgd = SGD(
    lr=10 ** sigopt.params.log_lr,
    decay=10 ** sigopt.params.log_decay,
    momentum=sigopt.params.momentum,
    nesterov=True,
)
model.compile(
    loss="categorical_crossentropy",
    optimizer=sgd,
    metrics=["accuracy"],
)
model.fit(
    x=x_train,
    y=y_train,
    epochs=20,
    batch_size=sigopt.params.batch_size,
)
evaluation_loss, accuracy = model.evaluate(
    x=x_test,
    y=y_test,
    batch_size=sigopt.params.batch_size,
)
sigopt.log_metric("evaluation_loss", evaluation_loss)
sigopt.log_metric("accuracy", accuracy)
```

SigOpt Compute Resources

If you're training a model that needs a GPU, you will want to use resources to ensure that your model has access to GPUs. Requests and limits are optional, but may be helpful if your model is having trouble getting enough memory or CPU resources.
Requests are resource guarantees and will cause your model to wait until the cluster has available resources before running. Limits prevent your model from using additional resources. These map directly to Kubernetes requests and limits.
Note: If you only set a limit it will also set a request of the same value. See the Kubernetes documentation for details.
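The defaulting rule in the note can be illustrated with a tiny stand-alone sketch. This is a hypothetical helper for illustration only, not SigOpt or Kubernetes code: for each resource that has a limit but no explicit request, the request falls back to the limit.

```python
def effective_resources(requests=None, limits=None):
    """Model the Kubernetes defaulting rule: a limit with no matching
    request implies a request of the same value."""
    requests = dict(requests or {})
    limits = dict(limits or {})
    for resource, limit in limits.items():
        # setdefault leaves explicitly-set requests untouched
        requests.setdefault(resource, limit)
    return {"requests": requests, "limits": limits}


# Only limits are given, so the requests default to them:
print(effective_resources(limits={"cpu": 2, "memory": "2Gi"}))
# -> {'requests': {'cpu': 2, 'memory': '2Gi'}, 'limits': {'cpu': 2, 'memory': '2Gi'}}
```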

Resource Types

  • CPU resources are measured in number of "logical cores" and can be decimal values. This is generally a vCPU in the cloud and a hyperthread on a custom cluster. See Meaning of CPU on the Kubernetes documentation for cloud specific and formatting details.
  • Memory is measured in number of bytes, but can be suffixed with "Mi" or "Gi" for mebibytes and gibibytes respectively. See Meaning of Memory in the Kubernetes documentation for details, and below for a simple example.
  • The gpus field is currently specific to NVIDIA GPUs tagged as "nvidia.com/gpu". Alternatives can be used by adding them to the limits field.
The example below guarantees that half a logical core (0.5), 512 mebibytes of memory, and a GPU are available for your model to run. If the cluster you are running on does not have enough free compute resources, it will wait until they become available before running your model. The example also limits your model so that it does not use more than 2 logical cores and 2 gibibytes of memory.
```yaml
name: My Experiment
run: python model.py
image: example/foobar
resources:
  requests:
    cpu: 0.5
    memory: 512Mi
  limits:
    cpu: 2
    memory: 2Gi
  gpus: 1
```

Docker

Orchestrate uses Docker to build and upload your model environment. If you find that sigopt cluster optimize is taking a long time, then you may want to try some of the following tips to reduce the build and upload time of your model:

Keep your model directory free of extra files

Omit files like logs, saved models, tests, and virtual environments. Changes to these extra files will cause SigOpt to rebuild your model environment.

Omit your training data from your model directory

You can try downloading or streaming your training data in your run commands instead.

Create a .dockerignore file in your model directory

This file should contain a list of the files that you want to omit from your model environment.
```
# python bytecode
**/*.pyc
**/__pycache__/

# virtual environment
venv/

# training data
data/

# tests
tests/

# anything else
.git/
saved_models/
logs/
```
See the official Docker documentation for more information.

Custom Image Registries

Clusters with the provider aws will use AWS ECR as their default container registry, and clusters with the provider custom will use Docker Hub.
To use a custom image registry, provide the registry argument when you connect to your cluster:
```shell
$ sigopt cluster connect \
  --cluster-name tiny-cluster \
  --provider custom \
  --kubeconfig /path/to/kubeconfig \
  --registry myregistrydomain:port
```