`cluster.yml`, but you can name yours anything you like. The file is used when we create a SigOpt cluster with
`sigopt cluster create -f cluster.yml`. You can update your cluster configuration file after the cluster has been created to change the number of nodes or the instance types in your cluster. Apply these changes by running
`sigopt cluster update -f cluster.yml`. Some updates might not be supported, for example introducing GPU nodes to your cluster in some regions. If an update is not supported, you will need to destroy the cluster and create it again.
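Based on the fields discussed in this section (`instance_type`, `min_nodes`, `max_nodes`, and the `cpu`/`gpu` node groups), a cluster configuration file might look roughly like the sketch below. The exact schema and values are illustrative assumptions, not an authoritative template:

```yaml
# Hypothetical cluster.yml sketch; the exact schema may differ by SigOpt version.
provider: aws
cluster_name: my-sigopt-cluster
cpu:
  instance_type: t3.large   # CPU node type (illustrative)
  min_nodes: 0              # lets the autoscaler remove idle compute nodes
  max_nodes: 2
gpu:
  instance_type: p2.xlarge  # GPU node type (illustrative)
  min_nodes: 0
  max_nodes: 1
```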
`gpu`. Define the CPU compute that your cluster will need in terms of:
`min_nodes`. It is recommended that you set
`min_nodes` to 0 so the autoscaler can remove all of your expensive compute nodes when they aren't in use. It's ok if
`min_nodes` and `max_nodes` are the same value, as long as
`max_nodes` is not 0.
`sigopt cluster connect`. See the page on Bringing Your Own K8s Cluster.
`instance_type`. The value of
`min_nodes` must be at least 1 so that you have at least 1 system node. The defaults for
`cluster.yml` template files for you if you run the following:
`offline` — for SigOpt Optimization and All Constraint Experiments
`random` — for Random Search
`grid` — for Grid Search
`resources` in order for your model to run. Keep in mind that you can specify fractional amounts of CPU, e.g. 7.5 or 7500m.
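As a sketch of fractional CPU amounts, a Kubernetes-style `resources` section might look like the following (the key layout is an assumption; only the fractional-CPU notation is taken from the text above):

```yaml
# Hypothetical resources section; keys follow Kubernetes conventions.
resources:
  requests:
    cpu: 7.5        # fractional CPUs are allowed
    memory: 512Mi
  limits:
    cpu: 7500m      # millicpu notation; equivalent to 7.5 CPUs
```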
`--help` to learn about subcommands, arguments, and flags.
`sigopt cluster optimize` command, run:
`aws.additional_policies` section of the cluster configuration file.
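For illustration, attaching an extra IAM policy in the cluster configuration file might look like the sketch below. The nesting and the policy ARN are assumptions (the ARN shown is a standard AWS managed policy used only as an example):

```yaml
# Hypothetical aws.additional_policies entry in cluster.yml.
aws:
  additional_policies:
    - arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
```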
`sigopt` package to read hyperparameters and write your model's metric(s).
Multilayer Perceptron models. The first example does not use SigOpt; the second does. As you can see, the model with SigOpt uses
`sigopt.get_parameter` to read assignments from SigOpt, as well as
`sigopt.log_metric` to send its metric value back to SigOpt.
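A minimal sketch of that instrumentation pattern is shown below. It assumes the `sigopt` package is installed (the import is guarded so the sketch loads without it), and `train` is a hypothetical stand-in for real model training:

```python
# Sketch of a model script instrumented for SigOpt.
try:
    import sigopt  # assumed available inside the cluster environment
except ImportError:
    sigopt = None

def train(learning_rate):
    """Stand-in for real training; returns a mock accuracy peaking at lr=0.01."""
    return 1.0 - abs(learning_rate - 0.01)

def run():
    # Read the hyperparameter assignment from SigOpt, falling back to a default.
    learning_rate = sigopt.get_parameter("learning_rate", default=0.01)
    accuracy = train(learning_rate)
    # Send the metric value back to SigOpt.
    sigopt.log_metric("accuracy", accuracy)
    return accuracy
```

When run under `sigopt cluster optimize`, each execution of `run()` would receive a different `learning_rate` assignment and report the resulting metric.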
`resources` to ensure that your model has access to GPUs. Requests and limits are optional, but may be helpful if your model is having trouble running with enough memory or CPU resources.
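A GPU-enabled `resources` section might be sketched as follows; the `gpus` key and the request/limit layout are assumptions for illustration:

```yaml
# Hypothetical resources section requesting a GPU.
resources:
  gpus: 1           # request one GPU for the model
  requests:
    memory: 2Gi     # optional: raise if the model runs out of memory
  limits:
    memory: 4Gi
```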
`sigopt cluster optimize` is taking a long time, then you may want to try some of the following tips to reduce the build and upload time of your model:
`.dockerignore` file in your model directory
`registry` argument when you connect to your cluster: