# All Constraints Experiment
Early in the experimentation process, users often want to understand the relationship between parameters and metrics. In particular, users may want to study which parameter regions consistently yield high-performing models. By conducting an experiment in which every metric is a Constraint Metric, SigOpt users can efficiently search for many high-performing models, where "high-performing" is defined through a constraint on each metric under analysis. All-Constraint experiments focus on diverse parameter configurations, increasing the chances of finding models that meet business goals.

## View a sample All Constraints Experiment implementation in a notebook

See this notebook for a demonstration of how easy it is to run an All Constraints Experiment with SigOpt. For more notebook instructions and tutorials, check out our GitHub notebook tutorials repo.

# Diversity Accelerates Model Development

Let us go through an example. Suppose we want to classify chess end-games for White King and Rook against Black King. We use the UCI dataset known as Chess, created by Michael Bain and Arthur van Hoff at the Turing Institute, Glasgow, UK. We are interested in performing hyperparameter tuning of XGBoost models. We will use the following parameter space in our experiments:
```python
list_of_parameters = [
    dict(name="num_boost_round", bounds=dict(min=1, max=200), type="int"),
    dict(name="eta", bounds=dict(min=-5, max=0), type="double"),
    dict(name="gamma", bounds=dict(min=0, max=5), type="double"),
    dict(name="max_depth", bounds=dict(min=1, max=32), type="int"),
    dict(name="min_child_weight", bounds=dict(min=1, max=5), type="double"),
]
```
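The `eta` bounds of [-5, 0] suggest the learning rate is being searched in log10 space (a common setup, though this is our assumption rather than something stated above). If so, the suggested value would be exponentiated before being passed to XGBoost:

```python
# Hypothetical assignments dict, shaped like the suggestions SigOpt returns.
assignments = {"num_boost_round": 120, "eta": -2.0, "gamma": 1.3,
               "max_depth": 8, "min_child_weight": 2.5}

xgb_params = dict(assignments)
# eta was defined over [-5, 0]; we treat it as log10(learning rate) -- an
# assumption based on the bounds, not something prescribed by SigOpt.
xgb_params["eta"] = 10 ** assignments["eta"]
print(xgb_params["eta"])  # 0.01
```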

## Defining our metrics

Now, let us say that we want to search for models with a high F1 score and low model complexity. We then define two metrics: `f1_score` and the actual `average_depth` of the model. We are interested in models that achieve higher than 0.8 of `f1_score` and have `average_depth` lower than 10. It is also a good idea to store other metrics to inspect the models further. For example, we can keep track of each model's `inference_time` on the test set.
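`average_depth` is not a metric XGBoost reports directly. One way to compute it (a sketch under our own assumptions, not SigOpt's prescribed method) is to parse the booster's text dump, where leading tabs encode node depth, and average the depths of the leaves:

```python
def average_depth(tree_dumps):
    """Mean leaf depth across all trees.
    `tree_dumps` is the list of strings returned by booster.get_dump()."""
    leaf_depths = []
    for dump in tree_dumps:
        for line in dump.splitlines():
            depth = len(line) - len(line.lstrip("\t"))  # tabs encode depth
            if "leaf=" in line:
                leaf_depths.append(depth)
    return sum(leaf_depths) / len(leaf_depths)

# Two toy trees in XGBoost's text-dump format (made up for illustration).
toy_dump = [
    "0:[f0<0.5] yes=1,no=2\n\t1:leaf=0.1\n\t2:leaf=-0.1\n",
    "0:leaf=0.05\n",
]
print(average_depth(toy_dump))  # (1 + 1 + 0) / 3
```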
If we are confident that `f1_score` and `average_depth` capture everything about our problem, we can run a Multimetric Experiment to search for the Pareto Efficient Frontier points. The minimum-performance thresholds can (optionally) be incorporated as Metric Thresholds.
```python
xgb_multimetric_threshold = [
    {"name": "f1_score", "strategy": "optimize", "objective": "maximize", "threshold": 0.8},
    {"name": "average_depth", "strategy": "optimize", "objective": "minimize", "threshold": 10},
]
```
Our new All-Constraint experiment looks very similar, but replaces the `optimize` strategy with the `constraint` strategy.
```python
xgb_all_constraints = [
    {"name": "f1_score", "strategy": "constraint", "objective": "maximize", "threshold": 0.8},
    {"name": "average_depth", "strategy": "constraint", "objective": "minimize", "threshold": 10},
]
```
SigOpt also allows users to store additional metrics for consideration during analysis of the experiment. These should be defined at experiment creation as well.
```python
xgb_stored_metrics = [
    {"name": "inference_time", "strategy": "store"},
    {"name": "precision", "strategy": "store"},
    {"name": "recall", "strategy": "store"},
]
```

## Running our experiment

With the above lists of parameters and metrics, we can easily create SigOpt experiments:
```python
import sigopt

experiment_meta = dict(
    name="chess xgboost_experiment",
    # active_metrics is either xgb_multimetric_threshold or xgb_all_constraints
    parameters=list_of_parameters,
    metrics=active_metrics + xgb_stored_metrics,
    budget=150,
    parallel_bandwidth=1,
)
experiment = sigopt.create_experiment(**experiment_meta)
print(f"Created experiment: https://app.sigopt.com/experiment/{experiment.id}")
```
After running these experiments, we observed the results below. In blue, we show the metric values resulting from SigOpt suggestions; in orange, we display the final results for each experiment. For the Multimetric Experiment, the best runs are the points on the Pareto Efficient Frontier; for an All-Constraint experiment, all points that meet the user's constraints are returned by the `get_best_runs` method. Notice that the Multimetric experiment finds many Pareto-dominant points, while the All-Constraint experiment finds more configurations that satisfy the user's constraints.

## Dealing with unforeseen requirements

An All-Constraint experiment finds more points that satisfy the user's constraints, at the cost of a less well-defined Pareto frontier. Why is this valuable? Suppose that, after this experiment, we talk to other stakeholders of our project; now they explicitly state that low inference time is critical for our application. Instead of rerunning this experiment (which could take a while), we decide to revisit our current results. Below we display the results after filtering the points by inference time (less than 0.1s).
Since our Multimetric experiment had a limited goal (highest `f1_score` and lowest `average_depth`), all models failed to achieve low inference time. All-Constraint experiments recognize that other goals may exist, and they search for a diverse range of outcomes to service future demands. Specifically, note that:
• The All-Constraint experiment found nine viable models, whereas the Multimetric experiment did not find models with low inference time.
• None of the points from our earlier Pareto Efficient frontier met this prediction time requirement.
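Revisiting stored metrics requires no new experiment: once the best runs are in hand, filtering by `inference_time` is a one-liner. The records below are hypothetical stand-ins for what `get_best_runs` would return (the real client returns run objects, not plain dicts):

```python
# Hypothetical best-run records; metric values live in each record's
# `values` dict. The numbers here are made up for illustration.
best_runs = [
    {"id": "r1", "values": {"f1_score": 0.84, "average_depth": 6.2, "inference_time": 0.08}},
    {"id": "r2", "values": {"f1_score": 0.91, "average_depth": 4.1, "inference_time": 0.23}},
    {"id": "r3", "values": {"f1_score": 0.82, "average_depth": 8.9, "inference_time": 0.05}},
]

# Keep only runs that also satisfy the new inference-time requirement.
fast_runs = [r for r in best_runs if r["values"]["inference_time"] < 0.1]
print([r["id"] for r in fast_runs])  # ['r1', 'r3']
```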

## Analyzing parameters

The value of an All-Constraint experiment is most striking when we use our Parallel Coordinate plot. See the comparison to the Multimetric experiment below when we filter the models by `inference_time`. Notice that only models with low `num_boost_round` remain active.
There are some useful insights to be gained here about the parameters and the resulting metric values.
• High `num_boost_round` yields high F1 score -- this is not surprising, but our Multimetric experiment learns this and then spends its energy exploiting that information to make a better Pareto frontier.
• In contrast, All-Constraint finds models with low `num_boost_round`. That is critical for producing models with good performance and faster inference time.
• All satisfactory models have `eta` (learning rate) values in the range [-1.5, 0].
• Most viable models have gamma values less than 3.
• All-Constraint finds more models with lower `max_depth` than Multimetric, especially between values 5 and 15.
• The entire range of `min_child_weight` values seems to produce acceptable results -- the metrics seem unaffected by this parameter alone. However, for satisfactory models, it looks like `max_depth` and `min_child_weight` are inversely correlated.

# Creating an All-Constraint Experiment

Below we create a new SigOpt All-Constraint experiment using the above XGBoost hyperparameter tuning example. The goal of such an experiment is to effectively explore high-performing regions of the parameter space. Recall that the main distinction is a list of Constraint Metrics with no optimized metrics: the SigOpt engine will automatically focus on diverse parameter configurations rather than on the optimal achievable value of each metric. As discussed earlier, for an exploration strategy that focuses on the Pareto Efficient Frontier of two metrics, we recommend running a Multimetric Experiment instead.
```python
import sigopt

experiment = sigopt.create_experiment(
    name="All-constraint experiment",
    project="sigopt-examples",
    parameters=[
        dict(
            name="num_boost_round",
            bounds=dict(min=1, max=200),
            type="int",
        ),
        dict(
            name="eta",
            bounds=dict(min=-5, max=0),
            type="double",
        ),
        dict(
            name="gamma",
            bounds=dict(min=0, max=5),
            type="double",
        ),
    ],
    metrics=[
        dict(
            name="f1_score",
            objective="maximize",
            strategy="constraint",
            threshold=0.8,
        ),
        dict(
            name="average_depth",
            objective="minimize",
            strategy="constraint",
            threshold=10,
        ),
    ],
    budget=65,
    parallel_bandwidth=2,
)
print("Created experiment: /experiment/" + experiment.id)
```
The same experiment expressed in YAML:

```yaml
name: All-constraint experiment
project: sigopt-examples
parameters:
  - name: num_boost_round
    bounds:
      min: 1
      max: 200
    type: int
  - name: eta
    bounds:
      min: -5
      max: 0
    type: double
  - name: gamma
    bounds:
      min: 0
      max: 5
    type: double
metrics:
  - name: f1_score
    objective: maximize
    strategy: constraint
    threshold: 0.8
  - name: average_depth
    objective: minimize
    strategy: constraint
    threshold: 10
budget: 65
parallel_bandwidth: 2
```
As your Experiment executes, report the metric values to the corresponding Run:
```python
# Report the metric values for a SigOpt Run
run.log_metric("f1_score", 0.803)
run.log_metric("average_depth", 2.78)
```

# Selecting and Updating the Metric Thresholds

In many applications, it is straightforward to specify minimum performance criteria for each metric. For example, inference time and model size are limited by the production setting's desired response time and memory constraints. A simple lower bound on classification accuracy is the fraction of examples in the majority class; for regression problems, a constant predictor that always reports the average training value gives the minimum level of performance expected from an intelligent system. To conduct an effective exploration, we recommend setting conservative threshold values. SigOpt understands that configurations that do not meet the constraints are undesirable; setting an overly aggressive threshold at the beginning of your experimentation can therefore prematurely discourage SigOpt from sampling promising regions of the parameter space. As the experiment progresses, the metric thresholds can be updated on the experiment's properties page in our web application. For more information, see how to update your metric constraints.
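The majority-class baseline mentioned above is cheap to compute before the experiment starts; a minimal sketch (with made-up labels):

```python
from collections import Counter

def majority_class_baseline(labels):
    """Accuracy of a constant classifier that always predicts the most
    frequent label -- a conservative starting threshold for accuracy."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

# Hypothetical training labels: 70 of one class, 30 of another.
labels = ["draw"] * 70 + ["win"] * 30
print(majority_class_baseline(labels))  # 0.7
```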

# Limitations

• `budget` must be set when an All-Constraint experiment is created.
• The maximum number of constraint metrics is 4.
• The maximum number of dimensions for All-Constraint is 50.
• Experiments with Parameter Conditions are not permitted.
• Multitask experiments are not permitted.