How to Configure Early Stopping

Early Stopping overview for Katib Experiments

This guide shows how to use early stopping to reduce the cost of your Katib Experiments. Early stopping helps you avoid overfitting when you train your model during Katib Experiments. It also saves computing resources and reduces Experiment execution time by stopping an Experiment’s Trials before training completes once the target metric(s) stop improving.

The major advantage of using early stopping in Katib is that you don’t need to modify your training container package. You only need to make the necessary changes to your Experiment’s YAML file.

Early stopping works in the same way as Katib’s metrics collector. It reads the required metrics from StdOut or from an arbitrary output file, and an early stopping algorithm then decides whether the Trial needs to be stopped. Currently, early stopping works only with the StdOut or File metrics collectors.

Note: Your training container must print training logs with the timestamp, because early stopping algorithms need to know the sequence of reported metrics. Check the PyTorch example to learn how to add a date format to your logs.
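As an illustration of the note above, here is a minimal Python sketch that prefixes every log line with a UTC timestamp and writes to StdOut. The helper names `configure_logging` and `metric_line`, the timestamp format, and the `name=value` metric layout are assumptions made for this sketch; check the PyTorch example for the format your metrics collector actually expects.

```python
import logging
import sys
import time


def configure_logging():
    """Send timestamped log lines to stdout (hypothetical helper)."""
    formatter = logging.Formatter(
        fmt="%(asctime)s %(levelname)-8s %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%SZ",  # assumed timestamp layout
    )
    # Use UTC so that the trailing "Z" in the timestamp is accurate.
    formatter.converter = time.gmtime

    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(formatter)

    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(logging.INFO)
    return handler


def metric_line(name, value):
    """Format one metric as name=value (assumed layout)."""
    return f"{name}={value:.4f}"


if __name__ == "__main__":
    configure_logging()
    # Each reported metric now carries a timestamp, so the order of
    # the reports can be reconstructed from the log.
    for epoch, accuracy in enumerate([0.71, 0.78, 0.82], start=1):
        logging.info("epoch=%d %s", epoch, metric_line("accuracy", accuracy))
```

Writing to StdOut explicitly matters here because Python's `logging.basicConfig` defaults to the standard error stream.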

Configure the Experiment with early stopping

As a reference, you can use the YAML file of the early stopping example.

Follow the guide to configure your Katib Experiment.
Next, to apply early stopping for your Experiment, specify the .spec.earlyStopping parameter, similar to the .spec.algorithm.
- .earlyStopping.algorithmName - the name of the early stopping algorithm.
- .earlyStopping.algorithmSettings - the settings for the early stopping algorithm.
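To make the shape of these two fields concrete, the sketch below builds the relevant part of an Experiment spec as a Python dictionary and prints it as JSON. The `name`/`value` layout of each setting and the use of string values are assumptions based on how .spec.algorithm settings are usually written, and `random` is only a placeholder search algorithm; treat the early stopping example YAML as the authoritative reference.

```python
import json

# Early stopping section: algorithm name plus its settings.
# Setting names come from the medianstop settings table; the
# name/value list layout and string values are assumptions.
early_stopping = {
    "algorithmName": "medianstop",
    "algorithmSettings": [
        {"name": "min_trials_required", "value": "3"},
        {"name": "start_step", "value": "4"},
    ],
}

# Only the fields relevant to this guide are shown; a real
# Experiment also needs objective, parameters, trialTemplate, etc.
experiment_fragment = {
    "spec": {
        "algorithm": {"algorithmName": "random"},  # placeholder
        "earlyStopping": early_stopping,
    }
}

if __name__ == "__main__":
    print(json.dumps(experiment_fragment, indent=2))
```

The same keys and nesting carry over directly when you write the section in your Experiment’s YAML file.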

The workflow is as follows. Your Experiment’s Suggestion produces new Trials. The early stopping algorithm then generates early stopping rules for the created Trials. Once a Trial satisfies all of its rules, Katib stops it and changes the Trial status to EarlyStopped. Katib then calls the Suggestion again to ask for new Trials.

Early Stopping Algorithms

Katib currently supports the following early stopping algorithm:

Median Stopping Rule

More algorithms are under development.

Median Stopping Rule

The early stopping algorithm name in Katib is medianstop.

The median stopping rule stops a pending Trial X at step S if the Trial’s best objective value by step S is worse than the median of the running averages of all completed Trials’ objectives reported up to step S.

To learn more about it, check Google Vizier: A Service for Black-Box Optimization.

Katib supports the following early stopping settings:

Setting Name	Description	Default Value
min_trials_required	Minimum number of successful Trials required to compute the median value	3
start_step	Number of intermediate results a Trial must report before it can be stopped	4
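The rule and the two settings above can be sketched in a few lines of Python. This is an illustrative reading of the description on this page, not Katib’s implementation: the function names, the `maximize` flag, and the interpretation of start_step as "do not evaluate the rule before this many results have been reported" are all assumptions of the sketch.

```python
from statistics import median


def running_average(values, step):
    """Mean of the objective values reported up to `step`."""
    window = values[:step]
    return sum(window) / len(window)


def should_stop(trial_values, completed_trials, step,
                maximize=True, min_trials_required=3, start_step=4):
    """Return True if the pending Trial should be stopped at `step`.

    trial_values: objective values reported so far by the pending Trial.
    completed_trials: one non-empty list of values per completed Trial.
    """
    # Too early, or not enough completed Trials to compute a median.
    if step < start_step or len(completed_trials) < min_trials_required:
        return False

    window = trial_values[:step]
    best = max(window) if maximize else min(window)

    # Median of the running averages of all completed Trials.
    med = median(running_average(v, step) for v in completed_trials)

    # "Worse" depends on the direction of the objective.
    return best < med if maximize else best > med


if __name__ == "__main__":
    completed = [
        [0.5, 0.6, 0.7, 0.8],
        [0.4, 0.5, 0.6, 0.7],
        [0.6, 0.7, 0.8, 0.9],
    ]
    print(should_stop([0.1, 0.2, 0.3, 0.4], completed, step=4))
    print(should_stop([0.5, 0.6, 0.7, 0.9], completed, step=4))
```

Comparing the Trial’s best value against an average of the completed Trials makes the rule lenient: a Trial is stopped only when even its best result so far falls behind the typical average performance of earlier Trials.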

Next steps

How to use Katib Experiment Trial templates (/docs/components/katib/user-guides/trial-template).
How to restart your Experiment and use the resume policies.

Last modified May 8, 2024: Katib: Reorganized Katib Docs (#3723) (9903837)