Using Early Stopping

How to use early stopping in Katib experiments

This guide shows how you can use early stopping to improve your Katib experiments. Early stopping helps you avoid overfitting when you train your model during Katib experiments. It also saves computing resources and reduces experiment execution time: when the target metric(s) stop improving, Katib stops the experiment’s trials before the training process is complete.

The major advantage of using early stopping in Katib is that you don’t need to modify your training container package. All you have to do is make the necessary changes in your experiment’s YAML file.

Early stopping works in the same way as Katib’s metrics collector. It analyses the required metrics from stdout or from an arbitrary output file, and an early stopping algorithm decides whether the trial needs to be stopped. Currently, early stopping works only with the StdOut or File metrics collectors.
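To illustrate the kind of parsing involved, here is a simplified Python sketch that reads `name=value` pairs from log lines and orders them by a leading timestamp. The regular expression, the UTC timestamp layout, and the choice to skip lines without a timestamp are assumptions made for this example; they are not Katib's exact metrics filter or behavior.

```python
import re
from datetime import datetime, timezone
from typing import Collection, Iterable, List, Tuple

# Assumed line layout: "<UTC timestamp> ... name=value ...", e.g.
# "2020-11-05T03:04:10Z epoch 1 accuracy=0.71".
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"
# Simplified name=value filter (an assumption, not Katib's exact regex).
METRIC_PATTERN = re.compile(r"([\w-]+)\s*=\s*([+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)")

Observation = Tuple[datetime, str, float]


def parse_metrics(lines: Iterable[str], metric_names: Collection[str]) -> List[Observation]:
    """Collect (timestamp, name, value) for the requested metrics, oldest first."""
    observations: List[Observation] = []
    for line in lines:
        stamp, _, rest = line.strip().partition(" ")
        try:
            ts = datetime.strptime(stamp, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # no leading timestamp: the report cannot be ordered, so skip it
        for name, value in METRIC_PATTERN.findall(rest):
            if name in metric_names:
                observations.append((ts, name, float(value)))
    # Early stopping needs reports in time order, not file order.
    observations.sort(key=lambda obs: obs[0])
    return observations
```

Only metrics named in `metric_names` are kept, so other `name=value` pairs in the same line (for example a `loss` value when only `accuracy` is requested) are ignored.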

Note: Your training container must print training logs with timestamps, because early stopping algorithms need to know the order of the reported metrics. Check the MXNet example to learn how to add a date format to your logs.
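As a minimal sketch of timestamped metric logging with Python's standard `logging` module, the formatter below prefixes every message with a UTC timestamp. The layout `%Y-%m-%dT%H:%M:%SZ`, the logger name, and the helper names are assumptions for illustration; check the MXNet example for the exact format used in Katib's examples.

```python
import logging
import sys
import time
from typing import TextIO

# Assumed timestamp layout: UTC, RFC 3339 style, e.g. 2020-11-05T03:03:43Z.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def metric_formatter() -> logging.Formatter:
    """Formatter that puts a UTC timestamp in front of every message."""
    formatter = logging.Formatter(fmt="%(asctime)s %(message)s", datefmt=TIMESTAMP_FORMAT)
    formatter.converter = time.gmtime  # the trailing "Z" promises UTC, so format in UTC
    return formatter


def make_metric_logger(stream: TextIO = sys.stdout) -> logging.Logger:
    """Logger that writes timestamped metric lines to `stream` (stdout by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(metric_formatter())
    logger = logging.getLogger("training-metrics")  # hypothetical logger name
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = make_metric_logger()
    for accuracy in (0.71, 0.83, 0.88):  # placeholder values, not real results
        log.info("accuracy=%.4f", accuracy)
```

Each call produces a line such as `<timestamp> accuracy=0.7100`, so the metric value and its position in time travel together in stdout.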

Configure the experiment with early stopping

As a reference, you can use the YAML file of the early stopping example.

  1. Follow the guide to configure your Katib experiment.

  2. Next, to apply early stopping to your experiment, specify the .spec.earlyStopping parameter, similar to .spec.algorithm. Refer to the EarlyStoppingSpec type for more information.

    • .earlyStopping.algorithmName - the name of the early stopping algorithm.

    • .earlyStopping.algorithmSettings - the settings for the early stopping algorithm.
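In the experiment YAML, `algorithmName` is a string and `algorithmSettings` is a list of `name`/`value` pairs whose values are written as strings. A minimal Python sketch that builds the same structure as a dictionary; `early_stopping_block` is a hypothetical helper, and the settings passed to it are illustrative values only.

```python
import json
from typing import Any, Dict


def early_stopping_block(algorithm_name: str, **settings: Any) -> Dict[str, Any]:
    """Build the .spec.earlyStopping section as a plain dictionary."""
    return {
        "earlyStopping": {
            "algorithmName": algorithm_name,
            # Settings are name/value pairs; values are written as strings.
            "algorithmSettings": [
                {"name": name, "value": str(value)} for name, value in settings.items()
            ],
        }
    }


if __name__ == "__main__":
    # Illustrative values only; pick settings that suit your experiment.
    block = early_stopping_block("medianstop", min_trials_required=2, start_step=2)
    print(json.dumps(block, indent=2))
```

The printed JSON has the same shape as the YAML you would place under `.spec`: an `earlyStopping` key holding `algorithmName: medianstop` and two `algorithmSettings` entries.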

Here is what happens: your experiment’s suggestion produces new trials. After that, the early stopping algorithm generates early stopping rules for the created trials. Once a trial reaches all of its rules, it is stopped and its status changes to EarlyStopped. Then, Katib calls the suggestion again to ask for new trials.
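The "stop only when all rules are reached" logic can be modeled in a few lines. This is a simplified Python sketch, assuming each rule holds a metric name, a threshold, a comparison (`less`, `greater`, or `equal`), and a start step, and that a rule is checked against the latest reported value; the class, field, and function names are hypothetical and do not reproduce Katib's internal types.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Mapping, Sequence

# Hypothetical mapping from a rule's comparison name to a Python operator.
COMPARISONS: Dict[str, Callable[[float, float], bool]] = {
    "less": operator.lt,
    "greater": operator.gt,
    "equal": operator.eq,
}


@dataclass(frozen=True)
class StoppingRule:
    metric: str       # metric the rule watches, e.g. the objective
    threshold: float  # value the latest report is compared against
    comparison: str   # "less", "greater" or "equal"
    start_step: int   # rule is ignored until this many values are reported


def rule_reached(rule: StoppingRule, history: Mapping[str, Sequence[float]]) -> bool:
    """True once enough values are reported and the latest one meets the rule."""
    values = history.get(rule.metric, [])
    if len(values) < rule.start_step:
        return False
    return COMPARISONS[rule.comparison](values[-1], rule.threshold)


def should_early_stop(rules: List[StoppingRule], history: Mapping[str, Sequence[float]]) -> bool:
    # A trial is stopped only once *all* of its rules are reached.
    return bool(rules) and all(rule_reached(rule, history) for rule in rules)
```

For example, a rule `StoppingRule("accuracy", 0.6, "less", 3)` is not reached after two reports, and is reached once the third reported accuracy is still below 0.6.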

Learn more about Katib concepts in the overview guide.

Follow the Katib configuration guide to specify your own image for the early stopping algorithm.

Early stopping algorithms in detail

Here’s a list of the early stopping algorithms available in Katib:

  • Median Stopping Rule

More algorithms are under development.

You can add an early stopping algorithm to Katib yourself. Check the developer guide to contribute.

Median Stopping Rule

The early stopping algorithm name in Katib is medianstop.

The median stopping rule stops a pending trial X at step S if the trial’s best objective value by step S is worse than the median value of the running averages of all completed trials' objectives reported up to step S.

To learn more about it, check Google Vizier: A Service for Black-Box Optimization.
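A minimal Python sketch of the rule as stated above, assuming a maximized objective by default and one reported objective value per step (steps counted from 1). The function names are hypothetical; this illustrates the rule itself and is not Katib's medianstop implementation.

```python
from statistics import median
from typing import Sequence


def running_average(values: Sequence[float], step: int) -> float:
    """Mean of the first `step` reported values."""
    window = values[:step]
    return sum(window) / len(window)


def median_stop(
    trial_values: Sequence[float],
    completed_trials: Sequence[Sequence[float]],
    step: int,
    maximize: bool = True,
) -> bool:
    """Return True if the trial should be stopped at `step` (1-based)."""
    if step < 1 or len(trial_values) < step:
        return False  # the trial has not reported `step` values yet
    # Running averages of the completed trials that reached this step.
    averages = [running_average(v, step) for v in completed_trials if len(v) >= step]
    if not averages:
        return False  # nothing to compare against
    threshold = median(averages)
    best_so_far = max(trial_values[:step]) if maximize else min(trial_values[:step])
    # "Worse than the median" means below it when maximizing, above it when minimizing.
    return best_so_far < threshold if maximize else best_so_far > threshold
```

With completed trials reporting `[0.5, 0.6, 0.7]`, `[0.6, 0.7, 0.8]`, and `[0.4, 0.5, 0.6]`, the running averages at step 2 are 0.55, 0.65, and 0.45, so the median is 0.55. A trial whose best value by step 2 is 0.4 would be stopped, while one that reached 0.6 would keep running.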

Katib supports the following early stopping settings:

Setting Name         Description                                                                  Default Value
min_trials_required  Minimum number of successful trials needed to compute the median value       3
start_step           Number of intermediate results a trial must report before it can be stopped  4
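Because `algorithmSettings` values are strings and both settings have defaults, a consumer has to merge what you pass over the defaults in the table. A minimal Python sketch of that merge plus the two gates the settings describe; `resolve_settings` and `rule_can_apply` are hypothetical helpers, and the validation they perform (rejecting unknown names and non-positive values) is an assumption of this sketch, not documented Katib behavior.

```python
from typing import Dict, Iterable, Mapping

# Defaults taken from the medianstop settings table.
MEDIANSTOP_DEFAULTS: Dict[str, int] = {"min_trials_required": 3, "start_step": 4}


def resolve_settings(algorithm_settings: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Merge name/value settings (string values, as in the YAML) over the defaults."""
    resolved = dict(MEDIANSTOP_DEFAULTS)
    for setting in algorithm_settings:
        name = setting["name"]
        if name not in resolved:
            raise ValueError(f"unknown medianstop setting: {name}")
        value = int(setting["value"])
        if value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value}")
        resolved[name] = value
    return resolved


def rule_can_apply(succeeded_trials: int, reported_steps: int, settings: Mapping[str, int]) -> bool:
    """The median comparison only applies once both thresholds are met."""
    return (
        succeeded_trials >= settings["min_trials_required"]
        and reported_steps >= settings["start_step"]
    )
```

With the defaults, a trial cannot be stopped until at least 3 trials have succeeded and the trial itself has reported 4 intermediate results.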

Submit an early stopping experiment from the UI

You can use Katib UI to submit an early stopping experiment. Follow these steps to create an experiment from the UI.

Once you reach the early stopping section, select the appropriate values:

UI form to deploy an early stopping Katib experiment

View the early stopping experiment results

First, make sure you have jq installed.

Check the early stopped trials in your experiment:

kubectl get experiment <experiment-name> -n <experiment-namespace> -o json | jq -r ".status"

The last part of the above command output looks similar to this:

  . . .
  "earlyStoppedTrialList": [
    "median-stop-2ml8h96d",
    "median-stop-cgjkq8zn",
    "median-stop-pvn5p54p",
    "median-stop-sjc9tcgc"
  ],
  "startTime": "2020-11-05T03:03:43Z",
  "succeededTrialList": [
    "median-stop-2kmh57qf",
    "median-stop-7ccstz4z",
    "median-stop-7sqt7556",
    "median-stop-lgvhfch2",
    "median-stop-mkfjtwbj",
    "median-stop-nfmgqd7w",
    "median-stop-nsbxw5m9",
    "median-stop-nsmhg4p2",
    "median-stop-rp88xflk",
    "median-stop-xl7dlf5n",
    "median-stop-ztc58kwq"
  ],
  "trials": 15,
  "trialsEarlyStopped": 4,
  "trialsSucceeded": 11
}
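If you prefer to post-process the experiment status in a script instead of with jq, here is a minimal Python sketch. It relies only on the status field names shown in the output above and on the same `kubectl get experiment ... -o json` command; the function names are hypothetical.

```python
import json
import subprocess
from typing import Any, Dict


def fetch_experiment_json(name: str, namespace: str) -> str:
    """Run the kubectl command from this guide and return its JSON output."""
    result = subprocess.run(
        ["kubectl", "get", "experiment", name, "-n", namespace, "-o", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def early_stopping_summary(experiment_json: str) -> Dict[str, Any]:
    """Pick the early stopping fields out of the experiment's .status."""
    status = json.loads(experiment_json).get("status", {})
    return {
        "early_stopped": status.get("earlyStoppedTrialList", []),
        "succeeded": status.get("succeededTrialList", []),
        "trials": status.get("trials", 0),
        "trials_early_stopped": status.get("trialsEarlyStopped", 0),
    }
```

For example, `early_stopping_summary(fetch_experiment_json("<experiment-name>", "<experiment-namespace>"))["early_stopped"]` returns the same trial names as the `earlyStoppedTrialList` field.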

Check the status of the early stopped trial by running this command:

kubectl get trial median-stop-2ml8h96d -n <experiment-namespace>

You should see the EarlyStopped status for the trial:

NAME                   TYPE           STATUS   AGE
median-stop-2ml8h96d   EarlyStopped   True     15m
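To check the same status programmatically, you can read the trial with `kubectl get trial <trial-name> -n <experiment-namespace> -o json` and inspect its conditions. A minimal Python sketch, assuming the trial's `.status.conditions` uses the usual Kubernetes condition shape (a `type` plus a `"True"`/`"False"` `status` string), as the TYPE and STATUS columns above suggest; the function names are hypothetical.

```python
import json
from typing import Any, Mapping


def has_condition(trial: Mapping[str, Any], condition_type: str) -> bool:
    """True if the trial carries `condition_type` with status "True"."""
    for condition in trial.get("status", {}).get("conditions", []):
        if condition.get("type") == condition_type:
            return condition.get("status") == "True"
    return False


def is_early_stopped(trial_json: str) -> bool:
    """Check the JSON output of `kubectl get trial ... -o json`."""
    return has_condition(json.loads(trial_json), "EarlyStopped")
```

A trial with no `EarlyStopped` condition, or with that condition set to `"False"`, is reported as not early stopped.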

In addition, you can check your results on the Katib UI. The trial statuses on the experiment monitor page should look as follows:

UI form to view trials

You can click the name of an early stopped trial to view the metrics it reported before it was stopped:

UI form to view trial info

Next steps