197. Cyclical Learning Rate


The learning rate is one of the most important hyper-parameters for training deep neural networks. Unlike earlier approaches such as monotonically decreasing the learning rate, the Cyclical Learning Rate (CLR) method practically eliminates the need to experimentally search for the best value: the learning rate is varied cyclically within a reasonable band, and all you have to do is set the minimum and maximum values of that band.
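To make the schedule concrete, here is a minimal sketch of the paper's triangular policy in plain Python; the boundary values (base_lr, max_lr) and step_size (half the cycle length, in iterations) are illustrative settings, not prescribed ones.

```python
import numpy as np

def triangular_clr(iteration, base_lr=0.001, max_lr=0.006, step_size=2000):
    """Triangular cyclical learning rate.

    The rate rises linearly from base_lr to max_lr over step_size
    iterations, then falls back to base_lr, and the cycle repeats.
    """
    cycle = np.floor(1 + iteration / (2 * step_size))
    x = np.abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

# A few points along the first full cycle (base -> peak -> back to base):
for it in (0, 1000, 2000, 3000, 4000):
    print(it, triangular_clr(it))
```

In practice you rarely need to hand-roll this; PyTorch, for example, ships the same policy as torch.optim.lr_scheduler.CyclicLR.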

Finding the appropriate boundaries

How do we estimate reasonable boundary values? The CLR paper proposes an “LR range test”: train your model for several epochs while letting the learning rate increase linearly between a low and a high value, recording the accuracy as you go. (This test is enormously valuable whenever you face a new architecture or dataset.)
Set the minimum bound to the learning rate at which the model starts converging, and the maximum bound to the point where accuracy peaks and begins to drop.
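The following is a rough sketch of the range test itself; train_step and eval_accuracy are hypothetical hooks standing in for one mini-batch update at a given learning rate and an accuracy measurement in whatever framework you use.

```python
import numpy as np

def lr_range_test(train_step, eval_accuracy,
                  min_lr=1e-5, max_lr=1.0, num_iters=1000):
    """Sweep the learning rate linearly and record accuracy at each step."""
    history = []
    for lr in np.linspace(min_lr, max_lr, num_iters):
        train_step(lr)                        # one update with this rate
        history.append((lr, eval_accuracy()))
    return history

# Plot or inspect `history`: the minimum bound is where accuracy first
# starts climbing; the maximum bound is just before accuracy peaks and falls.
```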