199. Simple SGD vs Cyclic Learning Rate

I compared training speed between two setups by training a U-Net model: plain SGD with a fixed learning rate, and the same SGD optimizer driven by a cyclic learning-rate schedule.

  • Simple SGD
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    
  • Cyclic Learning Rate
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=1e-4, max_lr=0.1)
    

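For reference, here is a minimal, self-contained sketch of how the cyclic schedule is driven during training. It uses a toy linear model and synthetic data in place of the original U-Net and its dataset (neither is shown above), and the step_size_up value is an arbitrary choice for illustration. The key detail is that CyclicLR is stepped once per batch, not once per epoch.

    import torch
    import torch.nn as nn

    # Toy stand-ins for the original U-Net and its data (not shown in the post).
    model = nn.Linear(10, 1)
    criterion = nn.MSELoss()

    # CyclicLR cycles momentum by default, so give SGD a nonzero momentum.
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.CyclicLR(
        optimizer, base_lr=1e-4, max_lr=0.1, step_size_up=200
    )

    for step in range(1000):
        inputs, targets = torch.randn(32, 10), torch.randn(32, 1)
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        scheduler.step()  # advance the cyclic schedule after every batch
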
The cyclic learning rate achieved a lower loss than fixed-rate SGD and reached that level faster.