Weekly Seminar
Gradient methods with Energy for large-scale optimization problems
Hailiang Liu (Iowa State University, U.S.A.)
Tue, 05 Jul 2022 • 10:30-11:30h • Templergraben 55, lecture hall II (2nd floor)

Abstract

We propose AEGD, a new algorithm for gradient-based optimization of stochastic objective functions, based on adaptive updates of a quadratic energy variable. The method is shown to be unconditionally energy stable, irrespective of the step size. In addition, AEGD enjoys tight convergence rates while still allowing a large step size. The method is straightforward to implement and requires little tuning of hyper-parameters. Experimental results demonstrate that AEGD works well for a variety of optimization problems: it is robust with respect to the initial data, capable of making rapid initial progress, and shows generalization performance that is comparable to, and in most cases better than, that of SGD with momentum on deep neural networks.
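To make the energy-adaptive idea concrete, below is a minimal Python/NumPy sketch of an update of this type: the objective is transformed via a square-root energy F(theta) = sqrt(f(theta) + c), and a per-coordinate energy variable r is updated so that it can only decrease, for any step size. The function names, default values, and the toy quadratic problem are illustrative choices, not the speaker's reference implementation.

```python
import numpy as np

def aegd(f, grad_f, theta0, eta=0.1, c=1.0, steps=200):
    """Sketch of an energy-adaptive gradient step in the spirit of AEGD.

    Uses the transformed objective F(theta) = sqrt(f(theta) + c), with c
    chosen so that f + c > 0, and a per-coordinate energy r that is
    non-increasing for ANY step size eta -- the unconditional energy
    stability referred to in the abstract.
    """
    theta = np.asarray(theta0, dtype=float)
    r = np.full_like(theta, np.sqrt(f(theta) + c))  # initial energy
    for _ in range(steps):
        v = grad_f(theta) / (2.0 * np.sqrt(f(theta) + c))  # gradient of F
        r = r / (1.0 + 2.0 * eta * v ** 2)  # energy update: r can only shrink
        theta = theta - 2.0 * eta * r * v   # parameter step scaled by energy
    return theta

# Illustrative use: minimize f(x) = ||x - 3||^2, whose minimizer is (3, 3).
f = lambda x: float(np.sum((x - 3.0) ** 2))
grad_f = lambda x: 2.0 * (x - 3.0)
print(aegd(f, grad_f, np.zeros(2)))  # approaches [3. 3.]
```

Note the design point this sketch is meant to expose: the energy r is divided by a factor of at least 1 at every step, so it decreases monotonically no matter how large eta is, while the effective step length 2*eta*r*v shrinks automatically as the iterates settle.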