Promoting Exploration in Memory-Augmented Adam using Critical Momenta

P. Malviya, G. Mordido, A. Baratin, R. Babanezhad Harikandeh, J. Huang, S. Lacoste-Julien, R. Pascanu, S. Chandar

January 2023

Abstract

Adaptive gradient-based optimizers, particularly Adam, have left their mark on the training of large-scale deep learning models. Their strength is that they converge quickly while being more robust to the choice of hyperparameters. However, they often generalize worse than non-adaptive methods.

Type

Publication

arXiv 2023
