Learning to learn second-order back-propagation for CNNs using LSTMs

January 2018

Abstract

Convolutional neural networks (CNNs) typically suffer from slow convergence rates in training, which limits their wider application. This paper presents a new CNN learning approach, based on second-order methods, aimed at improving (a) the convergence rates of existing gradient-based methods and (b) robustness to the choice of learning hyper-parameters (e.g., learning rate). We derive an efficient back-propagation algorithm for simultaneously computing both gradients and second derivatives of the CNN’s learning objective. These are then input to a Long Short-Term Memory (LSTM) network to predict optimal updates of the CNN parameters in each learning iteration. Both meta-learning of the LSTM and learning of the CNN are conducted jointly. Evaluation on image classification demonstrates that our second-order back-propagation has faster convergence rates than standard gradient-based learning for the same CNN, and that it converges to better optima, leading to better performance under a budgeted time for learning. We also show that an LSTM that has learned to train a small CNN can be readily used to train a larger network.
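
To give a rough intuition for why second derivatives can help convergence, the following is a minimal sketch on a toy two-parameter quadratic objective. It is not the algorithm from the paper: the learned LSTM is replaced here by a fixed, hand-written rule that divides each gradient by its second derivative, and all names (`grad_and_curvature`, `curvature_scaled_step`, the toy coefficients) are hypothetical choices made for illustration.

```python
# Toy objective: f(w) = 0.5 * sum_i a[i] * (w[i] - t[i])**2
# The coefficients a[i] are the per-coordinate second derivatives.
# Assumption: this quadratic stands in for a CNN loss purely for illustration.

def grad_and_curvature(w, a, t):
    """Return per-coordinate gradients and second derivatives of f at w."""
    grads = [ai * (wi - ti) for wi, ai, ti in zip(w, a, t)]
    curvs = list(a)  # d^2 f / d w_i^2 = a[i] for this quadratic
    return grads, curvs

def loss(w, a, t):
    return 0.5 * sum(ai * (wi - ti) ** 2 for wi, ai, ti in zip(w, a, t))

def gradient_step(w, grads, curvs, lr=0.01):
    """Plain gradient descent; ignores curvature, needs a tuned learning rate."""
    return [wi - lr * g for wi, g in zip(w, grads)]

def curvature_scaled_step(w, grads, curvs, eps=1e-8):
    """Hand-written stand-in for a learned update rule (NOT an LSTM):
    scale each gradient by the inverse of its second derivative."""
    return [wi - g / (abs(h) + eps) for wi, g, h in zip(w, grads, curvs)]

def run(step_fn, steps=20):
    a = [100.0, 1.0]   # badly conditioned: curvatures differ by 100x
    t = [1.0, -2.0]    # minimiser of f
    w = [0.0, 0.0]     # initial parameters
    for _ in range(steps):
        grads, curvs = grad_and_curvature(w, a, t)
        w = step_fn(w, grads, curvs)
    return loss(w, a, t)

loss_gd = run(gradient_step)
loss_second_order = run(curvature_scaled_step)
print("gradient descent loss:", loss_gd)
print("curvature-scaled loss:", loss_second_order)
```

In this sketch the curvature-scaled update needs no learning rate, whereas plain gradient descent must use a step small enough for the steepest coordinate and therefore moves slowly along the flat one. The approach described in the abstract instead learns the mapping from gradients and second derivatives to updates with an LSTM.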

Type

Conference paper

Publication

In IEEE International Conference on Pattern Recognition

Learning to learn, Second-order methods

Anirban Roy

Senior Computer Scientist

Anirban Roy is a Senior Computer Scientist at SRI International. His current interests include generative models, assured machine learning, AI for creativity and design, and AI for education. In the recent past, he has worked on activity recognition, object recognition, and multi-object tracking. He has led or been involved in multiple government and commercial projects with clients including DARPA, IARPA, NSF, and ARL.