Learning to learn second-order back-propagation for CNNs using LSTMs

Abstract

Convolutional neural networks (CNNs) typically suffer from slow convergence rates in training, which limits their wider application. This paper presents a new CNN learning approach, based on second-order methods, aimed at improving (a) the convergence rates of existing gradient-based methods and (b) robustness to the choice of learning hyper-parameters (e.g., learning rate). We derive an efficient back-propagation algorithm for simultaneously computing both gradients and second derivatives of the CNN’s learning objective. These are then input to a Long Short-Term Memory (LSTM) network to predict optimal updates of CNN parameters in each learning iteration. Both meta-learning of the LSTM and learning of the CNN are conducted jointly. Evaluation on image classification demonstrates that our second-order back-propagation has faster convergence rates than standard gradient-based learning for the same CNN, and that it converges to better optima, leading to better performance under a fixed time budget for learning. We also show that an LSTM meta-learned on a small CNN can be readily used to train a larger network.
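
To make the role of second derivatives concrete, the toy sketch below compares a plain gradient step with a step that also uses per-parameter curvature. It is not the paper's algorithm: the paper feeds gradients and second derivatives to a learned LSTM, whereas here a fixed Newton-like rule stands in for the LSTM, and the quadratic objective, the values in `CURVATURES`, and all function names are assumptions chosen purely for illustration.

```python
# Toy objective (assumed, not from the paper):
#   f(w) = 0.5 * sum_i c_i * w_i^2, with badly scaled curvatures c_i.
CURVATURES = [100.0, 1.0]  # hypothetical values for illustration


def loss(w):
    return 0.5 * sum(c * x * x for c, x in zip(CURVATURES, w))


def grad_and_second(w):
    """Return per-parameter gradients and second derivatives (Hessian diagonal)."""
    grads = [c * x for c, x in zip(CURVATURES, w)]
    seconds = list(CURVATURES)  # d^2 f / d w_i^2 = c_i for this quadratic
    return grads, seconds


def gradient_step(w, lr):
    """Standard first-order update: uses gradients only."""
    grads, _ = grad_and_second(w)
    return [x - lr * g for x, g in zip(w, grads)]


def second_order_step(w, eps=1e-8):
    """Hand-written Newton-like update; a stand-in for the learned LSTM."""
    grads, seconds = grad_and_second(w)
    return [x - g / (abs(h) + eps) for x, g, h in zip(w, grads, seconds)]


def run(step, w, iters):
    """Apply `step` repeatedly and return the final loss."""
    for _ in range(iters):
        w = step(w)
    return loss(w)


w0 = [1.0, 1.0]
# The learning rate must stay below 2 / max(CURVATURES) for gradient descent
# to converge, which makes progress along the flat direction slow.
loss_gd = run(lambda w: gradient_step(w, lr=0.019), w0, 10)
loss_so = run(second_order_step, w0, 10)
print(f"gradient descent: {loss_gd:.4f}   curvature-scaled: {loss_so:.3e}")
```

On this toy problem the curvature-scaled rule needs no learning rate and reaches a much lower loss in the same number of iterations, which is the kind of benefit the abstract attributes to using second-order information.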

Publication
In IEEE International Conference on Pattern Recognition
Anirban Roy
Senior Computer Scientist

Anirban Roy is a Senior Computer Scientist at SRI International. His current interests include generative models, assured machine learning, AI for creativity and design, and AI for education. In the recent past, he has worked on activity recognition, object recognition, and multi-object tracking. He has led or been involved in multiple government and commercial projects with clients including DARPA, IARPA, NSF, and ARL.