Deep neural networks (DNNs) have been shown to be brittle to inputs outside the distribution of training data, and to adversarial examples. This fragility is compounded by a lack of effectively computable measures of prediction confidence that correlate with the accuracy of DNNs. The direct use of logits severely overestimates confidence. These factors have impeded the adoption of DNNs in high-assurance systems. In this paper, we propose a novel confidence metric that does not require access to the training data, the use of model ensembles, or the need to train a calibration model on a held-out validation set, and hence, is usable even when only a trained model is available at inference time. A lightweight approach to quantify uncertainty in the output of a model and define a confidence metric is to measure the conformance of the model's decision in the neighborhood of the input. But measuring conformance by sampling in the neighborhood of an input becomes exponentially difficult with increase in the dimension of the input. We use the feature concentration observed in robust models for local dimensionality reduction and attribution-based sampling over the features to compute the confidence metric. We mathematically motivate the proposed metric, and evaluate its effectiveness with two sets of experiments. First, we study the change in accuracy and the associated confidence over out-of-distribution inputs and evaluate the correlation between the accuracy and computed confidence. We also compare our results with the use of logits to estimate uncertainty. Second, we consider attacks such as FGSM, CW, DeepFool, PGD, and adversarial patch generation methods. The computed confidence metric is found to be low on out-of-distribution data and adversarial examples where the accuracy of the model is also low. These experiments demonstrate the effectiveness of the proposed confidence metric to make DNNs more transparent with respect to uncertainty of prediction.
Deep neural networks (DNNs) are known to produce incorrect predictions with very high confidence on out-of-distribution inputs (OODs). This limitation is one of the key challenges in the adoption of DNNs in high-assurance systems such as autonomous driving, air traffic management, and medical diagnosis. This challenge has received significant attention recently, and several techniques have been developed to detect inputs where the model's prediction cannot be trusted. These techniques detect OODs as datapoints with either high epistemic uncertainty or high aleatoric uncertainty. We demonstrate the difference in the detection ability of these techniques and propose an ensemble approach for detection of OODs as datapoints with high uncertainty (epistemic or aleatoric). We perform experiments on vision datasets with multiple DNN architectures, achieving state-of-the-art results in most cases.
We present Sherlock a tool for verification of deep neural networks along with some recent applications of the tool to verification and control problems for autonomous systems. Deep Neural Networks have emerged as an ubiquitous choice of concept representation in machine learning, and are commonly used as function approximators in many learning tasks. Learning methods using deep neural networks are often referred to as deep learning, and they have reached human-level accuracy on benchmark data-sets for several tasks such as image classification and speech recognition. This has inspired the adoption and integration of deep learning methods into safety-critical systems. For example, deep learning is often used in perception and several decision-making modules in automated vehicles. This has created a challenge in verifying and certifying these systems using traditional approaches developed for software and hardware verification. Further, these models are known to be brittle and prone to adversarial examples. Establishing robustness of these models is, thus, an important challenge. For safe integration of learning into critical systems, we must be able to encapsulate these models into assumption-guarantee contracts by verifying guarantees on the outputs of the model given the assumptions on its inputs. We discuss some of our recent work on this topic of verifying deep neural networks that is currently available as an open-source tool, Sherlock.
Models produced by machine learning, particularly deep neural networks, are state-of-the-art for many machine learning tasks and demonstrate very high prediction accuracy. Unfortunately, these models are also very brittle and vulnerable to specially crafted adversarial examples. Recent results have shown that accuracy of these models can be reduced from close to hundred percent to below 5% using adversarial examples. This brittleness of deep neural networks makes it challenging to deploy these learning models in security-critical areas where adversarial activity is expected, and cannot be ignored. A number of methods have been recently proposed to craft more effective and generalizable attacks on neural networks along with competing efforts to improve robustness of these learning models. But the current approaches to make machine learning techniques more resilient fall short of their goal. Further, the succession of new adversarial attacks against proposed methods to increase neural network robustness raises doubts about a foolproof approach to robustify machine learning models against all possible adversarial attacks. In this paper, we consider the problem of detecting adversarial examples. This would help identify when the learning models cannot be trusted without attempting to repair the models or make them robust to adversarial attacks. This goal of finding limitations of the learning model presents a more tractable approach to protecting against adversarial attacks. Our approach is based on identifying a low dimensional manifold in which the training samples lie, and then using the distance of a new observation from this manifold to identify whether this data point is adversarial or not. Our empirical study demonstrates that adversarial examples not only lie farther away from the data manifold, but this distance from manifold of the adversarial examples increases with the attack confidence.
Real-world applications often naturally decompose into several sub-tasks. In many settings (e.g., robotics) demonstrations provide a natural way to specify the sub-tasks. However, most methods for learning from demonstrations either do not provide guarantees that the artifacts learned for the sub-tasks can be safely recombined or limit the types of composition available. Motivated by this deficit, we consider the problem of inferring Boolean non-Markovian rewards (also known as logical trace properties or specifications) from demonstrations provided by an agent operating in an uncertain, stochastic environment. Crucially, specifications admit well-defined composition rules that are typically easy to interpret. In this paper, we formulate the specification inference task as a maximum a posteriori (MAP) probability inference problem, apply the principle of maximum entropy to derive an analytic demonstration likelihood model and give an efficient approach to search for the most likely specification in a large candidate pool of specifications. In our experiments, we demonstrate how learning specifications can help avoid common problems that often arise due to ad-hoc reward composition.
We present an approach to learn and formally verify feedback laws for data-driven models of neural networks. Neural networks are emerging as powerful and general data-driven representations for functions. This has led to their increased use in data-driven plant models and the representation of feedback laws in control systems. However, it is hard to formally verify properties of such feedback control systems. The proposed learning approach uses a receding horizon formulation that samples from the initial states and disturbances to enforce properties such as reachability, safety and stability. Next, our verification approach uses an over-approximate reachability analysis over the system, supported by range analysis for feedforward neural networks. We report promising results obtained by applying our techniques on several challenging nonlinear dynamical systems.
Deep neural networks (NN) are extensively used for machine learning tasks such as image classification, perception and control of autonomous systems. Increasingly, these deep NNs are also been deployed in high-assurance applications. Thus, there is a pressing need for developing techniques to verify neural networks to check whether certain user-expected properties are satisfied. In this paper, we study a specific verification problem of computing a guaranteed range for the output of a deep neural network given a set of inputs represented as a convex polyhedron. Range estimation is a key primitive for verifying deep NNs. We present an efficient range estimation algorithm that uses a combination of local search and linear programming problems to efficiently find the maximum and minimum values taken by the outputs of the NN over the given input set. In contrast to recently proposed "monolithic" optimization approaches, we use local gradient descent to repeatedly find and eliminate local minima of the function. The final global optimum is certified using a mixed integer programming instance. We implement our approach and compare it with Reluplex, a recently proposed solver for deep neural networks. We demonstrate the effectiveness of the proposed approach for verification of NNs used in automated control as well as those used in classification.