Deep Learning

Neuro-symbolic Inference for Internet Of Battlefield Things (Army Research Lab) [SRI funding: 1.6M]

Safe Assured Neuro-Symbolic AI Agent (SANSA) for Health (ARPA-H) [SRI funding: 9.75M]

Trinity4Cyber (US Govt.) [SRI funding: 2.5M]

Concept-based Analysis of Neural Networks via Vision-Language Models

The analysis of vision-based deep neural networks (DNNs) is highly desirable but it is very challenging due to the difficulty of expressing formal specifications for vision tasks and the lack of efficient verification procedures. In this paper, we propose to leverage emerging multimodal, vision-language, foundation models (VLMs) as a lens through which we can reason about vision models. VLMs have been trained on a large body of images accompanied by their textual description, and are thus implicitly aware of high-level, human-understandable concepts describing the images. We describe a logical specification language π™²πš˜πš—πšœπš™πšŽπšŒ designed to facilitate writing specifications in terms of these concepts. To define and formally check π™²πš˜πš—πšœπš™πšŽπšŒ specifications, we build a map between the internal representations of a given vision model and a VLM, leading to an efficient verification procedure of natural-language properties for vision models. We demonstrate our techniques on a ResNet-based classifier trained on the RIVAL-10 dataset using CLIP as the multimodal model.

Direct Amortized Likelihood Ratio Estimation

We introduce a new amortized likelihood ratio estimator for likelihood-free simulation-based inference (SBI). Our estimator is simple to train and estimates the likelihood ratio using a single forward pass of the neural estimator. Our approach directly computes the likelihood ratio between two competing parameter sets which is different from the previous approach of comparing two neural network output values. We refer to our model as the direct neural ratio estimator (DNRE). As part of introducing the DNRE, we derive a corresponding Monte Carlo estimate of the posterior. We benchmark our new ratio estimator and compare to previous ratio estimators in the literature. We show that our new ratio estimator often outperforms these previous approaches. As a further contribution, we introduce a new derivative estimator for likelihood ratio estimators that enables us to compare likelihood-free Hamiltonian Monte Carlo (HMC) with random-walk Metropolis-Hastings (MH). We show that HMC is equally competitive, which has not been previously shown. Finally, we include a novel real-world application of SBI by using our neural ratio estimator to design a quadcopter.

Principled OOD Detection via Multiple Testing

We study the problem of out-of-distribution (OOD) detection, that is, detecting whether a machine learning (ML) model's output can be trusted at inference time. While a number of tests for OOD detection have been proposed in prior work, a formal framework for studying this problem is lacking. We propose a definition for the notion of OOD that includes both the input distribution and the ML model, which provides insights for the construction of powerful tests for OOD detection. We also propose a multiple hypothesis testing inspired procedure to systematically combine any number of different statistics from the ML model using conformal p-values. We further provide strong guarantees on the probability of incorrectly classifying an in-distribution sample as OOD. In our experiments, we find that threshold-based tests proposed in prior work perform well in specific settings, but not uniformly well across different OOD instances. In contrast, our proposed method that combines multiple statistics performs uniformly well across different datasets and neural networks architectures.

AircraftVerse: A Large-Scale Multimodal Dataset of Aerial Vehicle Designs

We present AircraftVerse, a publicly available aerial vehicle design dataset. Aircraft design encompasses different physics domains and, hence, multiple modalities of representation. The evaluation of these cyber-physical system (CPS) designs requires the use of scientific analytical and simulation models ranging from computer-aided design tools for structural and manufacturing analysis, computational fluid dynamics tools for drag and lift computation, battery models for energy estimation, and simulation models for flight control and dynamics. AircraftVerse contains 27,714 diverse air vehicle designs - the largest corpus of engineering designs with this level of complexity. Each design comprises the following artifacts: a symbolic design tree describing topology, propulsion subsystem, battery subsystem, and other design details; a STandard for the Exchange of Product (STEP) model data; a 3D CAD design using a stereolithography (STL) file format; a 3D point cloud for the shape of the design; and evaluation results from high fidelity state-of-the-art physics models that characterize performance metrics such as maximum flight distance and hover-time. We also present baseline surrogate models that use different modalities of design representation to predict design performance metrics, which we provide as part of our dataset release. Finally, we discuss the potential impact of this dataset on the use of learning in aircraft design and, more generally, in CPS. AircraftVerse is accompanied by a data card, and it is released under Creative Commons Attribution-ShareAlike (CC BY-SA) license.

Counterexample Guided Inductive Synthesis Using Large Language Models and Satisfiability Solving

Generative large language models (LLMs) with instruct training such as GPT-4 can follow human-provided instruction prompts and generate human-like responses to these prompts. Apart from natural language responses, they have also been found to be effective at generating formal artifacts such as code, plans, and logical specifications from natural language prompts. Despite their remarkably improved accuracy, these models are still known to produce factually incorrect or contextually inappropriate results despite their syntactic coherence -- a phenomenon often referred to as {\em hallucinations}. This limitation makes it difficult to use these models to synthesize formal artifacts that are used in safety-critical applications. Unlike tasks such as text summarization and question-answering, bugs in code, plan, and other formal artifacts produced by LLMs can be catastrophic. We posit that we can use the satisfiability modulo theory (SMT) solvers as deductive reasoning engines to analyze the generated solutions from the LLMs, produce counterexamples when the solutions are incorrect, and provide that feedback to the LLMs exploiting the dialog capability of instruct-trained LLMs. This interaction between inductive LLMs and deductive SMT solvers can iteratively steer the LLM to generate the correct response. In our experiments, we use planning over the domain of blocks as our synthesis task for evaluating our approach. We use GPT-4, GPT3.5 Turbo, Davinci, Curie, Babbage, and Ada as the LLMs and Z3 as the SMT solver. Our method allows the user to communicate the planning problem in natural language; even the formulation of queries to SMT solvers is automatically generated from natural language. Thus, the proposed technique can enable non-expert users to describe their problems in natural language, and the combination of LLMs and SMT solvers can produce provably correct solutions.

Dehallucinating Large Language Models Using Formal Methods Guided Iterative Prompting

Large language models (LLMs) such as ChatGPT have been trained to generate human-like responses to natural language prompts. LLMs use a vast corpus of text data for training, and can generate coherent and contextually relevant responses to a wide range of questions and statements. Despite this remarkable progress, LLMs are prone to hallucinations making their application to safety-critical applications such as autonomous systems difficult. The hallucinations in LLMs refer to instances where the model generates responses that are not factually accurate or contextually appropriate. These hallucinations can occur due to a variety of factors, such as the model’s lack of real-world knowledge, the influence of biased or inaccurate training data, or the model’s tendency to generate responses based on statistical patterns rather than a true understanding of the input. While these hallucinations are a nuisance in tasks such as text summarization and question-answering, they can be catastrophic when LLMs are used in autonomy-relevant applications such as planning. In this paper, we focus on the application of LLMs in autonomous systems and sketch a novel self-monitoring and iterative prompting architecture that uses formal methods to detect these errors in the LLM response automatically. We exploit the dialog capability of LLMs to iteratively steer them to responses that are consistent with our correctness specification. We report preliminary experiments that show the promise of the proposed approach on tasks such as automated planning.

Detecting Trojaned DNNs using counterfactual attributions

We target the problem of detecting Trojans or backdoors in DNNs. Such models behave normally with typical inputs but produce targeted mispredictions for inputs poisoned with a Trojan trigger. Our approach is based on a novel intuition that the trigger behavior is dependent on a few ghost neurons that are activated for both input classes and trigger pattern. We use counterfactual explanations, implemented as neuron attributions, to measure significance of each neuron in switching predictions to a counter-class. We then incrementally excite these neurons and observe that the model’s accuracy drops sharply for Trojaned models as compared to benign models. We support this observation through a theoretical result that shows the attributions for a Trojaned model are concentrated in a small number of features. We encode the accuracy patterns by using a deep temporal set encoder for trojan detection that enables invariance to model architecture and a number of classes. We evaluate our approach on four US IARPA/NIST-TrojAI benchmarks with high diversity in model architectures and trigger patterns. We show consistent gains over state-of-the-art adversarial attack based model diagnosis (+5.8%absolute) and trigger reconstruction based methods (+23.5%), which often require strong assumptions on the nature of the attack.