Learning

Assured Neuro Symbolic Learning and Reasoning (DARPA) [SRI funding: 5.4M]

FLASH: Functionality-based Logic-driven Anchors with Semantic Hierarchy (DARPA) [SRI funding: 2.1M]

Neuro-symbolic Co-designer for Symbiotic Design of Cyber Physical Systems (DARPA) [SRI funding: 5.63M]

Trojans in Artificial Intelligence (IARPA) [SRI funding: 7.22M]

ALES in Assured Autonomy (DARPA) [SRI funding: 2.02M]

Quantum-Inspired Classical Computing (QuICC) (DARPA) [SRI funding: 12.5M]

Intent-Defined Adaptive Software (DARPA) [SRI funding: 3.99M]

Self-Improving Cyber-Physical Systems (NSF CPS Small) [SRI funding: 500K]

Duality-Based Algorithm Synthesis (NSF EAGER) [SRI funding: 250K]

Data-efficient Learning of Robust Control Policies

This paper investigates data-efficient methods for learning robust control policies. Reinforcement learning has emerged as an effective approach for learning control policies by interacting directly with the plant, but it requires a significant number of example trajectories to converge to the optimal policy. Combining model-free reinforcement learning with model-based control methods achieves better data efficiency via simultaneous system identification and controller synthesis. We study a novel approach that exploits the existence of approximate physics models to accelerate the learning of control policies. The proposed approach iterates through three key steps: evaluating a selected policy on the real-world plant and recording trajectories; building a Gaussian process model to predict the reality gap of a parametric physics model in the neighborhood of the selected policy; and synthesizing a new policy using reinforcement learning on the refined physics model that most likely approximates the real plant. The approach converges to an optimal policy as well as an approximate physics model. Real-world experiments are limited to evaluating only promising candidate policies, and the use of Gaussian processes minimizes the number of required real-world trajectories. We demonstrate the effectiveness of our techniques on a set of simulation case studies using OpenAI Gym environments.
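
The following is a minimal sketch of that three-step loop, not the paper's actual implementation: it assumes a toy double-integrator plant with hidden friction as the "real" system, a frictionless approximate physics model, random search over linear feedback gains in place of the reinforcement-learning step, and scikit-learn's GaussianProcessRegressor as the reality-gap model.

    # Sketch of the loop: (1) evaluate the policy on the real plant, (2) fit a GP
    # to the reality gap of the physics model near that policy, (3) re-optimize
    # the policy on the GP-corrected model. All components are illustrative.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.default_rng(0)

    def real_plant_step(x, u):
        # "Real" dynamics: double integrator with a friction term the model omits.
        pos, vel = x
        vel = vel + 0.1 * (u - 0.5 * vel)
        pos = pos + 0.1 * vel
        return np.array([pos, vel])

    def physics_model_step(x, u):
        # Approximate physics model: same integrator, no friction.
        pos, vel = x
        vel = vel + 0.1 * u
        pos = pos + 0.1 * vel
        return np.array([pos, vel])

    def rollout(step_fn, policy, horizon=50):
        # Roll out a linear state-feedback policy; record (state, action, next state).
        x, traj, cost = np.array([1.0, 0.0]), [], 0.0
        for _ in range(horizon):
            u = float(np.clip(policy @ x, -1.0, 1.0))
            x_next = step_fn(x, u)
            traj.append((x.copy(), u, x_next.copy()))
            cost += x_next @ x_next + 0.01 * u * u
            x = x_next
        return traj, cost

    def optimize_policy(step_fn, n_candidates=200):
        # Stand-in for the RL step: random search over feedback gains on the model.
        best, best_cost = None, np.inf
        for _ in range(n_candidates):
            cand = rng.normal(scale=2.0, size=2)
            _, c = rollout(step_fn, cand)
            if c < best_cost:
                best, best_cost = cand, c
        return best

    policy = optimize_policy(physics_model_step)  # initial policy from the raw model
    for iteration in range(5):
        # Step 1: evaluate the current policy on the real plant, record trajectories.
        real_traj, real_cost = rollout(real_plant_step, policy)

        # Step 2: fit a GP to the reality gap (real next state minus model prediction)
        # in the neighborhood of the evaluated policy.
        X = np.array([np.concatenate([x, [u]]) for x, u, _ in real_traj])
        Y = np.array([xn - physics_model_step(x, u) for x, u, xn in real_traj])
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-4).fit(X, Y)

        def refined_model_step(x, u, _gp=gp):
            # Physics model corrected by the GP's predicted reality gap.
            gap = _gp.predict(np.concatenate([x, [u]]).reshape(1, -1))[0]
            return physics_model_step(x, u) + gap

        # Step 3: synthesize a new policy on the refined model.
        policy = optimize_policy(refined_model_step)
        print(f"iteration {iteration}: real-plant cost {real_cost:.2f}")

Because only the current best candidate policy is ever run on the real plant, real-world data stays limited to a handful of trajectories per iteration; the GP correction is what lets the cheap model-based search account for the unmodeled dynamics.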