Responsible Reasoning with Large Language Models and the Impact of Proper Nouns


Language models with billions of parameters have shown remarkable emergent properties, including the ability to reason over unstructured data. We show that open-science multilingual large language models can perform spatial reasoning over two or more entities with significant accuracy. A responsible large language model would perform this task with the same accuracy regardless of the names chosen for the entities over which the spatial relationships are defined. However, we show that the accuracy of contemporary large language models is affected by the choice of proper nouns, even though the underlying task ought to be independent of that choice. We also observe that the conditional log probabilities, or beam scores, of these models' predictions are not well calibrated and do not discriminate well between correct and incorrect responses.
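
To illustrate what the calibration and discrimination claims measure, here is a minimal sketch; it is not the authors' evaluation code. It assumes each response comes with a log probability that is exponentiated into a confidence in [0, 1], and it uses two common metrics chosen here for illustration: expected calibration error (ECE) and AUROC. The function names and the toy values are hypothetical.

```python
import math


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean of |accuracy - mean confidence| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(accuracy - avg_conf)
    return ece


def auroc(scores, correct):
    """Probability that a correct response outscores an incorrect one (ties count 0.5)."""
    pos = [s for s, ok in zip(scores, correct) if ok]
    neg = [s for s, ok in zip(scores, correct) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect response")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy values, made up for illustration only (not results from the paper).
toy_log_probs = [-0.1, -0.2, -0.3, -1.5]
toy_correct = [True, False, True, False]
toy_confidences = [math.exp(lp) for lp in toy_log_probs]

print("ECE:", expected_calibration_error(toy_confidences, toy_correct))
print("AUROC:", auroc(toy_confidences, toy_correct))
```

Under these assumptions, an ECE near 0 means confidence tracks accuracy, and an AUROC near 0.5 means the scores barely separate correct from incorrect responses. Running the same two metrics on prompts that differ only in the entity names would be one way to probe the dependence on proper nouns.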

In Workshop on Trustworthy and Socially Responsible Machine Learning, NeurIPS 2022
Susmit Jha
Technical Director, NuSCI

My research interests include artificial intelligence, formal methods, machine learning and dynamical systems.