Towards Intelligent Agents that Learn by Multimodal Communication


Sponsor: Machine Learning, Reasoning and Intelligence Program, Office of Naval Research

Principal Investigator: Kenneth D. Forbus

Project Summary: We propose to explore how to create intelligent agents that learn by multimodal communication in order to perform commonsense reasoning. People commonly communicate with each other using coordinated modalities, such as sketching and talking, or reading texts illustrated with diagrams. Our AI systems need the same capabilities. Specifically, we propose to explore fluent multimodal communication in the context of knowledge capture, to support commonsense reasoning. Commonsense reasoning is crucial for intelligent systems because it is part of the shared background assumed in working with human partners, and it provides a foundation for future learning. Our hypotheses are: (1) qualitative representations are a crucial part of commonsense knowledge, and (2) analogical reasoning and learning provide robustness in reasoning, as well as human-like learning of complex relational structures. Unlike deep learning systems, for example, analogical learning systems can handle relational structures such as arguments, proofs, and plans, while learning with orders of magnitude less data. This research should help pave the way for intelligent systems that can interact with, and learn from, people using natural modalities, as well as make progress on understanding the nature of human cognition.

Using the Companion cognitive architecture, we propose to explore the following ideas:

  1. Hybrid Primal Sketch. Our CogSketch system provides a model of high-level human vision that has been used both to model multiple human visual problem-solving tasks and to build deployed sketch-based educational software. We propose to build a hybrid primal sketch processor, which combines CogSketch, off-the-shelf computer vision algorithms, and deep learning recognition systems, to process images, especially diagrams. A minimal illustrative sketch of such a hybrid pipeline appears after this list.
  2. Analogical Learning of Narrative Function. Our prior work on analogical question-answering has led to algorithms that provide competitive performance on several datasets while being more data-efficient than today's machine learning systems. In this project we propose to extend these ideas to learning narrative functions, i.e., the higher levels of semantic interpretation that ascribe purpose to pieces of text relative to larger tasks. Building on observations of how people learn to read, we plan to build dialogue models for natural annotation, i.e., ways that trainers can teach systems how to interpret multimodal materials, bootstrapping them in a data-efficient manner. A toy sketch of matching over relational facts appears after the evaluation paragraph below.
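
To make the hybrid idea concrete, here is a minimal, purely illustrative Python sketch. It is not CogSketch's actual API: a stubbed-out detect_objects function stands in for the deep learning recognition component, and qualitative spatial relations (above, leftOf, inside) are derived symbolically from its bounding boxes, the kind of facts a hybrid primal sketch processor might hand to a reasoner.

    # Illustrative only: NOT CogSketch's API. A stub detector stands in for the
    # deep learning recognition component; qualitative spatial relations are then
    # derived symbolically from its bounding boxes.
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class Box:
        label: str   # category from the (hypothetical) recognition component
        x0: float    # bounding box in image coordinates (y grows downward)
        y0: float
        x1: float
        y1: float

    def detect_objects(image_path: str) -> List[Box]:
        """Stand-in for an off-the-shelf detector; returns labeled bounding boxes."""
        # Hypothetical output for a diagram showing a beaker sitting on a table.
        return [Box("beaker", 40, 20, 80, 90), Box("table", 0, 90, 200, 120)]

    def qualitative_relations(boxes: List[Box]) -> List[Tuple[str, str, str]]:
        """Turn box geometry into coarse symbolic facts for downstream reasoning."""
        facts = []
        for a in boxes:
            for b in boxes:
                if a is b:
                    continue
                if a.y1 <= b.y0:   # a ends above where b begins
                    facts.append(("above", a.label, b.label))
                if a.x1 <= b.x0:
                    facts.append(("leftOf", a.label, b.label))
                if a.x0 >= b.x0 and a.x1 <= b.x1 and a.y0 >= b.y0 and a.y1 <= b.y1:
                    facts.append(("inside", a.label, b.label))
        return facts

    if __name__ == "__main__":
        print(qualitative_relations(detect_objects("diagram.png")))
        # -> [('above', 'beaker', 'table')]

In a real pipeline the stub would be replaced by an actual detector, and the resulting symbolic facts would feed CogSketch-style qualitative and analogical processing.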
We will test these ideas using a variety of materials, including Navy training materials and existing corpora and datasets. Elementary school science concerns commonsense reasoning about the physical and biological realms, so we also plan to use tests and materials from the Allen Institute for Artificial Intelligence to evaluate our progress.
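
As a toy illustration of why relational structure matters for analogical question-answering, the sketch below scores stored cases by how many relational facts they share with a probe once entities are mapped onto one another. It is a drastic simplification, not the SME or MAC/FAC algorithms used in Companions, and the predicates and cases shown (flowsTo, contains, etc.) are hypothetical.

    # Toy illustration only: NOT the SME or MAC/FAC algorithms used in Companions.
    # It scores stored cases by the number of relational facts they share with a
    # probe under the best mapping of probe entities onto case entities.
    from itertools import permutations
    from typing import FrozenSet, Tuple

    Fact = Tuple[str, ...]   # (predicate, arg1, arg2, ...)

    def overlap(probe: FrozenSet[Fact], case: FrozenSet[Fact]) -> int:
        """Best fact overlap over all entity mappings (brute force; illustration only)."""
        probe_ents = sorted({a for f in probe for a in f[1:]})
        case_ents = sorted({a for f in case for a in f[1:]})
        best = 0
        for perm in permutations(case_ents, len(probe_ents)):
            mapping = dict(zip(probe_ents, perm))
            mapped = {(f[0],) + tuple(mapping[a] for a in f[1:]) for f in probe}
            best = max(best, len(mapped & case))
        return best

    # Hypothetical probe and case library, expressed as relational facts.
    probe = frozenset({("flowsTo", "heat", "water"), ("contains", "beaker", "water")})
    cases = {
        "heat-flow": frozenset({("flowsTo", "heat", "coffee"), ("contains", "cup", "coffee")}),
        "unrelated": frozenset({("partOf", "wheel", "car")}),
    }
    print(max(cases, key=lambda name: overlap(probe, cases[name])))   # -> heat-flow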

Selected publications:

  1. Forbus, K. (2019). Qualitative Representations: How People Reason and Learn about the Continuous World, MIT Press.
  2. Forbus, K., Chang, M., Ribeiro, D., Hinrichs, T., Crouse, M., & Witbrock, M. (2019). Step Semantics: Representations for State Changes in Natural Language. Proceedings of the Reasoning for Complex Question-Answering Workshop, AAAI 2019, Honolulu, HI.
  3. Chen, K., Rabkina, I., McLure, M., & Forbus, K. (2019). Human-like Sketch Object Recognition via Analogical Learning. Proceedings of AAAI 2019.
  4. Chen, K., Forbus, K., Gentner, D., Hespos, S., & Anderson, E. (2020). Simulating Infant Visual Learning by Comparison: An Initial Model. Proceedings of CogSci 2020, Online.
  5. Chen, K., & Forbus, K. (2021). Visual Relation Detection using Hybrid Analogical Learning. Proceedings of AAAI 2021.
  6. Forbus, K., & Lovett, A. (2021). Same/different in visual reasoning. Current Opinion in Behavioral Sciences, 37, 63-68.

