Learning by Reading
Supported by DARPA IPTO
Our goal is to create a computational account of learning from reading. Our focus is on conceptual knowledge, i.e., factual knowledge: general principles and explicit strategies, as well as particular information about the world, the kinds of things that occur in it, and how it came to be. Language, especially written language, is human culture’s best invention for enabling learners to accumulate substantial bodies of conceptual knowledge. We want to enable our machines to exploit this same resource. We are going to take a lesson from human culture: we will simplify the texts for our system, much as people produce simpler texts for children. This will let us stay focused on learning conceptual knowledge from text rather than on improving parsing methods.
While our texts will be simplified syntactically, our goal is to maintain conceptual breadth in what our system can handle. No predefined limits based on working in a specific domain will be hard-wired into the system. Any limitations it has will stem from missing knowledge, and we aim for most such limitations to be overcome through further reading or through natural language advice from the people working with it. The system will process multiple genres of texts: lessons illustrating general principles, glossaries that provide a first-cut understanding of new vocabulary terms, and stories for learning about events and history. Some texts will include maps and diagrams, to provide spatial information that is best depicted in those media. We are tackling the interpretation of continuous metaphors and explicit explanatory analogies in texts, to make communication with the system more natural.
Reasoning effectively with learned knowledge is essential. We will rely heavily on analogical processing to use learned knowledge, in two ways. First, stories and lessons will be retrieved and applied directly, via analogy, to solve new problems and understand new situations. Second, comparing multiple examples will produce generalizations, yielding new principles that can be used in reasoning.
We will work in the domain of everyday political science, including international relations, history, and culture. This domain is useful for driving this research for several reasons. First, it is extremely broad and relies on significant amounts of general knowledge about the world. This makes it very challenging, well beyond the state of the art for existing systems and technologies. Second, the content itself is of interest to the intelligence community and to the military, given the increasing importance of understanding cultural sensitivities and context in military missions. Third, ample materials are available that can be adapted to our needs without requiring security clearances or expensive or scarce subject-matter experts.
Our goal is for our system to learn enough about world history and international relations that, for example, given a story about a current event, it can make plausible predictions, defend them under follow-up questioning, and help work through the possible impacts of responses to that event. This will involve drawing on historical precedents and on its understanding, acquired through earlier reading, of how political, economic, cultural, and geographical factors affect human affairs.