Friday, March 11, 2016 - 12:00pm

IRCS Conference Room

Mark Steedman
School of Informatics
University of Edinburgh

Bootstrapping Language Acquisition

Recent work with Abend, Kwiatkowski, Smith, and Goldwater (2016) has shown that a general-purpose program for inducing parsers incrementally from sequences of strings (in any combinatory categorial language) paired with meanings (in any convenient language of logical form) can be applied to real English child-directed utterances from the CHILDES corpus to successfully learn the child's ("Eve's") grammar, combining lexical and syntactic learning in a single pass through the data.

While the earliest stages of learning necessarily proceed by pure "semantic bootstrapping", building a probabilistic model of all possible pairings of all possible words and derivations with all possible decompositions of logical form, the later stages of learning show emergent effects of "syntactic bootstrapping" (Gleitman 1990): EVE's increasing knowledge of the grammar of the language allows it to identify the syntactic type and meaning of unseen words in one trial, as experiments with nonce-word learning have shown to be characteristic of real children. The concluding sections of the talk consider Gleitman's argument that such learning can occur in situations where the meaning of an unknown word is either unavailable from the situation or only partially available, and must be learned from subsequent occasions of use. I'll argue that this process can also be understood computationally, by analogy with the machine learning of a "clustered entailment" semantics proposed by Lewis and the present author as a component of a robust system for question-answering from unrestricted text.