IRCS Conference Room
Department of Brain and Cognitive Sciences
Department of Linguistics
University of Rochester
Mapping prosody onto intentions: how listeners cope with variability
Although the prosody of an utterance can clearly help determine the intentions that a speaker means to convey, listeners (and psycholinguists) face a number of challenges in understanding prosodic cues. First, the use of prosody is highly variable among speakers. Second, prosodic categories are not as well defined as segmental categories (e.g., phonetic and phonemic categories), and cues to “disambiguation” are also weaker and more variable. Third, intentions are often signaled by prosodic contours that are composed of asynchronous cues (e.g., a pitch accent followed by a boundary tone later in the utterance). Finally, likely interpretations are strongly determined by the context (e.g., possible Questions under Discussion). To borrow an example from Herb Clark, think about how the interpretation of “I’m hot.” changes with different pitch accents, boundary tones, and contexts.
In ongoing work in my laboratory, Chigusa Kurumada, Meredith Brown and collaborators have been exploring an approach to how listeners meet these challenges. We assume that listeners develop hypotheses about likely interpretations and their expected phonetic realization in the form of generative models, which are updated on the fly (adaptation) based on the statistics of the input. I will present some of our ongoing work, using the construction “It looks like an X.” explored by Kurumada in her dissertation. With verb phrase focus (L+H* on “looks” followed by L-H%), this utterance can evoke a contrastive interpretation (e.g., It (this picture) looks like an X, but it isn’t), whereas with noun-focus prosody (H* on the noun followed by L-L%) it conveys that the picture most likely is an X. In a series of studies using crowdsourced judgment tasks, we demonstrate that (1) interpretations of the contours are probabilistic and (2) the distribution of interpretations shifts when (a) the speaker sometimes uses a stronger alternative (It is an X); (b) the speaker uses contours less reliably; and (c) the distribution of phonetic variation of the two contours changes. In visual world eye-tracking studies, we find that (3) listeners generate expectations for a contrastive interpretation immediately upon hearing L+H* and (4) these expectations are modulated by prior exposure to how reliably that speaker uses L+H* to signal contrast in a different construction. Taken together, the results confirm some of the central predictions generated by our framework and shed light on how listeners adapt to the variability of prosodic cues.
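As an illustration only, the idea that expectations are updated from the statistics of a speaker's input can be sketched as a toy Beta-Bernoulli belief about how reliably a speaker's L+H* signals contrast. This is not the model used in the work described above; the class name, the uniform prior, and the exposure sequences are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class CueReliability:
    """Toy belief (hypothetical) that a speaker's L+H* signals contrast."""
    alpha: float = 1.0  # pseudo-count: L+H* heard with a contrastive meaning
    beta: float = 1.0   # pseudo-count: L+H* heard without a contrastive meaning

    def update(self, contrastive: bool) -> None:
        # Each exposure adds one count to the matching outcome.
        if contrastive:
            self.alpha += 1
        else:
            self.beta += 1

    def p_contrast(self) -> float:
        # Posterior mean of the Beta(alpha, beta) belief.
        return self.alpha / (self.alpha + self.beta)


def expose(belief: CueReliability, observations: list[bool]) -> float:
    """Update the belief on each observation and return the resulting expectation."""
    for obs in observations:
        belief.update(obs)
    return belief.p_contrast()


# Assumed exposure phases: one speaker uses L+H* consistently, another does not.
reliable = expose(CueReliability(), [True] * 8)
unreliable = expose(CueReliability(), [True] * 4 + [False] * 4)
print(reliable, unreliable)
```

Under these assumptions, the listener exposed to the consistent speaker ends up with a stronger expectation of contrast on hearing L+H* than the listener exposed to the inconsistent speaker, which is the qualitative pattern in finding (4).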