Computer Science &
Artificial Intelligence Lab
Massachusetts Institute
of Technology
Friday, April 24
12:00-2:00pm
Embracing Language Diversity: Unsupervised Multilingual Learning
For centuries, the deep connection between human languages has fascinated scholars and driven many important discoveries in
anthropology and historical linguistics. In this talk, I will show
that this connection can empower unsupervised methods for language
analysis. The key insight is that joint learning from several languages
reduces uncertainty about the linguistic structure of each individual
language.
I will present multilingual generative unsupervised models for morphological segmentation, part-of-speech tagging, and parsing. In all
of these instances we model the multilingual data as arising through a
combination of language-independent and language-specific probabilistic
processes. This feature allows the model to identify and learn from
recurring cross-lingual patterns to improve prediction accuracy in each
language.
This is joint work with Benjamin Snyder, Tahira Naseem, and Jacob Eisenstein.