Llacan Seminars

Llacan Seminar – Wednesday, March 23, 2022 – Emily Prud’hommeaux: “Integrating machine learning in the language documentation pipeline”

On Wednesday, March 23, 2022, from 10:00 to 11:00 a.m., Emily Prud’hommeaux will give a talk entitled “Integrating machine learning in the language documentation pipeline”.


Despite many advances in technology for capturing, transcribing, and annotating speech recordings, linguists and language communities engaged in language documentation projects continue to face the obstacle of the “transcription bottleneck”. This is especially true for endangered languages, which have few first-language speakers and often lack an established writing system. In this talk, I describe three ongoing technology-driven efforts to support the transcription and morphological annotation of three critically endangered languages indigenous to North America. We find that while documentation is often faster and more accurate when using machine learning support, individuals vary in their opinions on the utility of these technologies. In addition, our results suggest that the quality of the data used to train machine learning models is often not as crucial for accuracy as the overall quantity of data. Finally, I discuss ideas for further stimulating the integration of machine learning tools into language documentation projects.


Emily Prud’hommeaux is the Gianinno Family Sesquicentennial Assistant Professor of Computer Science at Boston College. She received her BA (Harvard) and MA (University of California, Los Angeles) in Linguistics, and her PhD in Computer Science and Engineering (Oregon Health and Science University). Her research focuses on natural language processing and speech signal processing for small datasets, with a particular focus on endangered languages and child language.

Zoom link to attend the seminar: https://cnrs.zoom.us/j/95654741763

Meeting ID: 956 5474 1763

Passcode: 81Llacan35