Publications

Publications by category and year, in reverse chronological order. An asterisk (*) denotes equal contribution.

Preprints

2024

  1. Full description of an automated pipeline for providing personalized feedback based on audio samples
    Loann Peurey, William N. Havard, Xuan Nga Cao, and Alejandrina Cristia.
    Center for Open Science Feb 2024
  2. Speech Maturity Dataset: A cross-cultural corpus of naturalistic child and adult vocalizations
    Kasia Hitczenko, Loann Peurey, William N. Havard, Kai Jia Tey, Amanda Seidl, Chiara Semenzin, Camila Scaff, Marvin Lavechin, Bridgette Kelleher, Lisa Hamrick, Lucas Gautheron, Margaret Cychosz, Marisa Casillas, and Alejandrina Cristia.
    2024

Journals

2024

  1. Establishing the reliability of metrics extracted from long-form recordings using LENA and the ACLEW pipeline
    Alejandrina Cristia, Lucas Gautheron, Zixing Zhang, Björn Schuller, Camila Scaff, Caroline Rowland, Okko Räsänen, Loann Peurey, Marvin Lavechin, William Havard, Caitlin M. Fausey, Margaret Cychosz, Elika Bergelson, Heather Anderson, Najla Al Futaisi, and Melanie Soderstrom.
    Behavior Research Methods Sep 2024

International Conferences

2023

  1. Interspeech
    <’> in Tsimane’: a Preliminary Investigation
    William Havard, Yaya Sy, Camila Scaff, Loann Peurey, and Alejandrina Cristia.
    In Interspeech 2023, 24th Annual Conference of the International Speech Communication Association, Dublin, Ireland, 20-24 August 2023
  2. Interspeech
    Measuring language development from child-centered recordings
    Yaya Sy, William Havard, Marvin Lavechin, Emmanuel Dupoux, and Alejandrina Cristia.
    In Interspeech 2023, 24th Annual Conference of the International Speech Communication Association, Dublin, Ireland, 20-24 August 2023

2020

  1. CoNLL
    Catplayinginthesnow: Impact of Prior Segmentation on a Model of Visually Grounded Speech
    William Havard, Laurent Besacier, and Jean-Pierre Chevrot.
    In Proceedings of the 24th Conference on Computational Natural Language Learning Nov 2020
  2. LREC
    MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible
    *Marcely Zanon Boito, *William Havard, Mahault Garnerin, Éric Le Ferrand, and Laurent Besacier.
    In Proceedings of the 12th Language Resources and Evaluation Conference May 2020

2019

  1. CoNLL
    Word Recognition, Competition, and Activation in a Model of Visually Grounded Speech
    William N. Havard, Jean-Pierre Chevrot, and Laurent Besacier.
    In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL) Nov 2019
  2. ICASSP
    Models of Visually Grounded Speech Signal Pay Attention to Nouns: A Bilingual Experiment on English and Japanese
    William N. Havard, Jean-Pierre Chevrot, and Laurent Besacier.
    In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) May 2019

Domestic Conferences

2024

  1. TALN
    Technologies de la parole et données de terrain : le cas du créole haïtien
    William N. Havard, Renauld Govain, Daphne Gonçalves Teixeira, Benjamin Lecouteux, and Emmanuel Schang.
    In Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 1 : articles longs et prises de position Jul 2024

2021

  1. TALN
    Contribution d’informations syntaxiques aux capacités de généralisation compositionelle des modèles seq2seq convolutifs
    Diana Nicoleta Popa, William N. Havard, Maximin Coavoux, Laurent Besacier, and Eric Gaussier.
    In Traitement Automatique des Langues Naturelles 2021

International Workshops

2025

  1. Speech Technologies with Fieldwork Recordings: the Case of Haitian Creole
    William N. Havard, Renauld Govain, Benjamin Lecouteux, and Emmanuel Schang.
    In Proceedings of the 8th Workshop on Computational Methods for Endangered Languages (ComputEL-8) Mar 2025

2017

  1. SPEECH-COCO: 600k Visually Grounded Spoken Captions Aligned to MSCOCO Data Set
    William Havard, Laurent Besacier, and Olivier Rosec.
    In Proc. GLU 2017 International Workshop on Grounding Language Understanding 2017

Domestic Workshops

2023

  1. Outiller la documentation des langues créoles
    Eric Le Ferrand, Claudel Pierre-Louis, Ruoran Dong, Benjamin Lecouteux, Daphné Gonçalves-Teixeira, William N. Havard, and Emmanuel Schang.
    In LIFT 2023: Journées scientifiques du GdR Linguistique Informatique, Formelle et de Terrain Nov 2023

2022

  1. A study of the production and perception of <’> in Tsimane’
    William Havard, Camila Scaff, Loann Peurey, and Alejandrina Cristia.
    In Journées Jointes des Groupements de Recherche Linguistique Informatique, Formelle et de Terrain (LIFT) et Traitement Automatique des Langues (TAL) Nov 2022

2018

  1. Exploring Textual and Speech information in Dialogue Act Classification with Speaker Domain Adaptation
    Xuanli He, Quan Tran, William Havard, Laurent Besacier, Ingrid Zukerman, and Gholamreza Haffari.
    In Proceedings of the Australasian Language Technology Association Workshop 2018 Dec 2018

Peer-reviewed Abstracts

2025

  1. Questioning Morphological Boundaries with the Help of Automatic Speech Recognition
    William N. Havard and Emmanuel Schang.
    Sep 2025

2024

  1. Mobilising the Archive: Training Modern Speech Technology Models with Digitalised Fieldwork Recordings
    William Havard, Emmanuel Schang, and Benjamin Lecouteux.
    In Recent Advances in Language Documentation and Archiving (LD&A’24) Sep 2024
  2. IASCL
    Automated Pipeline Provides Personalized Feedback on Short Caregiver-Child Audio Conversations
    Alejandrina Cristia, Loann Peurey, William Havard, Gwendal Virlet, Xuan-Nga Cao, Juanita Bloomfield Lescarboura, Ana Balsa, Alejandro Cid, Martín Ottavianelli, José Luis Horta Brasil, Camila Scaff, and Kai Jia Tey.
    In Poster presentation at the International Association for the Study of Child Language (IASCL) Conference Jul 2024
  3. IASCL
    Presenting LongFoRMer: A package to organize and analyze long-form recordings
    Loann Peurey, Lucas Gautheron, William Havard, Camila Scaff, Shuvayanti Das, Kai Jia Tey, and Alejandrina Cristia.
    In Poster presentation at the International Association for the Study of Child Language (IASCL) Conference Jul 2024
  4. Presenting LongFoRMer: A package to organize and analyze long-form recordings
    Loann Peurey, Lucas Gautheron, William Havard, Camila Scaff, Shuvayanti Das, Kai Jia Tey, and Alejandrina Cristia.
    In ColDoc 2024: Linguistics in a New Era: Discourse, Methods, and Technologies in the Contemporary Landscape Jul 2024
  5. Introducing the Speech Maturity Dataset: Research opportunities for speech scientists and linguistic fieldworkers
    Margaret Cychosz, Kasia Hitczenko, William N. Havard, Loann Peurey, Madurya Suresh, Theo Zhang, and Alex Cristia.
    In Proceedings of CorpusPhon 2024
  6. Exploring the Impact of Syllable Complexity on Canonical Proportion in Children: Insights from a Multilingual and Cross-cultural Study
    Kai Jia Tey, Sarah Walker, Amanda Seidl, Camila Scaff, Loann Peurey, Bridgette L. Kelleher, Kasia Hitczenko, William N. Havard, Lisa R. Hamrick, Pauline Grosjean, Margaret Cychosz, Heidi Colleran, Marisa Casillas, Elika Bergelson, and Alejandrina Cristia.
    In Proceedings of the Workshop on Infant Language Development (WILD) 2024

2023

  1. The Speech Maturity Dataset
    William N. Havard, Loann Peurey, Kasia Hitczenko, and Alejandrina Cristia.
    In Proceedings of the Many Paths to Language (MPaL) Workshop Nov 2023

2022

  1. ESCOP
    Lexical Acquisition: Start Small and Build up or Start Big and Break Down? A Study on Lexical Acquisition Using Visually Grounded Artificial Neural Networks
    William N. Havard
    In European Society for Cognitive Psychology Aug 2022
  2. ESCOP
    Modeling and Measuring Children’s Language Development Using Language Models
    Yaya Sy, William N. Havard, and Alejandrina Cristia.
    In European Society for Cognitive Psychology Aug 2022

2018

  1. Emergence of Attention in a neural model of Visually Grounded Speech
    William N. Havard, Jean-Pierre Chevrot, and Laurent Besacier.
    Jul 2018

Non Peer-reviewed Abstracts

2024

  1. Corpus francophones et créolophones à La Réunion
    William Havard and Gudrun Ledegen.
    In Corpus et méthodes pour l’étude de la variation dans l’espace francophone et au-delà (CoMeVar) Nov 2024

Thesis

2021

  1. Lexical emergence from context: exploring unsupervised learning approaches on large multimodal language corpora
    William N. Havard
    2021

2017

  1. Découverte non supervisée de lexique à partir d’un corpus multimodal pour la documentation des langues en danger
    William N. Havard
    May 2017