Online / 5 & 6 February 2022

How to Start a Language on Mozilla Common Voice?

A case study for the under-resourced Turkish language


As of December 2021, Mozilla Common Voice lists 154 locales, but only 87 have fulfilled the requirements to collect voices, and 27 of those are fairly new. In this two-part presentation, we offer some starting points for new language communities and share the knowledge we accumulated over the last year while working on the under-resourced Turkish language, along with initial training results.

The presentation covers the following topics: resources on Mozilla Common Voice, how to analyze your dataset, how to set goals, how to design a social media campaign, what tools you can use, Google Colab, Coqui STT, and our roundups on training with the Common Voice Turkish dataset, v1 to v7.0. All of these are presented as lessons learned, with our successes and failures as the Common Voice Turkish Volunteers group.

  • Errata: In the video, "checkpoint" is mistakenly written and spoken as "breakpoint"; this is corrected in the slides.
  • Addendum: Our dataset analysis and training results for the Common Voice v8.0 dataset have been added as new slides and video.

Speakers

Bülent Özden
