Tricks Learned from Training Large Open-Source Models
- Track: Low-level AI Engineering and Hacking
- Room: UB2.252A (Lameere)
- Day: Sunday
- Start: 13:55
- End: 14:10
Tricks learned from training large open-source models, using WhisperSpeech, an open-source text-to-speech model, as an example.
WhisperSpeech is a new open-source text-to-speech model created by Collabora. It is based on recent research from the biggest AI labs (Google, Meta, Microsoft, OpenAI) and delivers high-quality speech learned from tens of thousands of hours of recorded human speech.
To deliver state-of-the-art quality, we scaled our models and training pipelines from hundreds to tens of thousands of hours of speech, and we share the lessons learned along the way. Nearly every component of our initial training process had to be replaced or heavily tweaked.
Challenges we'll briefly cover:
- Gone in 16 minutes: the importance of small-scale experiments.
- Full throttle: is 100% GPU utilization enough?
- Do you need a fancy framework? From single- to multi-GPU training.
- Are SSDs fast enough? WebDataset brings a 10x improvement.
- Does bigger always mean better? How to effortlessly scale AI models.
- Clouds, enthusiasts, or clusters? How to hunt down GPUs.
- Defending moats. How is a gaming 4090 different from an H100?
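To illustrate the storage point above: the idea behind WebDataset is to pack training samples into tar "shards" so the data loader streams them sequentially instead of issuing one small random read per sample, which is where the speedup over naive per-file loading comes from. The sketch below shows that core idea using only the standard library; it is not the WebDataset API, and the helper names (`pack_shard`, `iterate_shard`) are hypothetical.

```python
import io
import tarfile

def pack_shard(samples):
    """Pack (name, bytes) samples into an in-memory tar shard.

    In a real pipeline the shard would be a .tar file on disk or
    object storage, with audio and transcript stored under the
    same basename so they travel together.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in samples:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf

def iterate_shard(fileobj):
    """Stream samples back out of a shard in storage order.

    Mode "r|" forces pure sequential streaming: no seeking, so the
    same code works on local disks, pipes, and HTTP streams.
    """
    with tarfile.open(fileobj=fileobj, mode="r|") as tar:
        for member in tar:
            yield member.name, tar.extractfile(member).read()

# Pair each audio clip with its transcript under a shared basename.
shard = pack_shard([("0001.wav", b"audio-bytes"),
                    ("0001.txt", b"transcript")])
samples = list(iterate_shard(shard))
```

Because each worker reads its shards front to back, throughput is bounded by sequential bandwidth rather than IOPS, which is why sharded streaming can be dramatically faster than reading millions of small files.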
Speakers
Marcus Edel