Tricks Learned from Training Large Open-Source Models
- Track: Low-level AI Engineering and Hacking
- Room: UB2.252A (Lameere)
- Day: Sunday
- Start: 13:55
- End: 14:10
Tricks learned from training large open-source models, using WhisperSpeech, an open-source text-to-speech model, as an example.
WhisperSpeech is a new open-source text-to-speech model created by Collabora. It is based on recent research from the biggest AI labs (Google, Meta, Microsoft, OpenAI) and delivers high-quality speech learned from tens of thousands of hours of recorded human speech.
To deliver state-of-the-art quality, we scaled our models and training pipelines from hundreds to tens of thousands of hours of speech, and we share the lessons learned along the way. Nearly every component of our initial training process had to be replaced or heavily tweaked.
Challenges we'll briefly cover:
- Gone in 16 minutes: the importance of small-scale experiments.
- Full throttle: is 100% GPU utilization enough?
- Do you need a fancy framework? From single- to multi-GPU training.
- Are SSDs fast enough? WebDataset brings a 10x improvement.
- Does bigger always mean better? How to effortlessly scale AI models.
- Clouds, enthusiasts, or clusters? How to hunt down GPUs.
- Defending moats. How is a gaming 4090 different from an H100?
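To illustrate the storage point above: the idea behind WebDataset is to pack training samples into tar "shards" so the data loader streams them sequentially instead of issuing one small random read per sample, which is where the speedup over naive per-file loading comes from. The sketch below shows that core idea using only the standard library; it is not the WebDataset API, and the helper names (`pack_shard`, `iterate_shard`) are hypothetical.

```python
import io
import tarfile

def pack_shard(samples):
    """Pack (name, bytes) samples into an in-memory tar shard.

    In a real pipeline the shard would be a .tar file on disk or
    object storage, with audio and transcript stored under the
    same basename so they travel together.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in samples:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf

def iterate_shard(fileobj):
    """Stream samples back out of a shard in storage order.

    Mode "r|" forces pure sequential streaming: no seeking, so the
    same code works on local disks, pipes, and HTTP streams.
    """
    with tarfile.open(fileobj=fileobj, mode="r|") as tar:
        for member in tar:
            yield member.name, tar.extractfile(member).read()

# Pair each audio clip with its transcript under a shared basename.
shard = pack_shard([("0001.wav", b"audio-bytes"),
                    ("0001.txt", b"transcript")])
samples = list(iterate_shard(shard))
```

Because each worker reads its shards front to back, throughput is bounded by sequential bandwidth rather than IOPS, which is why sharded streaming can be dramatically faster than reading millions of small files.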
Speakers
Marcus Edel