Building Cloud Infrastructure for AI
- Track: Virtualization and Cloud Infrastructure
- Room: H.2213
- Day: Saturday
- Start: 14:00
- End: 14:30
- Video only: h2213
"GPU clouds" for AI application are the hot topic at the moment, but often these either end up being just big traditional HPC-style cluster deployments instead of actual cloud infrastructure or are built in secrecy by hyperscalers.
In this talk, we'll explore what makes a "GPU cloud" an actual cloud, how its requirements differ from traditional cloud infrastructure, and most importantly, how you can build your own using open source technology - all the way from hardware selection (do you really need to buy the six-figure boxes?) through firmware (OpenBMC), networking (SONiC, VPP), storage (Ceph, SPDK), orchestration (K8s, but not the way you think), OS deployment (mkosi, UEFI HTTP netboot), virtualization (QEMU, vhost-user), and performance tuning (NUMA, RDMA) to various managed services (load balancing, API gateways, Slurm, etc.).
In addition to the purely technical side, we'll also go into some of the non-technical challenges of running your own infrastructure and how to decide whether it's actually worth doing yourself.
Speakers
| Dave Hughes |
| Lukas Stockner |