GPU Virtualization with MIG: Multi-Tenant Isolation for AI Inference Workloads
- Track: Virtualization and Cloud Infrastructure
- Room: H.2213
- Day: Saturday
- Start: 18:00
- End: 18:30
Serving AI models to multiple tenants on a single GPU sounds challenging, until you partition the GPU correctly.
This talk is a deep technical exploration of running AI inference workloads on modern NVIDIA GPUs, from Hopper to Blackwell, using Multi-Instance GPU (MIG) isolation.
We'll explore:
- The multi-tenant problem: MIG vs other GPU slicing methods.
- MIG fundamentals: key concepts, how it works, and where it is supported.
- Managing MIG instances: creation, configuration, monitoring, and deletion (see the sketches after this list).
- Identifying the right approach for your workload.
- Common issues and failure modes.
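As a taste of the instance-management topic above, here is a minimal sketch, not taken from the talk itself, of the MIG lifecycle driven through the standard nvidia-smi CLI from Python. The profile ID 19 is only an illustrative assumption; profile IDs and slice sizes differ between A100, Hopper, and Blackwell parts, so list them first.

```python
# Minimal sketch (illustrative, not material from the talk) of the MIG lifecycle
# via the standard nvidia-smi CLI. Run as root; profile ID 19 is an assumption,
# check `nvidia-smi mig -lgip` for the IDs your GPU actually offers.
import subprocess

def nvsmi(*args: str) -> str:
    """Run nvidia-smi with the given arguments and return its stdout."""
    result = subprocess.run(["nvidia-smi", *args],
                            check=True, capture_output=True, text=True)
    return result.stdout

# 1. Enable MIG mode on GPU 0 (may require a GPU reset before it takes effect).
print(nvsmi("-i", "0", "-mig", "1"))

# 2. List the GPU instance profiles this GPU supports, with their profile IDs.
print(nvsmi("mig", "-lgip"))

# 3. Create a GPU instance from a profile ID; -C also creates its default
#    compute instance in the same step.
print(nvsmi("mig", "-cgi", "19", "-C"))

# 4. Inspect what now exists.
print(nvsmi("mig", "-lgi"))   # GPU instances
print(nvsmi("mig", "-lci"))   # compute instances

# 5. Tear down: destroy compute instances first, then GPU instances.
print(nvsmi("mig", "-dci"))
print(nvsmi("mig", "-dgi"))
```

For the monitoring side of the same item, application code can enumerate MIG devices through the NVML Python bindings (nvidia-ml-py) rather than the admin CLI; again a sketch under the assumption that MIG mode is already enabled and instances exist.

```python
# Monitoring sketch (again an assumption, not the speaker's code): enumerate the
# MIG devices on GPU 0 with the NVML Python bindings and report memory usage.
import pynvml

pynvml.nvmlInit()
try:
    gpu = pynvml.nvmlDeviceGetHandleByIndex(0)
    # Raises NVMLError_NotSupported on GPUs without MIG capability.
    current, pending = pynvml.nvmlDeviceGetMigMode(gpu)
    print(f"MIG mode: current={current}, pending={pending}")

    # Probe every possible MIG slot; empty slots raise NVMLError_NotFound.
    for i in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)):
        try:
            mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, i)
        except pynvml.NVMLError_NotFound:
            continue
        mem = pynvml.nvmlDeviceGetMemoryInfo(mig)
        print(f"MIG device {i}: {pynvml.nvmlDeviceGetName(mig)}, "
              f"{mem.used / 2**20:.0f} MiB used of {mem.total / 2**20:.0f} MiB")
finally:
    pynvml.nvmlShutdown()
```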
Whether you're building a multi-tenant inference platform, optimizing GPU utilization for your team, or exploring how to serve AI models cost-effectively, this talk provides practical configurations you can apply to your own workloads.
Speakers
Yash Panchal