Optimizing Resource Utilization for Interactive GPU Workloads with Transparent Container Checkpointing
- Track: HPC, Big Data & Data Science
- Room: UB5.132
- Day: Sunday
- Start: 09:00
- End: 09:25
Interactive GPU workloads, such as Jupyter notebooks and generative AI inference, are becoming increasingly popular in scientific research and data analysis. However, allocating expensive GPU resources efficiently in multi-tenant environments such as Kubernetes clusters is challenging because these workloads have unpredictable usage patterns. Container checkpointing was recently introduced as a beta feature in Kubernetes and has been extended to support GPU-accelerated applications. In this talk, we present a novel approach that uses container checkpointing to optimize resource utilization for interactive GPU workloads. The approach enables dynamic reallocation of GPU resources based on real-time workload demand, without modifying existing applications. We demonstrate its effectiveness through experimental evaluations with a variety of interactive GPU workloads and present preliminary results that highlight its potential.
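As a rough illustration of how checkpointing could drive such reallocation, here is a minimal sketch that selects idle GPU containers and builds request URLs for the kubelet Checkpoint API (`POST /checkpoint/{namespace}/{pod}/{container}` on the kubelet, default port 10250, available when the `ContainerCheckpoint` feature is enabled). The `GpuWorkload` record, the idle-time policy, the threshold, and all node, pod, and container names are hypothetical assumptions for illustration, not the speakers' method. The code only constructs URLs; it does not send authenticated requests or perform restores.

```python
from dataclasses import dataclass

# Default kubelet HTTPS port; the checkpoint endpoint is served by the kubelet, not the API server.
KUBELET_PORT = 10250


@dataclass
class GpuWorkload:
    """Hypothetical record of an interactive GPU container and its recent activity."""
    namespace: str
    pod: str
    container: str
    node: str
    idle_seconds: float  # hypothetical metric: time since the GPU was last actively used


def checkpoint_url(w: GpuWorkload, timeout_s: int = 60) -> str:
    """Build the kubelet Checkpoint API URL for one container (to be sent as an HTTP POST)."""
    return (
        f"https://{w.node}:{KUBELET_PORT}/checkpoint/"
        f"{w.namespace}/{w.pod}/{w.container}?timeout={timeout_s}"
    )


def select_for_checkpoint(workloads, idle_threshold_s: float):
    """Hypothetical policy: pick containers idle at least the threshold, longest-idle first."""
    idle = [w for w in workloads if w.idle_seconds >= idle_threshold_s]
    return sorted(idle, key=lambda w: w.idle_seconds, reverse=True)


if __name__ == "__main__":
    # Hypothetical example workloads on two GPU nodes.
    workloads = [
        GpuWorkload("research", "jupyter-alice", "notebook", "gpu-node-1", 1800),
        GpuWorkload("research", "jupyter-bob", "notebook", "gpu-node-1", 30),
        GpuWorkload("genai", "llm-chat", "server", "gpu-node-2", 3600),
    ]
    for w in select_for_checkpoint(workloads, idle_threshold_s=900):
        print(checkpoint_url(w))
    # prints:
    # https://gpu-node-2:10250/checkpoint/genai/llm-chat/server?timeout=60
    # https://gpu-node-1:10250/checkpoint/research/jupyter-alice/notebook?timeout=60
```

Ordering by idle time is one simple choice: it frees the GPUs least likely to be needed soon first. A real scheduler would also have to restore checkpointed containers when users return, which this sketch does not attempt.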
Speakers
Adrian Reber
Radostin Stoyanov
Viktória Spišaková