FOSDEM 2025

Optimizing Resource Utilization for Interactive GPU Workloads with Transparent Container Checkpointing

Track: HPC, Big Data & Data Science
Room: UB5.132
Day: Sunday
Start: 09:00
End: 09:25
Video only: ub5132

Interactive GPU workloads, such as Jupyter notebooks and generative AI inference, are becoming increasingly popular in scientific research and data analysis. However, allocating expensive GPU resources efficiently in multi-tenant environments such as Kubernetes clusters is challenging because these workloads have unpredictable usage patterns. Container checkpointing was recently introduced as a beta feature in Kubernetes and has been extended to support GPU-accelerated applications. In this talk, we present a novel approach to optimizing resource utilization for interactive GPU workloads using container checkpointing. The approach enables GPU resources to be reallocated dynamically based on real-time workload demand, without requiring any modification to existing applications. We evaluate the approach experimentally with a variety of interactive GPU workloads and present preliminary results that highlight its potential.