Brussels / 1 & 2 February 2025

Optimizing Resource Utilization for Interactive GPU Workloads with Transparent Container Checkpointing


Interactive GPU workloads, such as Jupyter notebooks and generative AI inference, are becoming increasingly popular in scientific research and data analysis. However, efficiently allocating expensive GPU resources in multi-tenant environments such as Kubernetes clusters is challenging because the usage patterns of these workloads are unpredictable. Container checkpointing was recently introduced as a beta feature in Kubernetes and has been extended to support GPU-accelerated applications. In this talk, we present a novel approach to optimizing resource utilization for interactive GPU workloads using container checkpointing. This approach enables dynamic reallocation of GPU resources based on real-time workload demands, without modifying existing applications. We demonstrate the effectiveness of our approach through experimental evaluations with a variety of interactive GPU workloads and present preliminary results that highlight its potential.
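As a rough illustration of the idea, and not the method presented in the talk, the sketch below picks containers whose recent GPU utilization is low and builds the kubelet checkpoint request URL for each of them. The endpoint shape (`POST /checkpoint/{namespace}/{pod}/{container}` on the kubelet port) follows the Kubernetes checkpoint API; the idle policy, the threshold values, the node name and the helper names `ContainerRef`, `checkpoint_url` and `idle_containers` are assumptions made for this example. No request is sent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerRef:
    """Identifies one container in a pod (hypothetical helper type)."""
    namespace: str
    pod: str
    container: str


def checkpoint_url(node: str, ref: ContainerRef, port: int = 10250) -> str:
    """Build the kubelet checkpoint URL for one container.

    The kubelet expects an authenticated POST to this URL; this
    sketch only constructs the string.
    """
    return (
        f"https://{node}:{port}/checkpoint/"
        f"{ref.namespace}/{ref.pod}/{ref.container}"
    )


def idle_containers(utilization, threshold=5.0, min_samples=3):
    """Return containers whose last `min_samples` GPU utilization
    readings (in percent) are all below `threshold`.

    This is an assumed, deliberately simple idle policy.
    """
    idle = []
    for ref, samples in utilization.items():
        recent = samples[-min_samples:]
        if len(recent) == min_samples and all(s < threshold for s in recent):
            idle.append(ref)
    return idle


# Made-up utilization readings for two notebook containers.
readings = {
    ContainerRef("research", "notebook-a", "jupyter"): [80.0, 2.0, 1.0, 0.0],
    ContainerRef("research", "notebook-b", "jupyter"): [0.0, 0.0, 60.0],
}

for ref in idle_containers(readings):
    # Only notebook-a has been idle for the last three readings.
    print(checkpoint_url("node-1.example.org", ref))
```

A scheduler built along these lines would checkpoint the selected containers, release their GPUs to other workloads, and restore them when the user becomes active again.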

Speakers

Adrian Reber
Radostin Stoyanov
Viktória Spišaková
