One GPU, Many Models: What Works and What Segfaults
- Track: AI Plumbers
- Room: UD2.120 (Chavanne)
- Day: Saturday
- Start: 14:10
- End: 14:30
Serving multiple models on a single GPU sounds great until something segfaults.
Two approaches dominate parallel inference on a single NVIDIA GPU: MIG (Multi-Instance GPU, hardware partitioning) and MPS (Multi-Process Service, software sharing). Both promise efficient GPU sharing, both come with trade-offs, and both behave differently depending on GPU architecture.
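To make the distinction concrete, here is a minimal sketch (mine, not from the talk; it assumes the nvidia-ml-py package) that asks NVML whether GPU 0 is in MIG mode and, if so, enumerates the MIG devices it exposes:

```python
# Minimal sketch, assuming the nvidia-ml-py package (import name: pynvml).
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

# Raises NVMLError_NotSupported on GPUs without MIG (e.g. consumer cards).
current, pending = pynvml.nvmlDeviceGetMigMode(gpu)
print("MIG enabled:", current == pynvml.NVML_DEVICE_MIG_ENABLE)

if current == pynvml.NVML_DEVICE_MIG_ENABLE:
    for i in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)):
        try:
            mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, i)
        except pynvml.NVMLError_NotFound:
            continue  # slot not populated by the current partitioning
        mem = pynvml.nvmlDeviceGetMemoryInfo(mig)
        print(f"MIG device {i}: {mem.total / 2**30:.1f} GiB of isolated memory")

pynvml.nvmlShutdown()
```

MIG gives each instance its own memory and compute slice, which is exactly the isolation boundary this talk probes; MPS leaves the device undivided and relies on cooperative sharing.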
I tested both strategies on Hopper and Blackwell, running diffusion, MoE, and TTS workloads in parallel. Some setups survived. Others didn't.
This talk digs into what actually happened: where memory isolation falls apart, which configs crash, and what survives under load.
By the end, you'll know:
- How to utilize unused GPU capacity.
- How to set up MIG and MPS (a setup sketch follows this list).
- How MIG and MPS behave under actual load.
- Which memory issues, crashes, and failure modes to expect.
- Which configuration best suits your AI workload.
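As a rough preview of that setup (my sketch, not the speaker's exact steps), the snippet below wraps the real tools, nvidia-smi for MIG and nvidia-cuda-mps-control for MPS, in Python; the GPU index and the MIG profile IDs are placeholders you would read from your own card:

```python
# Rough setup sketch (assumptions: root access, an idle data-center GPU 0,
# placeholder MIG profile IDs). nvidia-smi and nvidia-cuda-mps-control are
# the real tools; everything else here is illustrative glue.
import os
import subprocess

def run(cmd: list[str]) -> None:
    """Echo and run a command, raising if it fails."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

MODE = "mig"  # "mig" or "mps": alternatives for a given device

if MODE == "mig":
    # Hardware partitioning: enable MIG mode, inspect the supported
    # instance profiles, then carve the GPU into two instances.
    run(["nvidia-smi", "-i", "0", "-mig", "1"])
    run(["nvidia-smi", "mig", "-lgip"])  # pick real profile IDs from this output
    run(["nvidia-smi", "mig", "-cgi", "9,9", "-C"])  # "9,9" is a placeholder
else:
    # Software sharing: start the MPS control daemon; CUDA processes that
    # see the same pipe directory share the GPU through it.
    os.environ["CUDA_MPS_PIPE_DIRECTORY"] = "/tmp/nvidia-mps"
    run(["nvidia-cuda-mps-control", "-d"])
```

The two modes are alternatives for a given device: you either partition it with MIG or share it whole through MPS (MPS can also run inside a single MIG instance). Which one holds up with diffusion, MoE, and TTS running side by side is what the talk measures.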
Speakers
- Yash Panchal