GPUStack: Building a Simple and Scalable Management Experience for Diverse AI Models
- Track: Low-level AI Engineering and Hacking
- Room: UB2.252A (Lameere)
- Day: Sunday
- Start: 12:20
- End: 12:40
Outstanding tools like llama.cpp, Ollama, and LM Studio have made life significantly easier for developers: running large language models (LLMs) on a laptop has become remarkably convenient. However, inference engines and their wrappers don't address the following challenges:
1. Scaling your solution as your team grows.
2. Supporting models beyond LLMs, such as diffusion models for role-playing applications, TTS models for NotebookLM equivalents, rerankers and embedding models for retrieval-augmented generation (RAG), and more.
Today, both models and inference engines are highly diverse and rapidly evolving, while GPU resources remain fragmented and heterogeneous. In this talk, we will share our experience building GPUStack — a platform designed to help developers abstract away these complexities and focus solely on building APIs for AI applications.
Speakers
Lawrence Li