Multimodal support in llama.cpp - Achievements and Future Directions
- Track: AI Plumbers
- Room: UD2.120 (Chavanne)
- Day: Saturday
- Start: 10:35
- End: 10:55
llama.cpp has become a key tool for running LLMs efficiently on a wide range of hardware. This talk explores how multimodal features have grown within the project, focusing on libmtmd, a library added in April 2025 that makes multimodal support in llama.cpp easier to use and maintain.
We will first cover the main achievements, including the consolidation of separate per-model CLI tools into a single tool, llama-mtmd-cli. Next, we will discuss how libmtmd integrates with llama-server and show real examples of low-latency OCR applications. We will also cover the addition of audio support, which lets newer models summarize audio inputs, as well as the challenges of handling legacy code while keeping the project flexible for future models.
Looking ahead, the talk will share plans for new features such as video input, text-to-speech support, and image generation. Attendees will also learn how to contribute to these multimodal tools and how to use them in their own projects.
Speakers
- Xuan-Son Nguyen