Brussels / 31 January & 1 February 2026

Schedule

AI Plumbers


Read the Call for Papers at https://aifoundry.org/#fosdem.

Saturday

Event | Speakers | Start | End
Welcome to the AI Plumbers Devroom | Roman Shaposhnik, Tanya Dadasheva | 10:30 | 10:35
Multimodal support in llama.cpp - Achievements and Future Directions | Xuan-Son Nguyen | 10:35 | 10:55
API Remoting for llama.cpp: Near-Native GPU Speed in macOS Containers | Kevin Pouget | 11:00 | 11:20
tract - an efficient rust neural network inference engine | Julien Balian, Mathieu Poumeyrol | 11:25 | 11:45
Beyond TinyML: Balance inference accuracy and latency on MCUs | Charalampos Mainas, Anastassios Nanos | 11:50 | 12:10
WebNN and WebLLM on RISC-V: Closing the AI Acceleration Gap with RVV and Tenstorrent | Yuning Liang, Petr Penzin | 12:40 | 13:00
Single-source cross-platform GPU LLM inference with Slang and Rust | Sébastien Crozet | 13:05 | 13:25
One GPU, Many Models: What Works and What Segfaults | Yash Panchal | 13:55 | 14:15
Adventures in Model Quantization | ubergarm | 14:20 | 14:40
Vulkan API for Machine Learning? Competing with CUDA and ROCm in llama.cpp | Ruben Ortlam | 14:45 | 15:05
Running tinygrad and ggml on microcontroller NPUs | Roman Shaposhnik | 15:10 | 15:15
Data Lakes for AI: Open Table Formats as the Foundation | Jiffin Tony Thottan | 15:20 | 15:25
The Hidden Cost of Intelligence: The Energy Footprint of AI from Code to GPU Kernels | Tushar Sharma | 15:30 | 15:35
Lowering the barrier of entrance in AI-native system development | Tanya Dadasheva | 15:35 | 15:40
Supercharging LLM serving with Dynamo | Harry Kim, Anish Maddipoti | 15:45 | 16:05
Taming the LLM Zoo with Docker Model Runner: Inference with OCI Artifacts, llama.cpp, and vLLM | Eric Curtin, Dorin Geman | 16:10 | 16:30
From Infrastructure to Production: A Year of Self-Hosted LLMs | Mateusz Charytoniuk | 16:35 | 16:55
Zero to matmul with the ET-SoC-1 | Peter Cawley | 17:25 | 17:45
All in RISC-V, RISC-V All in AI: Solving Real AI Compute Challenges with DeepComputing & Tenstorrent | Martin Chang, Danfeng Zhang | 17:50 | 18:10
Review of kernel and user-space Neural Processing Unit (NPU) chips support on Linux | Jakov Petrina Trnski | 18:15 | 18:35