AI Plumbers Devroom
Brussels / 31 January & 1 February 2026

Schedule

Read the Call for Papers at https://aifoundry.org/#fosdem.

Saturday

10:30–10:35  Welcome to the AI Plumbers Devroom
             Roman Shaposhnik, Tanya Dadasheva
10:35–10:55  Multimodal support in llama.cpp - Achievements and Future Directions
             Xuan-Son Nguyen
11:00–11:20  API Remoting for llama.cpp: Near-Native GPU Speed in macOS Containers
             Kevin Pouget
11:25–11:45  tract - an efficient rust neural network inference engine
             Julien Balian, Mathieu Poumeyrol
11:50–12:10  Beyond TinyML: Balance inference accuracy and latency on MCUs
             Charalampos Mainas, Anastassios Nanos
12:15–12:35  Bringing up bare metal ExecuTorch on RISC-V
             William Jones, Jeremy Bennett, Shane Slattery, Pietra Ferreira
12:40–13:00  WebNN and WebLLM on RISC-V: Closing the AI Acceleration Gap with RVV and Tenstorrent
             Yuning Liang, Petr Penzin
13:05–13:25  Single-source cross-platform GPU LLM inference with Slang and Rust
             Sébastien Crozet
13:30–13:50  Closing the Loop: A Self-Learning Compiler for AI Accelerators
             Ramon Wirsch
13:55–14:15  One GPU, Many Models: What Works and What Segfaults
             Yash Panchal
14:20–14:40  Adventures in Model Quantization
             ubergarm
14:45–15:05  Vulkan API for Machine Learning? Competing with CUDA and ROCm in llama.cpp
             Ruben Ortlam
15:10–15:15  Running tinygrad and ggml on microcontroller NPUs
             Roman Shaposhnik
15:20–15:25  The Hidden Cost of Intelligence: The Energy Footprint of AI from Code to GPU Kernels
             Tushar Sharma
15:30–15:35  Lowering the barrier of entrance in AI-native system development
             Tanya Dadasheva
15:40–16:00  Supercharging LLM serving with Dynamo
             Piotr Tarasiewicz
16:05–16:25  Taming the LLM Zoo with Docker Model Runner: Inference with OCI Artifacts, llama.cpp, and vLLM
             Eric Curtin, Dorin Geman
16:30–16:50  From Infrastructure to Production: A Year of Self-Hosted LLMs
             Mateusz Charytoniuk, Gosia Zagajewska, Luiz Miguel
16:55–17:20  A practical introduction to the ET platform
             Gianluca Guida
17:25–17:45  Zero to matmul with the ET-SoC-1
             Peter Cawley
17:50–18:10  All in RISC-V, RISC-V All in AI: Solving Real AI Compute Challenges with DeepComputing & Tenstorrent
             Martin Chang, Danfeng Zhang
18:15–18:35  Review of kernel and user-space Neural Processing Unit (NPU) chips support on Linux
             Jakov Petrina Trnski
18:40–19:00  TT-Boltz: Drug Discovery on Tenstorrent Hardware
             Moritz Thüning