Adventures in Model Quantization
- Track: AI Plumbers
- Room: UD2.120 (Chavanne)
- Day: Saturday
- Start: 14:30
- End: 14:50
- Video only: ud2120
"Adventures in Model Quantization" continues the quest to run high-quality models with minimal hardware resources. In this edition, community quantizer John Leimgruber ("ubergarm" on Hugging Face) tells the story of how a single-line change to llama.cpp enabled the 1000B open-weights model Kimi-K2-Thinking to maintain full quality while using only half the memory!
This talk presents an overview and visualizations of llama.cpp quantization types and discusses how Quantization Aware Training (QAT) affects mapping models across ecosystems, from transformers' safetensors into llama.cpp GGUF.
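To make the idea of a llama.cpp quantization type concrete, here is a simplified sketch (not the actual llama.cpp implementation, which is in C and packs blocks into binary structs) of blockwise symmetric int8 quantization in the style of Q8_0: each block of 32 weights stores one fp16 scale and 32 int8 quants.

```python
import numpy as np

QK8_0 = 32  # Q8_0 block size: 32 weights per block

def quantize_q8_0(x: np.ndarray):
    """Blockwise symmetric int8 quantization, Q8_0-style.

    Each block of 32 float weights gets one scale d = max(|x|) / 127
    and 32 int8 quants q; the block is reconstructed as d * q.
    """
    blocks = x.reshape(-1, QK8_0)
    d = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    d[d == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.round(blocks / d).astype(np.int8)
    return d.astype(np.float16), q

def dequantize_q8_0(d: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Reconstruct the float weights from scales and quants."""
    return (d.astype(np.float32) * q.astype(np.float32)).reshape(-1)

# Round-trip a random weight row and measure the quantization error
rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
d, q = quantize_q8_0(w)
err = np.abs(dequantize_q8_0(d, q) - w).max()
print(f"max abs error: {err:.4f}")
```

Lower-bit types such as Q4_K trade a smaller per-weight footprint for coarser scales and sub-block structure, which is where the quality/size trade-offs discussed in the talk come from.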
If you're interested in running the best open-weights LLMs and AI models on gaming rigs, home-lab servers, or privately for your organization, then come learn how to benchmark both the quality and speed of all the Hugging Face quants available for ik/llama.cpp.
This is an updated presentation expanding upon a recent AI Plumbers talk given in October 2025 in San Francisco:
- https://blog.aifoundry.org/p/adventures-in-model-quantization
- https://ubergarm.com/images/AI-Plumbers-Conference-2025-SF.pdf
Speakers
- ubergarm