Brussels / 1 & 2 February 2025

History and advances of quantization in llama.cpp


Much of the progress in genAI adoption is owed to quantization techniques. ggml/llama.cpp have introduced many new quantization formats over time, and it is not always easy to understand how they work; in many cases it requires reading through the PRs that actually introduced the quantization format. @Ikawrakow (Iwan Kawrakow) is the main person responsible for most of the modern quantization code. Looking through his PRs is generally the best way to learn, but if you are really curious, you can come to this panel with him and bring your questions! The panel will cover the experience with different quantization techniques in llama.cpp so far, the possibility of going below 2-bit quantization, QAT, and other approaches out there.

Speakers

Tanya Dadasheva
Iwan Kawrakow