Brussels / 1 & 2 February 2025

History and advances of quantization in llama.cpp


Much of the progress in genAI adoption is owed to quantization techniques. ggml/llama.cpp have introduced many new quantization formats over time, and it is not always easy to understand how they work; in many cases it requires reading through the PRs that actually introduced the quantization format. @Ikawrakow (Iwan Kawrakow) is the main person responsible for most of the modern quantization code. Looking through his PRs is generally the best way to learn, but if you are really curious, you can come to this panel with him and bring your questions! The panel will cover the experience with different quantization techniques in llama.cpp so far, the possibility of going below 2-bit quantization, QAT, and other approaches out there.

Speakers

Tanya Dadasheva
Iwan Kawrakow