History and advances of quantization in llama.cpp
- Track: Low-level AI Engineering and Hacking
- Room: UB2.252A (Lameere)
- Day: Sunday
- Start: 10:30
- End: 11:00
- Video only: ub2252a
We owe much of the progress in genAI adoption to quantization techniques, and ggml/llama.cpp has introduced many new quantization formats over time. It's not always easy to understand how the various formats work; in many cases it requires reading through the PRs that actually introduced them. @Ikawrakow (Ivan Kawrakow) is the main person responsible for most of the modern quantization code. Looking through his PRs is generally the best way to learn, but if you're really curious, you can come to this panel and bring your questions! The panel will cover the experience with different quantization techniques in llama.cpp so far, the possibility of going below 2-bit quantization, QAT, and other approaches out there.
Speakers
Tanya Dadasheva
Iwan Kawrakow