Quantizing your GGUF models using iterative refinement of the importance matrix
- Track: Low-level AI Engineering and Hacking
- Room: UB2.252A (Lameere)
- Day: Sunday
- Start: 11:00
- End: 11:20
Presenting Llama-gguf-optimize, the result of work and research into creating high-quality quantizations for multilingual models, specifically the Salamandra series. With a focus on preserving language diversity, the project leverages llama.cpp’s importance matrix approach to minimize quantization loss across distinct language domains. This presentation outlines the toolkit’s scripts as a systematic approach to quantizing your models through iterative refinement of the importance matrix (I-matrix) and to evaluating quantization quality with KL-divergence metrics.
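The KL-divergence evaluation the abstract mentions compares the quantized model's output distribution against the full-precision model's, token position by token position (llama.cpp exposes this kind of comparison through its perplexity tooling). As a minimal sketch of the metric itself, independent of any particular toolkit, the following computes the mean KL divergence between two sets of per-token logits; the toy logit values are invented for illustration:

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p_logits, q_logits):
    """KL(P || Q) in nats, where P is the reference (full-precision) model."""
    p = softmax(p_logits)
    q = softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy per-token logits: baseline = full-precision model, quantized = candidate.
baseline  = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
quantized = [[1.9, 1.1, 0.2], [0.6, 2.4, 0.2]]

mean_kld = sum(kl_divergence(b, q)
               for b, q in zip(baseline, quantized)) / len(baseline)
print(f"mean KL divergence: {mean_kld:.4f} nats")
```

A lower mean KL divergence means the quantized model's token distributions stay closer to the original's, which is a more direct quality signal than perplexity alone when comparing quantization schemes across language domains.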
Speakers
Robert Collins