Brussels / 1 & 2 February 2025


quantizing your GGUF models using iterative refinement of the importance matrix


This talk presents Llama-gguf-optimize, a toolkit that grew out of work on creating high-quality quantizations of multilingual models, specifically the Salamandra series. With a focus on preserving language diversity, the project leverages llama.cpp's importance-matrix approach to minimize quantization loss across distinct language domains. The presentation will walk through the toolkit's scripts as a systematic workflow: quantizing models through iterative refinement of the importance matrix (I-matrix), then evaluating quantization quality with KL-divergence metrics.
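As background for the evaluation step: the KL-divergence metric compares the baseline model's per-token output distribution against the quantized model's. A minimal sketch of that comparison (the function name and the toy logits are illustrative, not part of the toolkit):

```python
import numpy as np

def mean_token_kl(base_logits: np.ndarray, quant_logits: np.ndarray) -> float:
    """Mean per-token KL(P || Q), where P comes from the baseline
    model's logits and Q from the quantized model's logits.
    Shapes: (n_tokens, vocab_size)."""
    # log-softmax with max-subtraction for numerical stability
    def log_softmax(x):
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

    log_p = log_softmax(base_logits)
    log_q = log_softmax(quant_logits)
    p = np.exp(log_p)
    # KL(P || Q) per token, averaged over all tokens
    return float(np.mean(np.sum(p * (log_p - log_q), axis=-1)))

# Identical logits give zero divergence; any mismatch gives a positive value.
base = np.array([[2.0, 1.0, 0.1]])
quant = np.array([[1.8, 1.1, 0.2]])
print(mean_token_kl(base, base))   # 0.0 (up to floating-point error)
print(mean_token_kl(base, quant))  # small positive number
```

A lower mean KL indicates the quantized model's token distributions stay closer to the baseline, which is why it serves as a per-domain quality signal across languages.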

Speakers

Robert Collins
