How Llamagator helps implement the LLM-as-a-Judge concept on your local machine
- Track: Low-level AI Engineering and Hacking
- Room: UB2.252A (Lameere)
- Day: Sunday
- Start: 11:55
- End: 12:00
In this talk, I explore how the landscape of large language model (LLM) accessibility has shifted dramatically.
It is now possible to run these powerful models locally, right on your laptop, eliminating the need for cloud-based solutions like OpenAI. Previously, the sheer size of LLMs, requiring massive GPUs and RAM, made local deployment impossible for most developers. This reliance on cloud services limited experimentation, customization, and affordability.
My presentation focuses on llama.cpp, an inference engine that enables efficient execution of LLMs, including Meta's Llama, Qwen, and Mistral models, on CPUs.
I detail the process of acquiring, building, and quantizing models for local use, and show how Ruby bindings and llama.cpp's built-in HTTP server simplify interaction. I also introduce two open-source tools I've created: Llamagator and Rspec-Llama.
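To make that workflow concrete, here is a minimal sketch of querying a locally running llama.cpp server from Ruby. It assumes you have already built llama.cpp and started `llama-server` with a quantized GGUF model; the port and model file name are placeholder assumptions, not details from the talk.

```ruby
# A minimal sketch, assuming llama-server is already running with a quantized
# GGUF model, e.g.: llama-server -m model-q4_k_m.gguf --port 8080
# llama-server exposes an OpenAI-compatible /v1/chat/completions endpoint.
require "net/http"
require "json"

uri = URI("http://localhost:8080/v1/chat/completions")
request = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
request.body = {
  messages: [{ role: "user", content: "What does quantization do to an LLM?" }],
  temperature: 0.2
}.to_json

response = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
puts JSON.parse(response.body).dig("choices", 0, "message", "content")
```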
Llamagator, a Rails application, streamlines the management, testing, and comparison of various LLMs, both local and cloud-based. With it, you can create prompts, define assertions, evaluate model performance, and easily implement the LLM-as-a-Judge pattern.
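The sketch below illustrates the LLM-as-a-Judge pattern itself rather than Llamagator's actual Rails code: one model produces a candidate answer, and a second model evaluates it against an assertion. The `ask` helper, endpoints, and ports are all hypothetical.

```ruby
require "net/http"
require "json"

# Hypothetical helper; Llamagator's real implementation lives in its Rails app.
def ask(endpoint, prompt)
  uri = URI(endpoint)
  request = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
  request.body = { messages: [{ role: "user", content: prompt }] }.to_json
  response = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
  JSON.parse(response.body).dig("choices", 0, "message", "content")
end

# The model under test produces a candidate answer (port 8080 is an assumption).
candidate = ask("http://localhost:8080/v1/chat/completions",
                "Explain what model quantization is in two sentences.")

# A second model acts as the judge, turning a free-form answer into a verdict
# (port 8081 is an assumption; the judge could also be the same model).
judge_prompt = <<~PROMPT
  You are a strict evaluator. Reply PASS if the answer below mentions
  reducing the numeric precision of weights, otherwise reply FAIL.

  Answer: #{candidate}
PROMPT

verdict = ask("http://localhost:8081/v1/chat/completions", judge_prompt)
puts verdict.to_s.include?("PASS") ? "assertion passed" : "assertion failed"
```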
Rspec-Llama extends RSpec with a specialized DSL for interacting with and validating responses from LLMs, making it easy to integrate these models into testing workflows. These tools, combined with the ability to run LLMs locally, empower developers to explore AI's potential without relying on external providers.
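As a flavor of what such testing looks like, here is a plain RSpec spec asserting on a local model's response. This deliberately does not reproduce Rspec-Llama's own DSL (see the gem for that); the endpoint and helper are assumptions for illustration.

```ruby
require "rspec"
require "net/http"
require "json"

# Hypothetical endpoint for a locally served model; adjust to your setup.
LLM_URL = URI("http://localhost:8080/v1/chat/completions")

def llm_answer(prompt)
  request = Net::HTTP::Post.new(LLM_URL, "Content-Type" => "application/json")
  request.body = { messages: [{ role: "user", content: prompt }] }.to_json
  response = Net::HTTP.start(LLM_URL.hostname, LLM_URL.port) { |http| http.request(request) }
  JSON.parse(response.body).dig("choices", 0, "message", "content")
end

RSpec.describe "local LLM smoke test" do
  it "answers a simple factual question" do
    expect(llm_answer("What is the capital of France? One word.").downcase)
      .to include("paris")
  end
end
```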
Speakers
Sergy Sergyenko