FOSDEM 2025

Self-hosted LLMs at a scale with Paddler

Track: Low-level AI Engineering and Hacking
Room: UB2.252A (Lameere)
Day: Sunday
Start: 12:40
End: 13:00
Video only: ub2252a

Paddler is an open-source llama.cpp load balancer designed to address the unique challenges that Large Language Models pose.

Typical balancing algorithms like round-robin or least-connections are not the most efficient approaches for LLM workloads, where response times vary widely from one request to the next.

To introduce predictability into your infrastructure, Paddler reaches for alternative solutions that account for unpredictable response times while being able to scale services up and down at any moment.
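To illustrate the contrast, here is a minimal sketch that compares round-robin with a capacity-aware choice. It assumes each backend can report how many of its request slots are idle. That assumption, along with every name used here (`Backend`, `slots_total`, `slots_busy`, `pick_most_idle`), is hypothetical and chosen for illustration. It is not taken from the abstract and is not Paddler's actual API.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Iterator, Optional


@dataclass
class Backend:
    """Hypothetical view of one llama.cpp instance (names are illustrative)."""
    name: str
    slots_total: int
    slots_busy: int = 0

    @property
    def slots_idle(self) -> int:
        return self.slots_total - self.slots_busy


def pick_round_robin(rotation: Iterator[Backend]) -> Backend:
    # Round-robin ignores load: it returns the next backend in the rotation,
    # even when that backend has no free capacity.
    return next(rotation)


def pick_most_idle(backends: list[Backend]) -> Optional[Backend]:
    # Capacity-aware choice: consider only backends with at least one idle
    # slot and prefer the one with the most idle slots.
    candidates = [b for b in backends if b.slots_idle > 0]
    if not candidates:
        return None  # nothing is free; the caller could queue the request
    return max(candidates, key=lambda b: b.slots_idle)


# Backend "a" is saturated by long-running generations; "b" is idle.
backends = [Backend("a", 2, slots_busy=2), Backend("b", 2, slots_busy=0)]

rr_choice = pick_round_robin(cycle(backends))
idle_choice = pick_most_idle(backends)

print(rr_choice.name)    # round-robin still sends the request to "a"
print(idle_choice.name)  # the capacity-aware choice sends it to "b"
```

Because the capacity-aware choice is recomputed from the current list on every request, a backend added to or removed from `backends` is taken into account on the next call, which is one simple way to accommodate scaling up and down.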

This talk will demonstrate Paddler's general design concepts (the "why") and some primary use cases (the "how").

Speakers

Mateusz Charytoniuk

Attachments

Presentation

Brussels / 1 & 2 February 2025
