Brussels / 31 January & 1 February 2026


How to Prevent Your AI from Returning Garbage: It Starts and Ends with Data Engineering


Your AI application returns wrong answers. Not because of your LLM choice or vector database, but because of the data engineering (or lack thereof) that nobody wants to talk about.

This technical deep dive shows why embedding models, chunking strategies, and search filtering have more impact on AI accuracy than switching from one model to another. Using real production data, we'll demonstrate how naive vector search returns Star Trek reviews when users ask about Star Wars, how poor chunking strategies lose critical context (who wants their AI to answer "how do I fix a headache" with a head transplant?), and why "just use a vector" without proper data engineering guarantees hallucinations.
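The chunking failure above can be shown in a few lines. This is a toy sketch (not taken from the talk): a naive fixed-size chunker splits a document mid-answer, so a question and its remedy land in different chunks and a retriever that embeds chunks independently loses the pairing. The document text and chunk size are invented for illustration.

```python
# Illustrative toy document: two Q&A pairs packed into one string.
doc = (
    "Q: How do I fix a headache? A: Drink water and rest. "
    "Q: How do I fix a broken monitor? A: Replace the display panel."
)

def naive_chunks(text, size):
    """Split text into fixed-size character chunks, no overlap, no awareness
    of sentence or Q&A boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]

for chunk in naive_chunks(doc, 40):
    print(repr(chunk))
# The headache question ends up in one chunk and most of its answer in the
# next, so an embedding of either chunk alone misrepresents the content.
```

Context-preserving strategies (overlapping windows, sentence-boundary splitting, or the double-embedding approach covered below) exist precisely to avoid this severing.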

We'll cover:

  • Embedding model selection: dimensions, token limits, and silent truncation failures
  • Chunking strategies: when to chunk, how to preserve context, and the double-embedding approach
  • Hybrid search: combining Full Text/BM25 keyword matching with vector similarity
  • Filtering architecture: pre-filter vs post-filter performance trade-offs
  • Production gotchas: triggers, performance, batch processing, and cold start problems
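The hybrid-search bullet above can be sketched in miniature. This is an illustrative toy, not the talk's implementation: a crude term-overlap score stands in for BM25, the "embeddings" are hand-written 2-D vectors, and the fusion weight `alpha` is arbitrary. It shows the shape of the idea, that exact keyword matches and vector similarity are combined into one ranking score.

```python
import math

def keyword_score(query, doc):
    """Crude stand-in for BM25: fraction of query terms present in the doc."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms)

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# (doc text, pretend embedding) pairs -- both invented for illustration.
docs = [
    ("Star Wars is a space opera about the Skywalker family", [0.9, 0.1]),
    ("Star Trek is a voyage of the starship Enterprise", [0.8, 0.3]),
]
query_text = "Star Wars review"
query_vec = [0.92, 0.12]  # pretend embedding of the query

alpha = 0.5  # weight between keyword and vector components
ranked = sorted(
    docs,
    key=lambda d: alpha * keyword_score(query_text, d[0])
                  + (1 - alpha) * cosine(query_vec, d[1]),
    reverse=True,
)
print(ranked[0][0])  # the literal term "Wars" pushes the right doc to the top
```

In production the keyword side would be a real full-text index (e.g. PostgreSQL `tsvector` ranking) and the vector side a real embedding index; the fusion step, however, looks much like this weighted sum.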

While many of the examples will use PostgreSQL, the talk itself is database-agnostic: whether you run PostgreSQL, MariaDB, ClickHouse, or something else, you will learn something. In AI Land, the hard problem is always data engineering, not database selection.

Users don't care about inference speed; they care about accuracy. This talk shows how to engineer your data pipeline so your AI doesn't lie.

Speakers

Matt Yonkovit (The Yonk)

Links