Brussels / 3 & 4 February 2024


Feeding ML models with the data from the databases in real-time

In today's fast-paced business environment, and especially with the advent of machine learning (ML), organizations are looking for ways to derive better insights from their data as quickly as possible. However, implementing a complete ML pipeline can be quite challenging. It is even harder if you want to process newly arrived data immediately, or if you have a legacy system that is not easy to connect to your modern infrastructure. Change Data Capture (CDC) has emerged as a technology for delivering real-time data changes from various sources, especially databases. In this talk we will introduce Debezium, a leading open source framework for CDC. We will discuss how it can be leveraged to ingest data from various databases into ML frameworks like TensorFlow, and what the pitfalls are if you go this route.

Attendees will gain an understanding of how Debezium CDC works, how it can help them ingest data from a source database into an ML framework in real time, and what challenges this approach may pose.
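
To give a flavour of the ingestion step, here is a minimal sketch (not taken from the talk itself) of turning one Debezium change event into a numeric feature vector that an ML framework could consume. It relies on Debezium's change event envelope, which carries `op`, `before` and `after` fields. The `orders` table, its `amount` and `items` columns, and the `extract_features` helper are hypothetical names used for illustration only. In a real pipeline the events would typically arrive through Kafka rather than as an inline string.

```python
import json


def extract_features(raw_event, feature_columns):
    """Return (op, features) for one Debezium-style change event.

    Returns None for tombstones, deletes and any event without an
    "after" row image, since those carry no new row state to learn from.
    """
    if raw_event is None:
        return None  # tombstone record: no value at all
    event = json.loads(raw_event)
    # The JSON converter may wrap the envelope in {"schema": ..., "payload": ...}.
    payload = event.get("payload", event)
    op = payload.get("op")
    row = payload.get("after")
    # "c" = create, "u" = update, "r" = snapshot read; "d" (delete) has no "after".
    if op not in ("c", "u", "r") or row is None:
        return None
    return op, [float(row[col]) for col in feature_columns]


# Hypothetical insert event for an illustrative "orders" table.
sample_event = json.dumps({
    "payload": {
        "before": None,
        "after": {"id": 1, "amount": 42.5, "items": 3},
        "source": {"table": "orders"},
        "op": "c",
        "ts_ms": 1706950000000,
    }
})

# Hypothetical delete event for the same row.
delete_event = json.dumps({
    "payload": {
        "before": {"id": 1, "amount": 42.5, "items": 3},
        "after": None,
        "source": {"table": "orders"},
        "op": "d",
        "ts_ms": 1706950001000,
    }
})

if __name__ == "__main__":
    print(extract_features(sample_event, ["amount", "items"]))
    print(extract_features(delete_event, ["amount", "items"]))
```

Skipping deletes is a simplification. Whether a deleted row should retract an earlier training example or simply be ignored depends on the model and is one of the decisions such a pipeline has to make.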


Vojtech Juranek