Brussels / 31 January & 1 February 2026


Federating Databases with Apache DataFusion: Open Query Planning and Arrow-Native Interoperability


Apache DataFusion is emerging as a powerful open-source foundation for building interoperable data systems, thanks to its strongly modular design, Arrow-native execution model, and growing ecosystem of extension libraries. In this talk, we'll explore our contributions to the DataFusion ecosystem, most notably DataFusion Federation for cross-database query execution and DataFusion Table Providers, which connect DataFusion to a wide range of backends.

We'll show how we use these components to federate queries to databases such as TiDB and InfluxDB 2, and how this fits into the broader data fabric and API generation work we're doing at Twintag. We'll also discuss our work on Arrow-native interfaces, including an Arrow Flight SQL server implementation for DataFusion and a prototype Flight SQL endpoint for TiDB. Together, these enable a fully Arrow-based pipeline spanning query submission, execution, and federated dispatch.

The session highlights practical patterns for building distributed data infrastructure using open libraries rather than monolithic systems, and offers a look at where Arrow and DataFusion are headed as shared interoperability layers for modern databases.

Speakers

Michiel De Backker
Ghasan Mohammad (hozan23)
