Federating Databases with Apache DataFusion: Open Query Planning and Arrow-Native Interoperability
- Track: Databases
- Room: UB2.252A (Lameere)
- Day: Saturday
- Start: 17:50
- End: 18:10
- Video only: ub2252a
- Chat: Join the conversation!
Apache DataFusion is emerging as a powerful open-source foundation for building interoperable data systems, thanks to its strongly modular design, Arrow-native execution model, and growing ecosystem of extension libraries. In this talk, we'll explore our contributions to the DataFusion ecosystem—most notably DataFusion Federation for cross-database query execution and DataFusion Table Providers that connect DataFusion to a wide range of backends.
We'll show how we use these components to federate queries to databases such as TiDB and InfluxDB 2, and how this fits into a broader data fabric/API generation work we're doing at Twintag. We'll also discuss our work on Arrow-native interfaces, including an Arrow Flight SQL Server implementation for DataFusion and a prototype Flight SQL endpoint for TiDB, which together enable a fully Arrow-based pipeline spanning query submission, execution, and federated dispatch.
The session highlights practical patterns for building distributed data infrastructure using open libraries rather than monolithic systems, and offers a look at where Arrow and DataFusion are headed as shared interoperability layers for modern databases.
Speakers
| Michiel De Backker | |
| Ghasan Mohammad (hozan23) |