DuckDB in the Cloud: A Simple, Powerful SQL Engine for Your Lakehouse
- Track: Databases
- Room: UB2.252A (Lameere)
- Day: Saturday
- Start: 13:15
- End: 13:20
- Video only: ub2252a
DuckDB has traditionally been seen as a last-mile analytics powerhouse, the fastest way to run a SQL query on your laptop. But DuckDB offers more than just fast SQL, of course; it supports full database semantics and ACID transactions, behaving like a fully fledged, in-process OLAP database. The in-process component has sometimes been viewed as a limitation when considering DuckDB as a data warehouse.
However, DuckDB now supports reading from and writing to most Open Table Formats (OTFs), including Iceberg, Delta, and DuckLake. This capability puts DuckDB in a very different position: it can act as a SQL engine in the cloud (or on your local machine) and run queries against tables in any of these formats stored in remote cloud storage. DuckDB can now be the almighty, single-node query engine that powers your data analytics use cases.
In this talk I will also dive into:
- Why this change allows for a "multi-player" DuckDB experience.
- How DuckDB efficiently queries very large tables by leveraging table statistics and caching.
- Why building native implementations with minimal dependencies to interact with OTFs is hard but can potentially pay off.
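To make the table-statistics point concrete, here is a minimal sketch of the general idea of pruning data files by their min/max statistics before reading them. It is an illustration only, not DuckDB's actual implementation: the `DataFile` class, the `prune_files` function, and the file names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataFile:
    """Hypothetical stand-in for one data file listed in table metadata."""
    path: str
    min_value: int  # smallest value of the filter column in this file
    max_value: int  # largest value of the filter column in this file


def prune_files(files: List[DataFile], lower: int, upper: int) -> List[DataFile]:
    """Keep only files whose [min, max] range overlaps [lower, upper]."""
    return [f for f in files if f.max_value >= lower and f.min_value <= upper]


# Assumed example metadata: three files with non-overlapping id ranges.
files = [
    DataFile("part-0.parquet", 1, 100),
    DataFile("part-1.parquet", 101, 200),
    DataFile("part-2.parquet", 201, 300),
]

# A filter such as "WHERE id BETWEEN 150 AND 180" can only match rows in part-1,
# so the other two files never need to be fetched from remote storage.
selected = prune_files(files, 150, 180)
print([f.path for f in selected])
```

The fewer files a query has to fetch from remote storage, the less data crosses the network, which is why statistics kept in table metadata matter so much for very large tables.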
Projects:
- https://github.com/duckdb/duckdb
- https://github.com/duckdb/ducklake
- https://github.com/duckdb/duckdb-iceberg
- https://github.com/duckdb/duckdb-delta
Speakers
- Gábor Szárnyas
- Guillermo Sanchez