Partly Cloudy with a Chance of Zarr: A Virtualized Approach to Zarr Stores from ECMWF Fields Database
- Track: HPC, Big Data & Data Science
- Room: H.1308 (Rolin)
- Day: Sunday
- Start: 12:00
- End: 12:25
- Video only: h1308
- Chat: Join the conversation!
ECMWF manages petabytes of meteorological data critical for weather and climate research. But traditional storage formats pose challenges for machine learning, big-data analytics, and on-demand workflows.
We propose a solution which introduces a Zarr store implementation for creating virtual views of ECMWF’s Fields Database (FDB), enabling users to access GRIB data as if it were a native Zarr dataset. Unlike existing approaches such as VirtualiZarr or Kerchunk, our solution leverages the domain-specific MARS language to define virtual Zarr v3 stores directly from scientific requests, bridging GRIB and Zarr for dynamic, cloud-native access.
This work is developed as part of the WarmWorld Easier project, aiming to make climate and weather data more interoperable and accessible for the scientific community. By combining the efficiency of FDB with the flexibility of Zarr, we unlock new possibilities for HPC, big-data analytics, and machine learning pipelines.
In this talk, we will explore the architecture, discuss performance considerations, and demonstrate how virtual Zarr views accelerate integration in open-source workflows.
This session will: - Explain the motivation behind creating virtual Zarr views of ECMWF’s Fields Database. - Detail the design and implementation of a custom Zarr Store that translates Zarr access patterns into MARS requests. - Discuss performance trade-offs and scalability in HPC contexts. - Showcase real-world examples of how this approach may support data science workflows, machine learning, and distributed computing.
Speakers
| Tobias Kremer |