Brussels / 31 January & 1 February 2026


Multi-Petabyte Data Distribution in Industry & Science with CernVM File System


The CernVM File System (CVMFS) is a scalable, high-performance distributed filesystem developed at CERN to efficiently deliver software and static data across global computing infrastructures. It was designed primarily for high-energy physics (HEP). For the Large Hadron Collider (LHC) alone, CVMFS serves around 4 billion files (~2 PB of data). CVMFS uses a content-addressable storage model, in which files are stored under the cryptographic hash of their content, ensuring integrity and enabling deduplication. It follows a multi-tier caching architecture in which data are published to a single source of truth (Stratum 0), mirrored by a network of distributed servers (Stratum 1), and propagated to clients via forward proxies. This layered caching offers a cost-effective alternative to traditional file systems, giving clients reliable access to versioned, read-only datasets with low overhead. In this talk we will focus on how CVMFS interoperates with widely adopted S3 storage, providing a conventional POSIX filesystem view of the objects and using the available metadata to exploit the medium efficiently. We will also highlight the benefits of using CVMFS with containerized workflows and demonstrate tools developed to facilitate data publishing.
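
As a rough illustration of the content-addressable model described in the abstract, the following minimal Python sketch shows how keying objects by the hash of their content gives both deduplication and an integrity check. It is a toy in-memory model, not CVMFS code: the `ContentStore` class, its `publish` and `read` methods, the example paths, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store (hypothetical, not the CVMFS implementation)."""

    def __init__(self):
        self.objects = {}  # hex digest -> content bytes
        self.catalog = {}  # file path  -> hex digest

    def publish(self, path, data):
        # SHA-256 is an illustrative choice of hash function.
        digest = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self.objects.setdefault(digest, data)
        self.catalog[path] = digest
        return digest

    def read(self, path):
        digest = self.catalog[path]
        data = self.objects[digest]
        # Integrity check: the content must hash back to its own address.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError(f"corrupted object for {path}")
        return data


store = ContentStore()
a = store.publish("/sw/v1/lib.so", b"same bytes")
b = store.publish("/sw/v2/lib.so", b"same bytes")
store.publish("/sw/v2/README", b"other bytes")

# Three paths are cataloged, but only two distinct objects are stored.
print(a == b, len(store.catalog), len(store.objects))  # prints: True 3 2
```

Because an object's address is derived from its bytes, any cache layer between publisher and client can hold the object without being trusted: a reader can recompute the hash and reject a mismatch.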

Homepage: https://cernvm.web.cern.ch/fs/
Documentation: https://cvmfs.readthedocs.io/
Development: https://github.com/cvmfs/cvmfs/
Forum: https://cernvm-forum.cern.ch/

Speakers

Andriy Utkin
