Brussels / 3 & 4 February 2024


Workflow managers in high-energy physics: enhancing analyses with Snakemake

Workflow management tools have long been used in scientific computing to organise and run workflows. Many such tools, e.g., Snakemake, Luigi, and Toil, build on the foundation of Make (wherein users define simple rules with interdependent inputs and outputs), adding features to suit increasingly complex user needs. After initially seeing widespread uptake in bioinformatics, workflow managers have become commonplace in many fields, including high-energy physics (HEP).
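The Make-style idea of rules with interdependent inputs and outputs can be illustrated with a minimal Python sketch. This is not how Snakemake or any other tool is implemented; the `Rule` class, the `plan` function, and the file names (`raw.root`, `selected.root`, `fit.json`, `mass.pdf`) are hypothetical and chosen only to resemble a small HEP analysis chain. Given a target file, the sketch works backwards through the rules to find an order in which they could be run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical Make-style rule: named step with input and output files."""
    name: str
    inputs: tuple
    outputs: tuple


def plan(rules, target):
    """Return the names of the rules needed to produce `target`, in run order."""
    # Map every output file to the rule that produces it.
    producer = {out: rule for rule in rules for out in rule.outputs}
    ordered = []      # rules in the order they should run
    visiting = set()  # rules currently being resolved, for cycle detection

    def visit(path):
        rule = producer.get(path)
        if rule is None:
            return  # no rule makes this file: assume it is an existing source input
        if rule.name in ordered:
            return  # already scheduled
        if rule.name in visiting:
            raise ValueError(f"cyclic dependency at rule {rule.name!r}")
        visiting.add(rule.name)
        for dep in rule.inputs:
            visit(dep)  # schedule whatever produces each input first
        visiting.discard(rule.name)
        ordered.append(rule.name)

    visit(target)
    return ordered


# Illustrative rules, deliberately listed out of order.
RULES = [
    Rule("plot", ("fit.json",), ("mass.pdf",)),
    Rule("fit", ("selected.root",), ("fit.json",)),
    Rule("select", ("raw.root",), ("selected.root",)),
]

print(plan(RULES, "mass.pdf"))  # ['select', 'fit', 'plot']
```

The user only states what each step consumes and produces; the execution order follows from the dependencies rather than from the order in which the rules are written.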

Analyses in HEP typically consist of many non-trivially related processes with widely varying requirements. Workflow managers can vastly simplify such analyses, providing user-friendly methods to define, review, and run analysis workflows. Snakemake has emerged as a leading workflow manager for HEP, with an established user base spread across major experiments. Dialogue between Snakemake developers and the HEP community has led to integrations for distributed storage and transfer frameworks, e.g., XRootD, FTP, and Amazon S3, and for scheduling frameworks, e.g., HTCondor, Slurm, and DRMAA. These integrations enable analysts to make better use of the distributed computing resources made available by experiments, significantly improving the efficiency of HEP analyses. Further collaboration between analysts and developers has seen Snakemake form the core of several standardised analysis frameworks aimed at improving analysis reproducibility, such as REANA.

This contribution discusses the current use of workflow managers in HEP, including best practices for their application. Additionally, the anticipated requirements of analysts are considered within the context of ever-increasing data scales in HEP.


Jamie Gooding