Scaling Gmsh-based FEM on LUMI: Efficiently Handling Thousands of Partitions
- Track: HPC, Big Data & Data Science
- Room: H.1308 (Rolin)
- Day: Sunday
- Start: 10:30
- End: 10:55
- Video only: h1308
- Chat: Join the conversation!
Content
High-frequency wave simulations in 3D (with e.g. Finite Elements) involve systems with hundreds of millions unknowns (up to 600M in our runs), prompting the use of massively parallel algorithms. In the harmonic regime, we favor Domain Decomposition Methods (DDMs) where local problems are solved in smaller regions (subdomains) and the full solution of the PDE is recovered iteratively. This requires each rank to own a portion of the mesh and to have a view on neighboring partitions (ghost cells or overlaps). In particular, the Optimized Restricted Additive Schwarz algorithm requires assembling matrices at the boundary of overlaps, which requires creating additional elements after the partitioning.
During the last two years, I pushed our in-house FEM code (GmshFEM) to run increasingly large jobs, from 8 MPI ranks on a laptop, through local and national clusters, up to more than 30,000 ranks on LUMI. Each milestone provided its own challenges in the parallel implementation: as the problem size increases, simple global reductions can go from being a minor synchronization to being a major bottleneck, redundant information in partitioned meshes can eat hundreds of gigabytes of RAM, and load-balancing issues can become dominant.
In this talk, I will describe how we tackled these challenges and how the future versions of Gmsh will take into account these issues. In particular, the next version of the MSH file format will be optimized to reduce data duplication across subdomains. I will also present the new API for querying information about partitioned meshes, such as retrieving elements in overlapping regions.
About Gmsh
Gmsh (https://gmsh.info/) is an open-source (GPL-2) finite element mesh generator widely used in scientific and engineering applications. It provides a graphical interface, a scripting language for automation, and language bindings (C/C++, Fortran, Python, Julia). In this work, Gmsh serves as the front-end mesh generator for large-scale distributed FEM simulations using our in-house solver GmshFEM (https://gitlab.onelab.info/gmsh/fem).
Speakers
| Boris Martin |