Brussels / 1 & 2 February 2020


Buildtest: HPC Software Stack Testing Framework


HPC support teams are often tasked with installing scientific software for their user community, and managing a large software stack quickly becomes very challenging. Software installation presents many challenges that require domain expertise and countless hours of troubleshooting to build an optimized software stack that is tuned to the architecture. In the past decade, two software build tools, EasyBuild and Spack, have emerged and are widely adopted in the HPC community to accelerate building a complete software stack for HPC systems. Support teams are constantly fulfilling software requests from end-users, which leads to an ever-growing software ecosystem. Once a package is installed, the support team hands it off to the user without any testing, because testing scientific software requires domain expertise. Some packages ship with a test suite that can be run after the build, while many have no testing mechanism at all. This creates a knowledge gap between the HPC support team and end-users about what kind of testing to do. Some HPC centers have developed in-house test scripts that are suitable for testing their software, but these tests are not portable because of hardcoded paths and are often site-dependent. In addition, there is no collaboration between HPC sites on building a test repository that would benefit the community. In this talk I will present buildtest, a framework to automate testing of a software stack, along with several module operations that should interest HPC support teams.

An HPC environment is a tightly coupled system that includes a cluster of nodes and accelerators connected by a high-speed interconnect, a parallel filesystem, multiple storage tiers, a batch scheduler through which users submit jobs to the cluster, and a software stack for running user workflows. A software stack is a collection of compilers, MPI implementations, libraries, system utilities and scientific packages, typically installed on a parallel filesystem. A module tool such as environment-modules or Lmod is generally used to load software into the user's shell environment.

Software is packaged in various forms that determine how it is installed. A few package formats are: binary, Makefile, CMake, Autoconf, GitHub, PyPI, Conda, RPM, tarball, RubyGems, MakeCp, jar, and many more. This variety places a burden on the HPC support team, which must learn how to build software in each format, since each one has its own build process. Software build tools like EasyBuild and Spack can build more than 1000 software packages by supporting many packaging formats, covering all sorts of software builds. EasyBuild and Spack provide end-to-end build automation that helps HPC sites build a very large software stack with many combinatorial software configurations. During installation, some packages provide a test harness that can be executed through EasyBuild or Spack. This typically invokes make test or ctest for packages that follow a ConfigureMake, Autoconf, or CMake install process.

Many HPC sites rely on their users to test the software stack, and some sites develop in-house test scripts to run sanity checks on popular scientific tools. Despite these efforts, there is little or no collaboration between HPC sites on sharing tests, because the tests are site-specific and often undocumented. At many sites, the HPC support team does not have time to test the software stack because of (1) a lack of domain expertise and understaffing, and (2) the absence of a standard test suite and framework to automate test building and execution. Frankly, HPC support teams are so busy with important day-to-day operations and engineering projects that software testing is either neglected or left to end-users. This calls for a concerted effort by the HPC community to build a strong open-source community around software stack testing.

Two points need to be addressed. First, we need a framework to automatically test an installed software stack. Second, we need a community-driven test repository for scientific software that is reusable across the HPC community. An automated test framework provides a harness for automating test creation, but growing the repository package by package requires community contributions. Note that this talk focuses on sanity checking the software stack, so the tests are generic, with simple examples that compile easily. In the future, buildtest will focus on domain-specific tests once a strong community forms around the project.

Speakers

Shahzeb Siddiqui
