Brussels / 3 & 4 February 2018


Simulating Multilevel Caches in Cachegrind

Modelling L2 and L3 CPU caches separately in Cachegrind on both x86 and ARM architectures for performance profiling of the Isambard HPC system

Systems with large multi-level caches need to profile cache performance at each level, because there remains a significant performance gap between the L2 and L3 caches. An example of such a system, and the focus of this work, is the new Cavium ThunderX2-based supercomputer of the GW4 Isambard project. This paper describes the methodology used to extend the capabilities of Cachegrind, a Valgrind tool, so that it models the L2 and L3 caches separately rather than as a single last-level cache. This support will be added for the ARMv8 architecture, in order to model performance on the Isambard supercomputer, and for x86_64, for comparative purposes. As a result of this extension, the Isambard project will be able to optimise frequently used programs, reducing the CPU time they consume and therefore improving throughput on the system.

Outline:
1) Current state of Valgrind and Cachegrind (possibly also the state of ARM support)
2) Adding the extra cache layer
3) State of Valgrind/Cachegrind on ThunderX2


Matthew Coles