Libgomp Optimizations for Scheduler Guided OpenMP Execution in Cloud VMs
- Track: GCC (GNU Toolchain)
- Room: UD6.215
- Day: Saturday
- Start: 13:05
- End: 13:30
- Video only: ud6215
OpenMP is a widely used framework for parallelizing applications, enabling thread-level parallelism via simple source-code annotations. It follows the fork-join model and relies heavily on barrier synchronization among worker threads. Running OpenMP-enabled applications in the cloud is increasingly popular due to elasticity, fast startup, and pay-as-you-go pricing.
In cloud-based execution, worker threads run inside a virtual machine (VM) and are subject to two levels of scheduling: the guest scheduler places threads on virtual CPUs (vCPUs), while the host scheduler runs the vCPUs as ordinary tasks on the host's physical CPUs (pCPUs). Because these schedulers act independently, a semantic gap emerges that can undermine application performance. Barrier synchronization, whose efficiency depends on timely scheduling decisions, is vulnerable to this semantic gap, yet its behavior in this setting remains under-explored.
This talk presents my PhD thesis project supervised by Julia LAWALL and Jean-Pierre Lozi at Inria Paris. The thesis defines Phantom vCPUs to describe problematic host-level preemptions in which guest vCPUs remain queued on busy pCPUs, stalling progress. We show that OpenMP performance can be substantially improved inside oversubscribed cloud VMs by (1) dynamically adapting the degree of parallelism (DoP) at the start of each parallel region and (2) dynamically choosing between spinning and blocking at barriers on a per-thread, per-barrier basis. We propose paravirtualized, scheduler-informed techniques that accurately guide these decisions and demonstrate their effectiveness in realistic deployments.
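To make the two decisions concrete, here is a minimal sketch in C of how a runtime might use a scheduler hint. Everything in it is an assumption made for illustration: the `sched_hint` structure, its fields, and the functions `choose_dop` and `should_spin` are hypothetical and are not the interface of Phantom Tracker, pv-barrier-sync, or libgomp. The heuristics (subtract the estimated Phantom vCPUs from the online vCPUs; spin only when the whole team fits on the remaining vCPUs) are simple stand-ins, not the thesis's algorithms.

```c
#include <stdbool.h>

/* Hypothetical snapshot of scheduler information visible to the runtime.
 * The field names are assumptions for this sketch only. */
struct sched_hint {
    int online_vcpus;   /* vCPUs visible to the guest */
    int phantom_vcpus;  /* vCPUs estimated to be queued on busy pCPUs */
};

/* Number of vCPUs assumed to be making progress right now (at least 1). */
static int usable_vcpus(const struct sched_hint *h)
{
    int usable = h->online_vcpus - h->phantom_vcpus;
    return usable < 1 ? 1 : usable;
}

/* Decision (1): pick the degree of parallelism for the next parallel
 * region, capped by the vCPUs assumed to be making progress. */
static int choose_dop(const struct sched_hint *h, int requested)
{
    int usable = usable_vcpus(h);
    if (requested < 1)
        requested = 1;
    return requested < usable ? requested : usable;
}

/* Decision (2): a thread waiting at a barrier spins only if every team
 * member can plausibly hold a running vCPU; otherwise it blocks so that
 * it does not burn a vCPU that a late thread may need. */
static bool should_spin(const struct sched_hint *h, int team_size)
{
    return team_size <= usable_vcpus(h);
}
```

In an application, the value returned by such a function could be passed to the standard `num_threads` clause or to `omp_set_num_threads()` before a parallel region. A runtime-level implementation would instead make both decisions internally, which is the approach the talk describes for libgomp.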
The first contribution of this thesis is Phantom Tracker, an algorithmic solution implemented in the Linux kernel that leverages paravirtualized task scheduling to detect and quantify Phantom vCPUs accurately. The second contribution is pv-barrier-sync, a dynamic barrier synchronization mechanism driven by the scheduler insights produced by Phantom Tracker. The third and final contribution is Juunansei, an OpenMP runtime extension that demonstrates the practical utility of Phantom Tracker and pv-barrier-sync with additional optimizations.
The talk first discusses the context and motivation of this work, then briefly introduces Phantom Tracker, and finally takes a deep dive into the libgomp implementation of pv-barrier-sync and Juunansei.
Speakers
| Himadri CHHAYA-SHAILESH |