A Unified I/O Monitoring Framework Using eBPF
- Track: eBPF
- Room: H.1308 (Rolin)
- Day: Saturday
- Start: 14:45
- End: 15:15
- Video only: h1308
I/O monitoring and profiling tools interoperate poorly because they depend strongly on the underlying file system (LUSTRE, Spectrum Scale, NFS, etc.) and resource manager (batch jobs, VMs, containerized workloads, etc.). Widely adopted generic monitoring tools often lack temporal information about I/O activity, which is usually needed to understand the I/O behavior of applications. The increasing diversity of applications and computing platforms demands greater flexibility and scope in I/O characterization.

This talk proposes a framework for monitoring I/O activity using extended Berkeley Packet Filter (eBPF) technology, which has gained much traction in the observability and cloud-native landscape. By tracing the kernel’s Virtual File System (VFS) functions with eBPF, it is possible to monitor I/O activity on different types of platforms such as HPC clusters, cloud hypervisors, or Kubernetes. By storing the metrics traced by the eBPF programs in a high-performance time series database like Prometheus, it is possible to perform system-wide monitoring, in a unified manner, of computing platforms that use different types of local or remote file systems.

The talk presents the basics of eBPF and discusses the framework used to monitor I/O activity in a file-system- and application-agnostic way. It also presents experimental results that quantify the overhead and accuracy of the proposed framework, using IOR benchmark results as the reference. The results indicate that the framework adds negligible overhead and that the bandwidths it reports are in very good agreement with those from the IOR tests. Finally, results are presented from a production HPC platform that uses the proposed framework to monitor I/O activity on the LUSTRE file system.
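To make the pipeline described in the abstract more concrete, here is a minimal sketch in plain Python of the step that follows tracing: turning per-call byte counts into cumulative counters, rendering them as Prometheus-style text, and deriving a bandwidth from two samples. It is not the speaker's implementation. The event list stands in for data that an eBPF program attached to VFS functions such as `vfs_read` and `vfs_write` might report, and the metric name `io_bytes_total`, the label names, and all helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical trace records standing in for eBPF output:
# (timestamp in seconds, job id, mount point, operation, bytes transferred)
EVENTS = [
    (0.0, "job1", "/lustre", "write", 4 * 2**20),
    (0.5, "job1", "/lustre", "write", 4 * 2**20),
    (1.0, "job1", "/lustre", "read", 2 * 2**20),
    (1.5, "job2", "/home", "write", 2**20),
    (2.0, "job1", "/lustre", "write", 8 * 2**20),
]


def counters_at(events, t):
    """Cumulative byte counters per (job, mount, op) for events up to time t."""
    totals = defaultdict(int)
    for ts, job, mount, op, nbytes in events:
        if ts <= t:
            totals[(job, mount, op)] += nbytes
    return dict(totals)


def exposition(totals, name="io_bytes_total"):
    """Render counters in the Prometheus text exposition style."""
    lines = [f"# TYPE {name} counter"]
    for (job, mount, op), value in sorted(totals.items()):
        labels = f'job_id="{job}",mount="{mount}",op="{op}"'
        lines.append(f"{name}{{{labels}}} {value}")
    return "\n".join(lines)


def bandwidth(events, key, t0, t1):
    """Average bytes per second between two samples of one counter."""
    c0 = counters_at(events, t0).get(key, 0)
    c1 = counters_at(events, t1).get(key, 0)
    return (c1 - c0) / (t1 - t0)


if __name__ == "__main__":
    print(exposition(counters_at(EVENTS, 2.0)))
    key = ("job1", "/lustre", "write")
    print(bandwidth(EVENTS, key, 0.0, 2.0) / 2**20, "MiB/s")
```

Keeping only cumulative counters per job and mount point is one plausible way to stay file-system agnostic: the bandwidth over any time window can then be recovered from the difference between two samples, which is the kind of figure that can be compared against a benchmark such as IOR.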
Speakers
- Mahendra Paipuri