OpenCL, CUDA, and HIP as compilation targets for functional array programs
- Track: HPC, Big Data & Data Science
- Room: UB5.132
- Day: Sunday
- Start: 14:00
- End: 14:10
- Video only: ub5132
OpenCL, CUDA, and HIP are possibly the most popular APIs for low-level GPU programming, and most GPUs support more than one of them. Much superstition abounds about their relative performance, but little data is available, largely because it is very tedious to implement otherwise-equivalent programs in each of these APIs in order to compare them.
In this talk I present my experiences using OpenCL, CUDA, and HIP as compilation targets for Futhark, a functional array language. I compare the performance of OpenCL versus CUDA, and of OpenCL versus HIP, on the code generated by the Futhark compiler for a collection of 48 application benchmarks on two different GPUs. This is probably the largest such comparison done, at least in terms of the number of benchmarks. Although the generated code is equivalent in most cases, I observe significant performance differences on the same hardware. I can identify the root causes of most of these differences. Many are due to relatively superficial details, such as inconsistent defaults for compiler optimisation and numerical accuracy, although a few remain mysterious. The results are useful to anyone who seeks to generate low-level GPU code from higher-level specifications or libraries.
Speakers
Troels Henriksen