Brussels / 1 & 2 February 2025

OpenCL, CUDA, and HIP as compilation targets for functional array programs


OpenCL, CUDA, and HIP are possibly the most popular APIs for low-level GPU programming, and most GPUs support more than one of them. Much superstition surrounds their relative performance, but little data is available, largely because it is very tedious to implement otherwise-equivalent programs in each API in order to compare them.

In this talk I present my experiences using OpenCL, CUDA, and HIP as compilation targets for Futhark, a functional array language. I compare the performance of OpenCL versus CUDA, and of OpenCL versus HIP, on the code generated by the Futhark compiler for a collection of 48 application benchmarks on two different GPUs - probably the largest such comparison done, at least in terms of the number of benchmarks. Although the generated code is in most cases equivalent, I observe significant performance differences on the same hardware. I can identify the root causes of most of these differences, many of which are due to relatively superficial details such as inconsistent defaults regarding compiler optimisation and numerical accuracy, although a few remain mysterious. These findings are useful to anyone who seeks to generate low-level GPU code from higher-level specifications or libraries.

Speakers

Troels Henriksen