Purely Functional GPU Programming with Futhark
We present Futhark, a purely functional array language, along with its optimising GPU-targeting compiler. We focus on the language design tradeoffs needed to generate high-performance GPU code efficiently from a high-level parallel language. We also demonstrate (nested) data-parallel array programming, a paradigm that enables concise programming of massively parallel systems. We show how Futhark code can be easily integrated with larger applications written in other languages. Finally, we report benchmarks showing that Futhark is able to match the performance of hand-written code on various published benchmarks.
GPUs and other massively parallel systems are now common, yet programming them is often a painful experience. The available languages tend to be low-level and fragile, and good performance requires careful hand-optimisation. The programmer is frequently forced to write highly coupled code with little modularity. The high-level languages that exist, many of them functional in nature, are either insufficiently flexible or perform poorly in practice. We present our work on a programming language that seeks a common ground between imperative and functional approaches.
Futhark is a small programming language designed to be compiled to efficient GPU code. It is a statically typed, data-parallel, and purely functional array language, and comes with a heavily optimising ahead-of-time compiler that generates GPU code via OpenCL. Futhark is not designed for graphics programming, but instead uses the compute power of the GPU to accelerate data-parallel array computations. We support regular nested data-parallelism, as well as a form of imperative-style in-place modification of arrays, while still preserving the purity of the language via the use of a uniqueness type system.
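To make "regular nested data-parallelism" concrete, the following is a minimal sketch in plain Python, not Futhark. Python serves here only as executable notation, and the function name row_sums is a hypothetical chosen for illustration. The outer map treats every row independently, and the inner reduce uses an associative operator, so both levels could in principle run in parallel. "Regular" means that all rows have the same length.

```python
from functools import reduce
from operator import add

def row_sums(matrix):
    """Sum each row of a regular (rectangular) matrix.

    Outer level: a map over rows; the rows are independent of each other.
    Inner level: a reduce with an associative operator (+) and its
    neutral element (0), which is what permits a parallel reduction.
    """
    return list(map(lambda row: reduce(add, row, 0), matrix))

matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
print(row_sums(matrix))  # [6, 15, 24]
```

The Python version executes sequentially. The point is only the shape of the program: a parallel operation nested inside another parallel operation, with no side effects between iterations.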
The Futhark language and compiler are an ongoing research project. The compiler can already handle nontrivial programs, which then run on real GPUs at high speed. It employs a set of optimisations (fusion, flattening, distribution, tiling, etc.) to shield the programmer from having to know the details of the underlying hardware. The Futhark language itself is still very spartan: because the basic design criteria require the ability to generate high-performance GPU code, it takes more effort to support language features that are common in languages with more forgiving compilation targets. Nevertheless, Futhark has been used to port several real-world benchmark applications, with performance comparable to the original hand-written GPU (OpenCL or CUDA) code.
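As an illustration of fusion, one of the optimisations listed above, the sketch below contrasts two ways of computing a sum of squares. It is written in plain Python with hypothetical function names, and it shows only the general idea of the transformation, not how the Futhark compiler implements it. The unfused version materialises an intermediate array between the map and the reduce, while the fused version computes the same result in a single pass.

```python
def sum_of_squares_unfused(xs):
    # Step 1 (map): build a full intermediate array of squares.
    squares = [x * x for x in xs]
    # Step 2 (reduce): traverse the intermediate array again to sum it.
    total = 0
    for s in squares:
        total += s
    return total

def sum_of_squares_fused(xs):
    # Map and reduce combined: each element is squared and accumulated
    # immediately, so no intermediate array is ever stored.
    total = 0
    for x in xs:
        total += x * x
    return total

data = list(range(10))
print(sum_of_squares_unfused(data))  # 285
print(sum_of_squares_fused(data))    # 285
```

On a GPU, the intermediate array would typically be written to and read back from global memory, so removing it saves memory traffic as well as storage. A programmer can write the modular two-step form and rely on the compiler to produce the fused one.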
Futhark is not intended to replace existing general-purpose languages. The intended use case is to write only the relatively small but compute-intensive parts of an application in Futhark. The Futhark compiler generates code that can be easily integrated with non-Futhark code. For example, you can compile a Futhark program to a Python module that internally uses PyOpenCL to execute code on the GPU, yet looks like an ordinary Python module from the outside.