Macros Gone Wild: The Usage of the C Preprocessor in the Linux Kernel
- Track: Kernel
- Room: UD2.208 (Decroly)
- Day: Sunday
- Start: 09:30
- End: 10:00
- Video only: ud2208
The Linux kernel is a foundational element of the modern IT infrastructure, powering billions of devices from smartphones and routers to desktops and supercomputers. Consequently, the quality of its code affects IT's reliability, resilience, performance, and evolution. Through this presentation I aim to improve our understanding of the Linux kernel's code quality regarding its usage of the C preprocessor: a dated, ubiquitous, and often insidious part of the compilation process. I show the characteristics of the C preprocessor's usage, the introduced technical debt, the usage's evolution, and the feasibility of reducing the incurred technical debt, mainly through refactoring and by utilizing facilities of the Rust programming language.
As a tool for the analysis I extended the CScout refactoring browser to collect tens of metrics before and after the C preprocessor's execution. I then applied CScout to three versions of Linux spanning two decades, running the analysis of the oldest kernel on the QEMU emulator and that of the newest on the ARIS supercomputer, processing in total more than 45 million lines of code.
I found that the C preprocessor is extensively used, often doubling key code elements seen by the C compiler proper. The preprocessor's use is associated with several types of technical debt, including namespace pollution; scoping, namespace, and control-flow confusion; composite identifiers; hybrid call paths; deep and cyclic include hierarchies; expansion explosions; and deterioration of structural quality metrics. Although the density of some preprocessor usages is steady or decreasing over time, some worrisome usages are showing significant increases.
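To make two of these debt categories concrete, here is a minimal sketch of a composite identifier and of macro-induced control-flow confusion. All names (DEFINE_HANDLER, handler_ping, CHECK_POSITIVE, process) are hypothetical and chosen for illustration; they are not taken from the kernel or from the analysis.

```c
#include <assert.h>

/*
 * Composite identifier: the name handler_ping is assembled by token
 * pasting, so it never appears verbatim at the definition site and a
 * plain-text search for its definition comes up empty.
 */
#define DEFINE_HANDLER(name) \
    static int handler_##name(int x) { return x + 1; }

DEFINE_HANDLER(ping)

/*
 * Control-flow confusion: the macro hides a return statement, so the
 * call site looks like an ordinary check but can exit the caller.
 */
#define CHECK_POSITIVE(v) do { if ((v) <= 0) return -1; } while (0)

static int process(int v)
{
    CHECK_POSITIVE(v);       /* may return -1 from process() itself */
    return handler_ping(v);  /* calls the function generated above */
}
```

With these definitions, process(5) reaches handler_ping and yields 6, whereas process(0) leaves through the return hidden inside CHECK_POSITIVE and yields -1.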
To explore how this situation can be addressed, I present a taxonomy of non-trivial C preprocessor use cases, which indicates that there are two broad categories of changes that can be made. First, most object-like macros can be easily refactored into C const objects or enum values, while about half of the function-like macros could be rewritten as C functions. Although this change is wide in scope, the corresponding gains in code maintainability will be small. Second, the remaining non-trivial C preprocessor function-like macros could be refactored, many through specific facilities of Rust that I describe for corresponding macro use cases. In this case, however, the amount of required engineering effort is large.
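The first category of changes can be sketched as follows. This is an illustrative example under assumed names (MAX_RETRIES, TIMEOUT_SECONDS, square, and the *_MACRO originals are all hypothetical), not code from the kernel or from the talk.

```c
#include <assert.h>

/* Before: untyped, unscoped preprocessor definitions. */
#define MAX_RETRIES_MACRO 5
#define TIMEOUT_SECONDS_MACRO 2.5
#define SQUARE_MACRO(x) ((x) * (x))

/* After: an enum value replaces the integer object-like macro... */
enum { MAX_RETRIES = 5 };

/* ...a const object replaces the non-integer one... */
static const double TIMEOUT_SECONDS = 2.5;

/* ...and a C function replaces the function-like macro. */
static inline int square(int x)
{
    return x * x;   /* the argument is evaluated exactly once */
}

/* An enum value is still an integer constant expression. */
static_assert(MAX_RETRIES == MAX_RETRIES_MACRO, "values must match");
```

One behavioural difference is worth noting: square(i++) increments i once, whereas SQUARE_MACRO(i++) expands the argument twice and modifies i twice without sequencing, which is undefined behaviour in C.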
Speakers
Diomidis Spinellis