Brussels / 31 January & 1 February 2026


Generating Programmable NPUs from Linalg with MLIR and CIRCT


Every new AI workload seems to need new hardware. Companies spend months designing NPUs (neural processing units), then more months building compilers for them, only to discover that the hardware doesn't run their target workloads efficiently. By the time they iterate, the algorithm has moved on.

We present a work-in-progress approach that generates NPU hardware directly from algorithm specifications using MLIR and CIRCT. Starting from a computation expressed in MLIR's Linalg dialect, our toolchain generates synthesizable SystemVerilog for custom NPU architectures and automatically connects it to a RISC-V control host with an optimized memory hierarchy.
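To make the starting point concrete, here is a minimal Python sketch of what a Linalg-style matrix multiplication specifies: an iteration space, one indexing map per operand, and a scalar body. This is not the toolchain from the talk and not MLIR's API; `run_generic` and `matmul` are hypothetical names chosen for illustration, and the interpreter only handles 2-D operands.

```python
from itertools import product


def run_generic(dims, indexing_maps, body, inputs, output):
    """Interpret a linalg.generic-style op as a plain loop nest.

    Hypothetical helper for illustration, restricted to 2-D operands.
    dims          -- extent of each loop in the iteration space
    indexing_maps -- one function per operand (inputs first, output last)
                     mapping loop indices to a (row, col) position
    body          -- scalar function (inputs..., accumulator) -> new value
    """
    for idx in product(*(range(d) for d in dims)):
        in_vals = []
        for tensor, index_map in zip(inputs, indexing_maps[:-1]):
            row, col = index_map(*idx)
            in_vals.append(tensor[row][col])
        out_row, out_col = indexing_maps[-1](*idx)
        output[out_row][out_col] = body(*in_vals, output[out_row][out_col])
    return output


def matmul(a, b):
    """Matrix multiply expressed through the generic interpreter above."""
    m, k, n = len(a), len(a[0]), len(b[0])
    c = [[0] * n for _ in range(m)]
    # Loop order (i, j, r): i and j are parallel, r is the reduction.
    maps = [
        lambda i, j, r: (i, r),  # A is read at (i, r)
        lambda i, j, r: (r, j),  # B is read at (r, j)
        lambda i, j, r: (i, j),  # C is accumulated at (i, j)
    ]
    return run_generic((m, n, k), maps, lambda x, y, acc: acc + x * y, [a, b], c)


if __name__ == "__main__":
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The point of the sketch is that the specification exposes which loops are parallel and which are reductions, which is the kind of structural information a hardware generator can exploit.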

This "algorithm-first" hardware generation inverts the traditional flow: instead of designing hardware then hoping the compiler can use it effectively, we generate hardware that is provably optimal for specific Linalg operations. The approach enables rapid exploration of the hardware/algorithm co-design space: change the algorithm, regenerate the hardware, and immediately see the impact on area, power, and performance.

In this talk, we'll demonstrate:

* Live generation of NPU RTL from Linalg operations
* The MLIR dialect stack that bridges high-level algorithms to CIRCT hardware representations
* Performance comparisons between generated hardware and handmade open-source NPUs
* Open questions around generalization vs. specialization trade-offs
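The "change the algorithm, regenerate, compare" loop can be illustrated with a toy estimate. The sketch below is an assumption-laden stand-in, not the talk's cost model: it assumes an output-stationary `rows x cols` array of multiply-accumulate units, uses the number of processing elements as a crude area proxy, and ignores memory, control, and power entirely. `estimate` and `sweep` are hypothetical names.

```python
from math import ceil


def estimate(m, n, k, rows, cols):
    """Toy estimate for an (m x k) by (k x n) matmul on a rows x cols MAC array.

    Assumption: each output tile of size rows x cols is computed in k cycles,
    and tiles are processed one after another.
    """
    tiles = ceil(m / rows) * ceil(n / cols)
    cycles = tiles * k
    useful_macs = m * n * k
    utilization = useful_macs / (cycles * rows * cols)
    return {"pes": rows * cols, "cycles": cycles, "utilization": utilization}


def sweep(m, n, k, shapes):
    """Evaluate several candidate array shapes for one problem size."""
    return [(shape, estimate(m, n, k, *shape)) for shape in shapes]


if __name__ == "__main__":
    for shape, result in sweep(6, 6, 16, [(2, 2), (3, 3), (4, 4)]):
        print(shape, result)
```

Even in this simplified form, the sweep shows the trade-off the abstract refers to: an array shape that does not divide the problem size evenly spends area on units that sit idle.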

This work aims to make hardware generation accessible to compiler engineers and algorithm researchers, not just hardware designers. We'll discuss both the potential and limitations of this approach, and where the research needs to go next.

Target audience: Compiler engineers, hardware architects, ML systems researchers. Basic familiarity with MLIR helpful but not required.

Speakers

Josse Van Delm
