Brussels / 4 & 5 February 2023


Parsing binary formats with Kaitai Struct

Kaitai Struct is a tool for parsing binary formats. Binary formats are everywhere: archive files, executables, filesystems, multimedia files, network protocols and more. If your application needs to read data in a specific binary format, you need a parser that unpacks the raw bytes into meaningful data structures you can work with. Libraries exist for popular formats, but what if there is no suitable library in your programming language for the format you need?

Kaitai Struct has got you covered: it introduces a declarative domain-specific language (based on YAML) for describing the structure of arbitrary binary formats. A compiler translates format specifications written in this language into ready-to-use parsing modules for 11 programming languages (C++, C#, Go, Java, JavaScript, Lua, Nim, Perl, PHP, Python, Ruby). The format gallery contains more than 180 format specifications, and hundreds more can be found in various GitHub projects.
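To make the idea of a declarative format description more concrete, here is a minimal Python sketch. It is not Kaitai Struct's code or API: the format is made up, and `DEMO_SPEC`, `_STRUCT_CODES` and `parse` are hypothetical names. The field list is only loosely modeled on the `seq` section of a `.ksy` file, reusing Kaitai-style type names such as `u4be`. Note one deliberate difference: Kaitai Struct's compiler generates parser source code ahead of time, whereas this sketch simply interprets the description at runtime.

```python
import struct

# A hypothetical format description, loosely shaped like the `seq` section
# of a .ksy file: an ordered list of fields, each with an id and a type.
DEMO_SPEC = [
    {"id": "magic", "type": "u4be"},
    {"id": "version", "type": "u2le"},
    {"id": "num_entries", "type": "u4le"},
]

# Map a small subset of Kaitai-style integer type names to struct codes
# ("<" = little-endian, ">" = big-endian, no padding in either case).
_STRUCT_CODES = {
    "u1": "<B",
    "u2le": "<H", "u2be": ">H",
    "u4le": "<I", "u4be": ">I",
    "u8le": "<Q", "u8be": ">Q",
}


def parse(spec, data):
    """Read the fields described by `spec` from `data`, in order."""
    result = {}
    offset = 0
    for field in spec:
        code = _STRUCT_CODES[field["type"]]
        (value,) = struct.unpack_from(code, data, offset)
        result[field["id"]] = value
        offset += struct.calcsize(code)  # advance past the field just read
    return result


data = bytes.fromhex("cafebabe 0100 03000000")
header = parse(DEMO_SPEC, data)
# header == {"magic": 0xCAFEBABE, "version": 1, "num_entries": 3}
```

The same description could drive other tools (documentation, a hex-viewer overlay), which is one reason a declarative spec is more reusable than a hand-written parser.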

This talk will discuss the current state, capabilities and limitations of Kaitai Struct. It will also focus on serialization, a highly requested feature that is under active development. Currently, Kaitai Struct can only parse (read) existing binary files created by other applications. Serialization makes it possible to edit the data of an existing file and write it back, or to create a new file from scratch, greatly expanding what all the existing format specifications can be used for.
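The difference between parsing and serialization can be sketched with a hand-written Python class built on the standard `struct` module. This is only an illustration under stated assumptions: `DemoHeader` and its 10-byte little-endian layout are hypothetical, and this is neither code generated by Kaitai Struct nor its serialization API, which is still being worked on. It shows the three operations mentioned above: reading an existing file, editing a field and writing it back, and creating a new file from scratch.

```python
import struct
from dataclasses import dataclass

# Hypothetical fixed-size header: u4le magic, u2le version, u4le num_entries.
_LAYOUT = struct.Struct("<IHI")


@dataclass
class DemoHeader:
    magic: int
    version: int
    num_entries: int

    @classmethod
    def from_bytes(cls, data: bytes) -> "DemoHeader":
        """Parse: unpack the bytes into named fields."""
        return cls(*_LAYOUT.unpack_from(data, 0))

    def to_bytes(self) -> bytes:
        """Serialize: pack the fields back into the same layout."""
        return _LAYOUT.pack(self.magic, self.version, self.num_entries)


original = bytes.fromhex("bebafeca 0100 03000000")
header = DemoHeader.from_bytes(original)  # read an existing file
header.num_entries = 4                    # edit a field...
edited = header.to_bytes()                # ...and write it back
# edited == bytes.fromhex("bebafeca 0100 04000000")

# Create a new file from scratch.
new_file = DemoHeader(magic=0xCAFEBABE, version=2, num_entries=0).to_bytes()
# new_file == bytes.fromhex("bebafeca 0200 00000000")
```

For a fixed-size layout like this, writing is simply the inverse of reading. In general, formats where one field stores the length or offset of another also require those dependent values to be kept consistent when the data is written back.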


Petr Pucil