Brussels / 31 January & 1 February 2026


Document interoperability and conversion: it shouldn’t be that hard!


This talk is presented by Stephan Meijer (NL government, NLdoc/La Suite Docs) and Albert Krewinkel, maintainer of Pandoc.

Public administrations hold millions of documents trapped in formats that are hard to reuse and often fail WCAG requirements: PDFs, legacy Word templates, ad-hoc styles. At Logius, with the NLdoc project, we were tasked with turning those documents into accessible, reusable HTML and other open formats. Our first instinct was the obvious one: use Pandoc and wrap it with some pre- and post-processing. It worked… until it didn’t. Every new edge case, every new target editor, every new accessibility rule meant more custom glue code and brittle filters.

So we flipped the problem: instead of chaining converters, we designed a JSON-based document Abstract Syntax Tree (AST) with an OpenAPI specification and built dedicated conversion services around it. That AST now sits at the centre of a small ecosystem: PDFs and DOCX files are converted into the AST, and from there into editors such as Tiptap and BlockNote, or directly into formats such as HTML. Support for ODT, Markdown and EPUB is on the way.
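To make the hub idea concrete, here is a minimal Python sketch of a JSON document tree and a writer that turns it into HTML. The node types and field names (`type`, `children`, `text`, `level`, `src`, `alt`) are assumptions chosen for illustration; the actual NLdoc specification defines its own schema in its OpenAPI description.

```python
import html
import json

# Hypothetical AST: node types and field names are illustrative assumptions,
# not the actual NLdoc schema.
SAMPLE = json.loads("""
{
  "type": "document",
  "children": [
    {"type": "heading", "level": 1,
     "children": [{"type": "text", "text": "Annual report"}]},
    {"type": "paragraph",
     "children": [{"type": "text", "text": "Costs & benefits"}]},
    {"type": "image", "src": "chart.png", "alt": "Bar chart of costs"}
  ]
}
""")


def to_html(node: dict) -> str:
    """Recursively render a (hypothetical) AST node to an HTML string."""
    kind = node["type"]
    # Render children first; leaf nodes simply have no "children" key.
    inner = "".join(to_html(child) for child in node.get("children", []))
    if kind == "document":
        return inner
    if kind == "text":
        # Escape so characters such as & and < remain literal text.
        return html.escape(node["text"])
    if kind == "heading":
        level = node["level"]
        if not 1 <= level <= 6:
            raise ValueError(f"invalid heading level: {level}")
        return f"<h{level}>{inner}</h{level}>"
    if kind == "paragraph":
        return f"<p>{inner}</p>"
    if kind == "image":
        src = html.escape(node["src"], quote=True)
        alt = html.escape(node.get("alt", ""), quote=True)
        return f'<img src="{src}" alt="{alt}">'
    raise ValueError(f"unsupported node type: {kind}")


print(to_html(SAMPLE))
```

Because every reader targets the same tree, supporting a new output format in this design means writing one more function like `to_html`, rather than a separate converter for each input format.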

The same AST powers the NLdoc Tiptap-based editor, where authors get real-time accessibility validation and can export to accessible formats. It is also behind the import functionality of Docs in La Suite, the FR–DE–NL sovereign collaboration stack.
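Because validation works on the tree rather than on rendered output, checks can run as the author edits. The Python sketch below illustrates two common checks: images without alternative text (WCAG success criterion 1.1.1, Non-text Content) and headings that skip a level. Which rules NLdoc actually applies is not stated here, and the node shape (`type`, `children`, `level`, `src`, `alt`) is a hypothetical one chosen for illustration.

```python
from typing import Iterator


def walk(node: dict) -> Iterator[dict]:
    """Yield every node in document (pre-order) order."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def validate(doc: dict) -> list[str]:
    """Return accessibility findings for a (hypothetical) document AST."""
    findings: list[str] = []
    previous_level = 0
    for node in walk(doc):
        if node["type"] == "image" and not node.get("alt", "").strip():
            # Non-text content needs a text alternative.
            findings.append(f"image {node.get('src', '?')!r} has no alternative text")
        elif node["type"] == "heading":
            level = node["level"]
            # Allow at most one level deeper than the previous heading.
            if level > previous_level + 1:
                findings.append(f"heading level jumps from {previous_level} to {level}")
            previous_level = level
    return findings


doc = {
    "type": "document",
    "children": [
        {"type": "heading", "level": 1, "children": [{"type": "text", "text": "Report"}]},
        {"type": "heading", "level": 3, "children": [{"type": "text", "text": "Details"}]},
        {"type": "image", "src": "map.png", "alt": ""},
    ],
}

for finding in validate(doc):
    print(finding)
```

An editor could call a function like `validate` on every change and surface the findings next to the offending nodes.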

In this talk we’ll walk through that journey: why "just use Pandoc" wasn’t enough, what our AST looks like, how we wired it into a queue-based microservice architecture, and how this approach turns document conversion from a one-off migration hack into an interoperability layer for accessible, sovereign collaboration tools.

Recent versions of the document specification are available at the Releases page of its repository.

Speakers

Stephan Meijer
Albert Krewinkel
