Brussels / 3 & 4 February 2024


RDF Dataset Canonicalization: scalable security for Linked Data

RDF is the technology that powers many of the world's biggest datasets, serving as the backbone of the Linked Data ecosystem: over half of all websites serve RDF data, and massive datasets on topics such as agriculture and transport are available openly from national governments and EU institutions. However, with several different RDF serialization formats in use, the inherent complexity of graph database topology and the ever-present risk of naming collisions on the Semantic Web, you wouldn't be alone in wondering how to tackle RDF data security effectively at scale.

The W3C's RDF Canonicalization and Hash Working Group has been meeting since 2022 to answer this very question, by designing and standardizing a process to transform datasets into a single, canonical form. This talk will take you on a tour of the result of this work, the RDF Dataset Canonicalization algorithm:

  • learn how canonicalization makes protecting datasets from malicious interference easier, from ensuring basic integrity to using RDF in digital credentials

  • get an insight into the challenges posed by 'poison graphs' and how to guard against them in your data pipelines

  • see what it looks like to run the algorithm on the command line and in your scripts using a 100% FOSS technology stack

  • discover how the W3C's standardization process works and how to participate in the design of RDF specifications
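
To give a flavour of why a single canonical form helps with integrity checks, here is a minimal Python sketch. It is not the RDF Dataset Canonicalization algorithm itself: it assumes the dataset contains no blank nodes and that every statement is already written as a well-formed N-Quads line, in which case sorting the statements in code point order is enough to get a stable serialization. Handling blank nodes is the hard part that the full algorithm addresses. The function name `hash_dataset` and the example IRIs are illustrative only.

```python
import hashlib


def hash_dataset(nquads_lines):
    """Return a SHA-256 hex digest of a blank-node-free dataset.

    Assumption: each input string is one N-Quads statement that is
    already consistently formatted, and none of them uses blank nodes.
    """
    # A dataset is a set of statements, so drop empty lines and duplicates.
    statements = {line.strip() for line in nquads_lines if line.strip()}
    # Python compares strings by code point, giving a stable order.
    serialized = "".join(s + "\n" for s in sorted(statements))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


knows = "<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> ."
name = '<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .'
tampered = '<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Mallory" .'

# The same statements in a different order hash identically.
same = hash_dataset([knows, name]) == hash_dataset([name, knows])
# Changing a single statement changes the digest.
changed = hash_dataset([knows, name]) != hash_dataset([knows, tampered])
print(same, changed)
```

Because the digest depends only on the set of statements and not on the order in which a serializer happened to emit them, two parties can compare or sign the hash without first agreeing on a file layout.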


Sebastian Crane