CRA-ppy data: We need better open data for CRA compliance
- Track: CRA in practice
- Room: UA2.114 (Baudoux)
- Day: Saturday
- Start: 18:15
- End: 18:30
- Video only: ua2114
Everyone's building CRA compliance tooling: SBOM generators, vulnerability scanners, security scorecards, automated due diligence checks. But CRA readiness isn't just about tooling. It's about ensuring the data feeding those tools is actually accurate and trusted. The project activity, package metadata, licensing information, and vulnerability data these tools depend on are systematically unreliable, and we need to fix them at the source.
This talk demonstrates why data accuracy is the blocking issue for practical CRA readiness. We'll show real-world examples from major package ecosystems: Python packages with wrong license declarations, Java JARs with embedded vulnerable dependencies that scanners miss, Rust crates with incomplete origin metadata. Whether you are demonstrating due diligence or attempting automated vulnerability reporting, these underlying data failures make compliance impossible, no matter how good your tools are.
The good news is that this is solvable, and the FOSS community is already working on it!
We'll present concrete approaches being deployed across ecosystems: systematic metadata curation projects that scan and fix package data at scale, validation tooling that catches errors before publication, and community infrastructure that makes accurate software metadata freely available. You'll see how projects like Maven Heaven, T-Rust, and Nixpkgs Clarity are cleaning up metadata for the most popular packages, releasing curated data under open licenses, and providing author-facing tools to prevent bad data from entering registries. And we'll discuss how reliable project health data provides critical insights for proactive CRA due diligence and risk management.
This session gives you practical next steps: how to audit data quality in your dependencies, contribute to metadata curation efforts, integrate validation into your publishing workflow, and leverage community-curated data for more reliable compliance automation.
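As an illustration of what a first-pass audit of dependency data quality could look like, here is a minimal sketch in Python that flags suspicious license metadata in the packages installed in the current environment. It is not one of the tools presented in the talk. The function names (`license_findings`, `audit_installed`), the placeholder list, and the 100-character threshold are assumptions chosen for illustration. It only uses the standard library's `importlib.metadata`.

```python
from importlib.metadata import distributions

# Values that carry no usable license information.
# Illustrative and non-exhaustive; an assumption of this sketch.
PLACEHOLDERS = {"", "unknown", "n/a", "none"}


def license_findings(license_field, license_expression, classifiers):
    """Return a list of issues found in one package's license metadata."""
    findings = []
    # Prefer the newer License-Expression field when it is present.
    declared = (license_expression or license_field or "").strip()
    license_classifiers = [c for c in classifiers if c.startswith("License ::")]

    if declared.lower() in PLACEHOLDERS and not license_classifiers:
        findings.append("no license declared")
    # Arbitrary threshold (assumption): very long values are usually a pasted
    # license text rather than a short identifier or expression.
    if len(declared) > 100:
        findings.append("license field looks like full text, not an identifier")
    if len(license_classifiers) > 1:
        findings.append("multiple license classifiers (possible conflict)")
    return findings


def audit_installed():
    """Map each installed distribution with findings to its list of issues."""
    report = {}
    for dist in distributions():
        meta = dist.metadata
        if meta is None:
            continue
        name = meta.get("Name") or "<unnamed>"
        findings = license_findings(
            meta.get("License"),
            meta.get("License-Expression"),
            meta.get_all("Classifier") or [],
        )
        if findings:
            report[name] = findings
    return report
```

Calling `audit_installed()` returns a dictionary of package names and their findings. A check like this only detects missing or inconsistent declarations. It cannot tell whether a declared license is actually correct, which is where curated metadata comes in.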
Speakers
| Georg Link |
| Thomas Steenbergen |