Online / 5 & 6 February 2022


Developing an open source license compliance project: our trials, tribulations and achievements

This talk presents our trials and tribulations, as well as our achievements, in designing a compliance software project for open source licenses.

"Are all module licenses in our software project compliant with each other?" Many of our customers have asked us this question even though they already had a plethora of software solutions (not always FOSS) dealing with this topic. This surprised us and led us to seek out the cause of their uncertainty. We discovered that many solutions only look for potential risks and produce reports that are too detailed on the legal side for practical use by an engineer, and too technical for practical use by a lawyer.

As engineers are bound to do, we thought there might be a technical solution to this and launched a project. As engineers launching a project are bound to do, we encountered a few hitches and made some discoveries along the way.

Today, we are here to show the problems we encountered and how we overcame them, and also to mention that we are open to your contributions (on technical matters or simply as suggestions).

The features of the project are mainly driven by our clients' requirements:

- the ability to process a variety of unstructured inputs (zip archives containing code, GitHub or GitLab repositories, dependency manager package lists, and various hypertext links to libraries);
- preserving corporate code confidentiality, whether in SaaS or on-premise;
- outputting a well-structured, human-readable report listing actual non-compliances and potential ways to resolve them;
- designing the strategy for integrating our open source software compliance project with CI/CD processes.

One feature was not driven by our clients: our product owner decreed that there could be no false negatives in non-compliance detection.

After a first PoC based on pre-existing code analysis tools (OSS Review Toolkit, licensee, ScanCode...), we understood that some roadblocks would remain unless improvements were made. The point is not to reinvent the wheel, but to move from wood to rubber. So we built a new PoC that includes machine learning, and the results are much more promising. We will release it very soon under the AGPL v3 license.


Pierre Marty