Brussels / 3 & 4 February 2018


Tools for large-scale collection and analysis of source code repositories

Open source Git repository collection pipeline

There are tens of millions of Git repositories publicly available on the Internet, but what kind of tools does one need to treat all this code as a big dataset? I will talk about the new and existing OSS tools that were built and used to collect and analyze millions of Git repositories on clusters of commodity hardware.


Alexander Bezzubov