Brussels / 4 & 5 February 2017

Analyze terabytes of OS code with one query

How to leverage the code shared on GitHub with ease


Google has made a copy of most open source code shared on GitHub available in BigQuery. Anyone interested can now easily analyze 5 years of GitHub metadata and more than 42 terabytes of code. In this session we'll cover how to leverage this data to understand the community around any language or project. Design requests and decisions can then be based on actual patterns discovered through analysis.

During a lightning talk we can quickly see:

  • How this is run.
  • How coding patterns have changed through time.
  • How to guide your project design decisions based on actual usage of your APIs.
  • How to request features based on data.
  • The most effective phrasing to request changes.
  • Effects of social media on a project's popularity.
  • Who starred your project - and what other projects interest them.
  • Measuring community health.
  • Running static code analysis at scale.
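
To give a feel for what "one query" can look like, here is a minimal sketch that builds a BigQuery standard SQL statement counting files by extension. The table name `bigquery-public-data.github_repos.files`, its `path` column, and the helper `top_extensions_query` are assumptions made for illustration; the abstract does not name the dataset or any specific query.

```python
def top_extensions_query(
    limit: int = 10,
    # Assumed table name; the abstract does not say where the data lives.
    table: str = "bigquery-public-data.github_repos.files",
) -> str:
    """Build a standard SQL query listing the most common file extensions.

    Hypothetical helper: it only assembles the SQL text and does not
    contact BigQuery.
    """
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")

    # Raw f-string so the backslash in the regular expression reaches
    # BigQuery unchanged. The pattern captures the text after the last dot.
    return rf"""
SELECT
  REGEXP_EXTRACT(path, r'\.([^./]+)$') AS extension,
  COUNT(*) AS file_count
FROM `{table}`
GROUP BY extension
ORDER BY file_count DESC
LIMIT {limit}
""".strip()


if __name__ == "__main__":
    # Print the SQL so it can be pasted into the BigQuery console.
    print(top_extensions_query(limit=5))
```

The generated text can be pasted into the BigQuery web console or passed to a client library. Building the string separately keeps the sketch runnable without credentials or network access.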

Speakers

Felipe Hoffa

Links