The Conference for Machine Learning Innovation

How to Identify Collaborators in Large Codebases Using Unsupervised Learning


The way developers collaborate within and, in particular, across teams often escapes management’s attention, even when a formal organization with designated teams is in place. Being able to observe the actual, organically formed engineering structure would give decision makers additional tools to manage their talent pool. Which engineering team is best placed to migrate this part of the stack from language X to language Y? What are the most efficient funnels of coding collaboration? Which developers does your codebase rely on? In this talk, we aim not only to identify existing inter- and intra-team interactions but also to suggest relevant opportunities for suitable collaborations. To do so, we embed and cluster contributors’ commit activity, usage of programming languages, and code identifier topics. We evaluate our approach by analyzing the codebases of several open source companies. The findings show that, by looking only at a codebase, we can recover the engineering organization behind it, reveal hidden coding collaborations, and justify in-house technical decisions.
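
To make the idea of grouping contributors by their activity concrete, here is a minimal sketch in Python. It is not the speakers' method: it uses only language usage, represents each developer by raw per-language line counts rather than a learned embedding, and groups developers by linking any pair whose cosine similarity exceeds a threshold (connected components). The commit log, the developer names, and the threshold of 0.8 are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical commit log: (developer, language, lines changed).
COMMITS = [
    ("alice", "Go", 120),
    ("alice", "Python", 10),
    ("bob", "Go", 90),
    ("carol", "Python", 200),
    ("dave", "Python", 150),
    ("dave", "Go", 5),
]


def language_profiles(commits):
    """Sum lines changed per developer and per language."""
    profiles = defaultdict(lambda: defaultdict(float))
    for dev, lang, lines in commits:
        profiles[dev][lang] += lines
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse language-usage vectors."""
    dot = sum(value * b.get(lang, 0.0) for lang, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(profiles, threshold=0.8):
    """Link developers with similarity >= threshold, then return the
    connected components as sorted lists of names (union-find)."""
    devs = sorted(profiles)
    parent = {d: d for d in devs}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path compression
            d = parent[d]
        return d

    for i, a in enumerate(devs):
        for b in devs[i + 1:]:
            if cosine(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for d in devs:
        groups[find(d)].append(d)
    return sorted(groups.values())


CLUSTERS = cluster(language_profiles(COMMITS))
print(CLUSTERS)
```

On this toy data, the Go-heavy and Python-heavy developers end up in separate groups. A fuller treatment would add commit activity and identifier topics as features and swap the threshold rule for a proper clustering algorithm.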
