Turnover is inevitable: developers leave a project and others join it. This effect may be more harmful in open source communities than in companies. Depending on the community, it can be hard to find new people willing to participate. On top of that, those who stop developing take their knowledge with them. So the issue is twofold: people leave, and they leave behind a knowledge gap that in some cases is hard to fill.
But is it possible to analyze this regeneration of developers? How good is my community at retaining developers? Is it possible to measure the number of newcomers joining the community? Having this type of information is essential to define policies to attract new members, retain current ones, and check whether the current situation is taking the community in the right direction.
This post is an example of the kind of things we are building at Bitergia on top of the CVSAnalY tool. In previous posts we introduced the concept of commit, its peculiarities as a metric, and several ways to calculate it, applying filters for bots, merges or branches.
The demographics of open source communities allow us to understand how a community has evolved and, potentially, how it will evolve over time. Demographics in open source communities can be seen as analogous to the population pyramids used for countries or cities. Typically the oldest people are found at the top of the chart, and age decreases towards the bottom. These charts are called pyramids because of their typical triangular shape. During the last decades, however, the shape in developed countries has been moving towards an inverted pyramid, although this is another discussion :).
Studying the demographics of developers makes it possible to know a bit more about the community. We already introduced the demographics of the Linux Kernel, and this post focuses on the analysis of the OpenStack community as a case study. The following figure shows the demographics of the OpenStack community (updated daily in the OpenStack activity dashboard). The x-axis indicates the number of developers, while the y-axis shows the timeframe of activity.
Green bars show the number of developers who, in each period, started contributing with at least one commit. Blue bars show how many of those developers still contribute to the community. By definition, a developer is still contributing if at least one commit has been detected during the last six months. Otherwise, the developer is considered to have left the community. It may happen that a developer returns after more than six months and submits another change to the source code. In that case, the developer would appear as not having left the community.
So, there are people joining the community (green bars) and people of a certain age still working in the community (blue bars). The retention rate of a given cohort is the number of people remaining divided by the total number of people who joined the community in that period. In the example shown above, the retention rate for developers who joined the community two years ago is just under 25%. In numbers, of the 210 different developers detected as committing their first change to the source code, 52 are still contributing to the git repositories.
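To make these definitions concrete, here is a minimal Python sketch that computes, for each cohort, how many developers joined (green bars), how many are still active (blue bars), and the resulting retention rate. The input format (a list of developer and commit date pairs), the function name, the half-year cohort size and the 182-day approximation of six months are all assumptions made for this example; it is not the actual implementation behind CVSAnalY or the dashboard.

```python
from collections import defaultdict
from datetime import date, timedelta

# Assumption: "six months" is approximated as 182 days.
ACTIVE_WINDOW = timedelta(days=182)


def cohort_retention(commits, today):
    """Return {cohort: (joined, still_active, retention_rate)}.

    commits: iterable of (developer_id, commit_date) pairs
             (hypothetical input format for this sketch).
    A cohort is the (year, half) of a developer's first commit.
    """
    first, last = {}, {}
    for dev, day in commits:
        if day > today:
            continue  # ignore anything dated after the analysis date
        if dev not in first or day < first[dev]:
            first[dev] = day
        if dev not in last or day > last[dev]:
            last[dev] = day

    joined = defaultdict(int)
    active = defaultdict(int)
    for dev, start in first.items():
        cohort = (start.year, 1 if start.month <= 6 else 2)
        joined[cohort] += 1
        # Only the most recent commit matters, so a developer who comes
        # back after a long pause counts as still active.
        if today - last[dev] <= ACTIVE_WINDOW:
            active[cohort] += 1

    return {c: (joined[c], active[c], active[c] / joined[c])
            for c in sorted(joined)}


# Toy data, invented for illustration only.
commits = [
    ("alice", date(2012, 2, 1)), ("alice", date(2014, 1, 10)),
    ("bob", date(2012, 3, 5)),
    ("carol", date(2013, 9, 1)), ("carol", date(2013, 12, 20)),
]
result = cohort_retention(commits, today=date(2014, 3, 1))

# The OpenStack cohort quoted in the text: 52 of 210 developers remain.
print(round(52 / 210 * 100, 1))  # 24.8
```

In the toy data, alice and bob both joined in the first half of 2012 but only alice committed recently, so that cohort has a retention rate of 50%.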
The following figure shows the current retention rate in the OpenStack ecosystem, focused on git activity. A few things are worth noticing. First, even though OpenStack is a project with a lot of activity and interest from the industry, 50% of the developers who committed at some point in the last six months have not committed again. This is how community retention looks at the time of this analysis; a week earlier or a week later, these numbers may be slightly different. On the other hand, we find some stabilization at the end of the chart. Of the developers with between 1 and 2 years of experience, between 20% and 30% keep contributing. And finally, of the developers who started contributing more than 2 years ago, between 10% and 20% still submit changes to the source code today.
Having this type of information is probably crucial for community managers, companies and public administrations willing to invest resources in specific open source projects. Understanding the structure and demographics of the community is of key importance: it allows you to check the current status of the community, how it has evolved so far, and even to foresee (to some extent) what your community will look like in a few years. Maybe you’re gaining a lot of traction today, but only a portion of those developers will keep contributing in a few years.
Although it is not detailed here, it is also possible to obtain the evolution of the retention rate over specific years. Let’s say there is a project with more than 10 years of history and we ask: how good have we been at retaining developers with more than 2 years of experience throughout the life of the project? With the source code database built with Metrics Grimoire and CVSAnalY, it is possible to compare several timeframes of any project and calculate, for each timeframe, the retention rate of those developers with two years of experience. This type of analysis is essential in order to develop specific policies to attract and, what is probably more important, retain your developers, and to check whether those policies are really working.
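One possible way to approach that comparison is to recompute the retention rate as it would have looked on several past dates, using only the commits known up to each date. The sketch below is an illustration under stated assumptions, not the Metrics Grimoire implementation: the function name, the input format, and the day counts used for "two years" (730 days) and "six months" (182 days) are all hypothetical choices.

```python
from datetime import date, timedelta

# Assumptions: rough day counts for "six months" and "two years".
SIX_MONTHS = timedelta(days=182)
TWO_YEARS = timedelta(days=730)


def retention_at(commits, snapshot, min_experience=TWO_YEARS, window=SIX_MONTHS):
    """Retention rate, as seen on `snapshot`, of developers whose first
    commit is at least `min_experience` old. Returns None if no developer
    qualifies yet.

    commits: iterable of (developer_id, commit_date) pairs
             (hypothetical input format for this sketch).
    """
    first, last = {}, {}
    for dev, day in commits:
        if day > snapshot:
            continue  # activity after the snapshot was not yet known
        first[dev] = min(first.get(dev, day), day)
        last[dev] = max(last.get(dev, day), day)

    veterans = [d for d, start in first.items()
                if snapshot - start >= min_experience]
    if not veterans:
        return None
    retained = sum(1 for d in veterans if snapshot - last[d] <= window)
    return retained / len(veterans)


# Toy data, invented for illustration only.
commits = [
    ("alice", date(2010, 1, 15)), ("alice", date(2012, 5, 1)),
    ("alice", date(2013, 11, 1)),
    ("bob", date(2010, 3, 1)), ("bob", date(2012, 4, 1)),
    ("carol", date(2011, 10, 1)), ("carol", date(2013, 12, 1)),
]
for snapshot in (date(2012, 6, 1), date(2014, 1, 1)):
    print(snapshot, retention_at(commits, snapshot))
```

Comparing the values across snapshots shows whether retention of experienced developers is improving or degrading, which is the kind of signal needed to judge whether a retention policy is working.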
Regarding the methodology, it is worth mentioning that this analysis bases its results on the matching of unique identities. This process is semi-automated and manually improved. Although the databases are carefully curated, minor issues may appear. If you find any inconsistency in the dataset or in the results presented here, please let us know.