Demographics of Linux kernel developers: how old are they?

Today I’m contributing to SOS-Evol 2013 with the talk “Demographics of Linux kernel developers: how old are they?”, which presents work in progress aimed at understanding how the different “generations” of Linux kernel developers are evolving over time.

Presentation: Demographics of Linux kernel developers (slides)

The talk covers our work in progress on characterizing the “age” of Linux kernel developers, where “age” means “time in the project”. Although the work is not finished and the results could still change, the current conclusions are potentially worrisome: each new generation is smaller than the one before, and in general all generations are now much smaller than they were five or six years ago.

The study is based on the analysis of commits to the git repository, so we rely on git information, which is available only since 2005, when the project started to use git. The methodology starts by identifying unique developers: some of them have used several identities over the life of the project, and we merge those because we want to count persons, not identities. Then we define “age” as time in the project, so somebody who entered the project (made a first commit) five years ago is “five years old”. With this information we produce demographic pyramids in which each “generation” is represented by a bar whose length equals the number of developers in that generation. Generations, in this case, are three months long.
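The steps above can be sketched in a few lines of Python. This is only an illustration, not the scripts used for the study: the function names, the `identities` alias table, and the `active_quarters` window are all hypothetical. In particular, the post does not say how a developer is judged to still belong to a generation at a given date, so the sketch assumes that a developer counts only if they committed within the last couple of quarters before the snapshot.

```python
from collections import Counter
from datetime import date


def quarter_index(day):
    """Map a date to a running count of calendar quarters."""
    return day.year * 4 + (day.month - 1) // 3


def pyramid(commits, identities, snapshot, active_quarters=2):
    """Return {age_in_quarters: number_of_developers} at `snapshot`.

    commits:    iterable of (identity, commit_date) pairs
    identities: dict mapping an alias to a unique person (hypothetical table)
    active_quarters: ASSUMPTION, not from the post. A developer is counted
                     only if their last commit is at most this many
                     quarters before the snapshot quarter.
    """
    first, last = {}, {}
    for identity, day in commits:
        if day >= snapshot:
            continue  # ignore anything after the snapshot date
        person = identities.get(identity, identity)  # merge aliases
        q = quarter_index(day)
        first[person] = min(q, first.get(person, q))
        last[person] = max(q, last.get(person, q))

    now = quarter_index(snapshot)
    ages = Counter()
    for person, first_quarter in first.items():
        if now - last[person] <= active_quarters:
            ages[now - first_quarter] += 1  # age = quarters since first commit
    return dict(ages)


commits = [
    ("alice@a.org", date(2005, 4, 20)),
    ("alice@b.com", date(2009, 8, 1)),   # same person, second identity
    ("bob@x.org", date(2009, 7, 15)),
    ("carol@y.org", date(2006, 1, 10)),  # left the project long ago
]
identities = {"alice@a.org": "alice", "alice@b.com": "alice"}
print(pyramid(commits, identities, date(2009, 10, 1)))  # {18: 1, 1: 1}
```

Each key of the result is one bar of the pyramid. Without the alias table, the two identities of the first developer would be counted as an old developer who left and a newcomer, which is why merging identities comes first.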

Demographics pyramid for October 2009

For example, in this pyramid for October 2009, you can see that those “18 quarters old” numbered almost 300 developers, while those who joined the project in the previous quarter (July-September 2009) numbered about 150.

The first thing to notice is how different the oldest generation is: in every pyramid it is the largest one. This is because it includes everybody who was working on Linux before 2005, when our git data begins. Remember that when the Linux git repository started, the project was already many years old and had many active developers. All of those who were active during the first quarter of the life of the git repository are included in this oldest generation.
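As a small worked check, the age of this oldest generation follows directly from the date of the snapshot. The sketch below assumes that the kernel git history begins on 16 April 2005; the post itself only says “2005”, so treat that exact date, and the helper names, as assumptions made for illustration.

```python
from datetime import date


def quarter_index(day):
    """Map a date to a running count of calendar quarters."""
    return day.year * 4 + (day.month - 1) // 3


# ASSUMPTION: start of the kernel git history (the post only says "2005").
REPO_START = date(2005, 4, 16)


def oldest_generation_age(snapshot, repo_start=REPO_START):
    """Age, in quarters, of the generation that absorbs all pre-git developers."""
    return quarter_index(snapshot) - quarter_index(repo_start)


print(oldest_generation_age(date(2009, 10, 1)))  # 18
```

Under that assumption, the “18 quarters old” bar in the October 2009 pyramid is exactly this oldest generation, so its size reflects years of earlier history rather than a single quarter of newcomers.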

The evolution of the pyramids can be appreciated in this faceted chart, with pyramids for six consecutive years.

Pyramids for six years

Since all the charts use the same scale, they show directly how the pyramids evolve over time. Several effects are apparent:

  • Generations are getting smaller and smaller, from about 100-150 to 30-50 developers per quarter
  • Older generations are disappearing
  • The latest generations are considerably smaller now than they were six years ago

As I said, this is still work in progress, so there could be errors. We also don’t know whether these facts are due to the Linux kernel project becoming less attractive to developers and worse at retaining them, or to some other cause (such as changes in the project’s policies or practices). But all in all, the results were a bit surprising to me when I first saw them.

More details are in the presentation itself, and of course you can ask 😉 If you found this interesting, stay tuned: we plan to produce a more detailed and finished report about all of this in a few weeks.

[Datasets and scripts for producing the results are available upon request, and some of them are already part of the MetricsGrimoire and vizGrimoire toolsets]
