[Update: We have finally published the complete report about OpenStack Folsom. Although the details mentioned here are still relevant, the numbers are much more complete and accurate in that final report.]
For several months now, OpenStack has been one of our pet projects. We contributed some stats to their weekly newsletter back in April, and the project was also a subject of study in Analyzing Risks associated to FLOSS Communities, one of our LinuxTag 2012 talks. Now we come back to it with a preview of a wider study we’re preparing. This one looks at how companies are contributing to the maintenance and improvement of OpenStack, based on the analysis of its (many) git repositories [see the full preview of the study].
We extracted all information related to commits, and who performed them, from git metadata. Then we used some heuristics and manual analysis to detect bots and to determine the companies that committers work for. Based on that information, we have produced separate charts showing the activity of specific companies.
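As a rough illustration of this kind of extraction (not the tooling we actually used), the sketch below reads one line per commit from `git log` and flags likely bots. The field layout, the `Commit` record, the helper names and the `jenkins` name marker are all assumptions made for the example; only the openstack.org domain comes from this post.

```python
import subprocess
from datetime import datetime
from typing import List, NamedTuple

# Assumed layout: hash, author name, author email, author date (strict ISO 8601),
# separated by tabs (%x09). These are standard git pretty-format placeholders.
LOG_FORMAT = "%H%x09%an%x09%ae%x09%aI"


class Commit(NamedTuple):
    sha: str
    name: str
    email: str
    when: datetime  # keeps the author's own UTC offset


def read_git_log(repo_path: str) -> str:
    """Return the raw log of one repository, one commit per line."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", f"--pretty=format:{LOG_FORMAT}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def parse_git_log(text: str) -> List[Commit]:
    """Turn the tab-separated log text into Commit records."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        sha, name, email, when = line.split("\t")
        commits.append(Commit(sha, name, email.lower(), datetime.fromisoformat(when)))
    return commits


def looks_like_bot(commit: Commit,
                   bot_domains=("openstack.org",),
                   bot_names=("jenkins",)) -> bool:
    """Hypothetical heuristic: flag by email domain or by a marker in the name."""
    domain = commit.email.rsplit("@", 1)[-1]
    name = commit.name.lower()
    return domain in bot_domains or any(marker in name for marker in bot_names)
```

A heuristic like this only produces candidates; anything it flags (or misses) still needs the manual review mentioned above.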
For each company, we are producing charts showing the number of commits and active committers per month, which may give an idea of how active the company is in the project (you know, commits and number of active committers are just proxies for activity, so your mileage may vary, etc.). We also provide information on the number of different repositories (each OpenStack git repository roughly corresponds to a subproject) and files touched per month, which suggests how broad the company’s contributions are (some companies concentrate on specific parts of OpenStack, while others are spread all over the project).
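To make the four monthly series concrete, here is a minimal sketch of how they could be aggregated for one company. The input shape (dicts with `date`, `author`, `repo` and `files` keys) and the function name are assumptions for the example, not a description of our actual pipeline.

```python
from collections import defaultdict
from datetime import datetime


def monthly_activity(commits):
    """Aggregate commits into per-month counts.

    `commits` is an iterable of dicts with the (assumed) keys
    'date' (datetime), 'author' (str), 'repo' (str) and 'files' (list of paths).
    """
    per_month = defaultdict(
        lambda: {"commits": 0, "committers": set(), "repos": set(), "files": set()}
    )
    for c in commits:
        bucket = per_month[c["date"].strftime("%Y-%m")]
        bucket["commits"] += 1
        bucket["committers"].add(c["author"])
        bucket["repos"].add(c["repo"])
        # The same path may exist in several repositories, so key files by repo.
        bucket["files"].update((c["repo"], path) for path in c["files"])
    return {
        month: {
            "commits": b["commits"],
            "committers": len(b["committers"]),
            "repos": len(b["repos"]),
            "files": len(b["files"]),
        }
        for month, b in sorted(per_month.items())
    }
```

Counting distinct repositories and distinct files separately is what lets the charts tell a company focused on one subproject apart from one spread across the whole project.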
Finally, there are two more metrics related to how the committers of each company behave: the ratio of commits per developer, and the hourly pattern of contributions. The former gives us an idea of the mean individual effort by developers (but not all commits are equal, you know), while the latter is a first shot at the working hours of the company’s developers. Since git records times in the developer’s timezone, you can see whether they work mostly during office hours, outside office hours, etc.
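Both metrics are simple to compute once the commits are available. The sketch below is only an illustration under assumed inputs: a list of author identifiers (one per commit) and a list of ISO 8601 author dates that still carry their UTC offset. The function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def commits_per_developer(authors):
    """Mean number of commits per distinct author; `authors` has one entry per commit."""
    if not authors:
        return 0.0
    return len(authors) / len(set(authors))


def hourly_pattern(timestamps):
    """Histogram of commits by local hour of day (0..23).

    Each timestamp is an ISO 8601 string with its offset. The hour is read
    as written, without converting to UTC, so it reflects the clock the
    developer saw when committing.
    """
    counts = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]
```

Converting everything to UTC first would be the natural mistake here: it would smear the office-hours pattern of a company whose developers sit in several timezones.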
The charts for each company are absolute (that is, they show the total number of commits, or active committers, or touched files, or whatever), and therefore have to be considered in the context of the total activity of the whole project. For example, the apparent decrease in activity for many companies since October 2011 has to be put into the context of a lower number of commits for the whole project (see the charts for aggregated data at the top of the main page for the study). That said, and keeping an eye on the scale of the Y axis, you can also observe how some companies are clearly decreasing their activity, while others are picking up the baton and have increased theirs during the last months.
For this study, we analyzed a total of 33 git repositories, all of them listed on GitHub as associated with OpenStack on August 8th, the date of the data collection. We did our best to identify unique developers (e.g., by joining the activity performed by the same developer under different identities), to filter out bots (we were interested in activity by humans) and to assign developers to companies (by looking at the domains in the email addresses, some other heuristics and, in some cases, manual inspection). But some errors surely remain (in fact, if you suspect errors because of your knowledge of the project, please let us know; we would love to track them down).
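One possible way to sketch the identity merging and the domain lookup is shown below. It is a simplified stand-in for the heuristics described above, not our actual procedure: identities are merged when they share a normalized name or an email address, and the domain table passed to `company_for` is a hypothetical, hand-maintained mapping.

```python
from collections import defaultdict


def merge_identities(identities):
    """Group (name, email) pairs that share a normalized name or an email.

    Uses a small union-find; returns a list of sets of identities.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for name, email in identities:
        normalized = " ".join(name.lower().split())
        union(("name", normalized), ("email", email.lower()))

    groups = defaultdict(set)
    for name, email in identities:
        groups[find(("email", email.lower()))].add((name, email))
    return list(groups.values())


def company_for(email, domain_map):
    """Look up the company for an email, trying the full domain and its parents.

    `domain_map` is a hypothetical table such as {"corp.example": "Corp"}.
    Returns None when no domain matches.
    """
    domain = email.rsplit("@", 1)[-1].lower()
    parts = domain.split(".")
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in domain_map:
            return domain_map[candidate]
    return None
```

Merging on names alone can join two different people who happen to share a name, and personal email domains say nothing about an employer, which is why cases like these end up in manual inspection.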
We also removed developers for whom we couldn’t identify a company, and those related to the domain openstack.org (which seem to be bots). With all of this, we managed to identify developers contributing well over 95% of the total number of commits to the OpenStack git repositories, which makes us feel confident that the data, although it surely contains some errors, is representative at least for the main actors in the project.
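The coverage figure is just a share of commits. A minimal sketch of the arithmetic follows, assuming a dict of commit counts per developer and a dict mapping developers to companies; both names are hypothetical, and whether bot commits belong in the denominator is a choice the caller has to make when building the input.

```python
def identified_share(commits_by_developer, company_of):
    """Fraction of commits made by developers with a known company.

    `commits_by_developer` maps developer -> number of commits;
    `company_of` maps developer -> company name (missing or None if unknown).
    """
    total = sum(commits_by_developer.values())
    if total == 0:
        return 0.0
    identified = sum(
        count for developer, count in commits_by_developer.items()
        if company_of.get(developer)
    )
    return identified / total
```

With made-up counts of 60, 36 and 4 commits, where only the last developer has no known company, the share would be 0.96.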
If you have any kind of feedback, especially about which other kinds of data you would be interested in, or which data you find useful or not useful, please comment…