Today is the day Folsom, the new release of OpenStack, is publicly announced. We at Bitergia have been studying it for a while, and today we’re publishing our results as well. So, welcome to the report presenting the charts and numbers that show how OpenStack Folsom was built. For comparison, we’re also presenting a similar report on the previous major release of OpenStack, Essex, which was released in April after a similar release cycle of about six months. Both reports rely on data obtained from the source code management (git) and issue tracking (Launchpad) systems.
[Update, Oct 2 2012, 12:15 GMT. We’ve published some details on the methodology used to produce these reports]
[Update, Sep 28 2012, 16:00 GMT. We have included a new box in the main (summary) page of the study, showing the participation by companies in the 7 projects that OpenStack developers usually consider “core OpenStack”, which are the ones actually subject to the Folsom release cycle.]
The number of commits (changes to source code) is similar in both cases (Folsom has more once commits by bots are excluded), which suggests similar coding activity. The number of unique files touched is higher in Essex, which suggests that the work is becoming more concentrated in certain parts of OpenStack. And the number of companies identified as contributors is quite similar.
However, when we come to a basic analysis of the community of developers (committers), it is easy to see that it is growing, from 47 core developers in Essex to 71 in Folsom, with a similar increase for the whole population. The numbers from the issue tracking system tell a similar story.
The list of companies contributing to OpenStack (measured by number of changes to the source code and by number of developers performing those changes) may also be of interest. Rackspace is the top contributor, with about a quarter of all commits, followed closely by Red Hat, and at some distance by Nebula. Then, with comparable numbers of commits (but very different numbers of developers involved), come HP, ISI, Cloudscaling, IBM, Sina, Canonical, Inktank and others.
The situation was more concentrated six months ago. For Essex, Rackspace accounted for more than half of the commits. Red Hat, Nebula and HP followed at some distance, and then came Canonical, Nicira, Citrix, eNovance, Cloudscaling, ISI and others. Looking at these numbers, it is clear that the OpenStack ecosystem of companies is now more balanced, with less dominance by Rackspace and the clear emergence of other companies that seem to be betting heavily on improving the code base.
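The two company measures above (commits per company and distinct developers per company) can be illustrated with a minimal Python sketch. This is not the tooling actually used for the report; it assumes that a company is identified purely by the e-mail domain, and the domain map, the bot list and the example addresses are all hypothetical. The input is one author e-mail per commit, as printed by `git log --format=%ae` (using `%ce` instead would count committers rather than authors, one of the methodological choices that can make results differ between tools).

```python
from collections import Counter, defaultdict

# Hypothetical domain-to-company map, in the spirit of a gitdm-style mapping.
# Real mappings also need per-address overrides (e.g. personal addresses).
DOMAIN_TO_COMPANY = {
    "company-a.example": "Company A",
    "company-b.example": "Company B",
}

# Hypothetical bot identities, excluded from all counts.
BOT_EMAILS = {"bot@ci.example"}


def company_for(email: str) -> str:
    """Attribute an e-mail address to a company using only its domain."""
    domain = email.rsplit("@", 1)[-1].lower()
    return DOMAIN_TO_COMPANY.get(domain, "Unknown")


def summarize(author_emails):
    """Return {company: (commits, distinct developers)}, skipping bots.

    `author_emails` holds one author e-mail per commit.
    """
    commits = Counter()
    developers = defaultdict(set)
    for email in author_emails:
        email = email.strip().lower()  # normalize so case variants merge
        if email in BOT_EMAILS:
            continue
        company = company_for(email)
        commits[company] += 1
        developers[company].add(email)
    return {c: (commits[c], len(developers[c])) for c in commits}
```

With this approach, a company with many commits from few people and one with the same commits spread over many people look identical by the first measure but not by the second, which is why both are reported.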
The rest of this post deals with some methodological details about the report. You can read them, or go straight to the full Folsom report and its sections on general issues (analysis of commits and tickets), analysis per company, and analysis per project. Or you can compare them with those in the Essex report. You can also browse our previous posts, How companies are contributing to OpenStack, and Preview of the analysis of the upcoming OpenStack release, which showed some preliminary results and explained part of the methodology for the analysis by company (but beware: both were done with only a fraction of the projects in OpenStack, so the data in the final report is much more complete).
The work in OpenStack is organized in several projects, which differ widely in size and level of activity. As the chart shows, Nova is clearly the most active one, with more than 1,500 changes during the preparation of the release. But it is also interesting to notice that the second one is in fact related to documentation, not code (OpenStack Manuals). Libraries supporting the rest of the projects, when taken together, also received a lot of commits. In Essex, Nova was also very active, but the second project then, Keystone, is now in a modest sixth place, and OpenStack Manuals ranked much lower than it does now. These shifts in the ranking by number of commits reflect the changing priorities and needs of OpenStack, which have evolved during the last year.
With respect to the overall size and performance of the whole OpenStack development community, all metrics tell a story of growth over the Folsom release cycle. For tickets, for example, open tickets and ticket openers (usually related to people testing or using the platform, but also to developers who use the ticketing system to schedule activities) grow slightly over the period (see chart on the left). But both closed tickets and developers closing tickets per week show a clearer growth trend.
The closed tickets chart shows the usual peaks (related to more intense periods of bug squashing or ticket cleaning) becoming larger as time passes. The team working to close tickets (ticket closers per week) also grows, with peaks of about 70 developers in some weeks.
Graphs for commits (changes to code) and committers (developers performing those changes) also show a growth pattern (see chart below). At the end of the release cycle we’re seeing around 400 commits per week, with more than 70 developers involved every week.
[Note: the first and last weeks in the charts usually show low values for every parameter, because they are “broken” weeks, with fewer than 7 days]
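The effect described in the note can be seen in a minimal Python sketch of weekly bucketing. It is an illustration, not the code behind the charts: it assumes weeks start on Monday, that each commit is given as a hypothetical `(commit_date, committer)` pair, and that the analysis period runs from `start` to `end` inclusive. Each row also reports how many days of the period fall inside that week, so broken weeks are easy to spot.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_activity(commits, start, end):
    """Count commits and distinct committers per Monday-based week.

    `commits` is an iterable of (commit_date, committer) pairs.
    Returns (week_start, commits, committers, days_covered) tuples;
    days_covered < 7 marks a "broken" first or last week.
    """
    counts = defaultdict(int)
    people = defaultdict(set)
    for day, who in commits:
        if start <= day <= end:  # ignore activity outside the period
            week_start = day - timedelta(days=day.weekday())
            counts[week_start] += 1
            people[week_start].add(who)

    rows = []
    week_start = start - timedelta(days=start.weekday())
    while week_start <= end:
        week_end = week_start + timedelta(days=6)
        # Only the days inside [start, end] can contribute to this week.
        covered = (min(week_end, end) - max(week_start, start)).days + 1
        rows.append((week_start, counts[week_start],
                     len(people[week_start]), covered))
        week_start += timedelta(days=7)
    return rows
```

A period that starts on a Thursday gives a first week covering only 4 days, so even at a steady pace it would show roughly 4/7 of a normal week's activity; normalizing by `days_covered`, or simply dropping the boundary weeks, avoids reading those dips as real drops in activity.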
Compared with the preview we published a few days ago, this report is more comprehensive: it includes most of what OpenStack considers its projects, including all the core and library projects (libraries were excluded in the preview). This shakes up the list of contributions by company somewhat, as well as some of the aggregated charts. So, even if you had a look at the preview, please go through the final report, since it covers a larger part of OpenStack.
[Final note: All the numbers included in this study could still have some errors, but they have already gone through a number of validations, and are correct to the best of our knowledge. That said, remember you can always download the datasets and do a parallel analysis, if you’re interested.]
Hello, regarding your report, I have two concerns:
First, it is currently debated whether the data source should cover only the 7 core projects or take the related projects into consideration as well. I prefer to cover only the 7 core projects, since the other projects do not follow the release cycle.
The second is that your mapping of people/emails/domains/companies does not seem to be as up to date as the one in the openstack-gitdm project. I just cloned that project and recalculated the numbers, and found many discrepancies between the results of openstack-gitdm and your report; the top-N ranking also has some differences.
Since your data might be widely cited by the public and the media, it’s very important for us to make it accurate and neutral, or it may be unfair to other companies that are heavily involved in OpenStack projects.
Hello Hui Cheng,
First of all, thank you very much for your comments, and I am sorry to hear this. Our main concern as a company is to provide transparent results that can be traced from beginning to end.
For this reason our tools are free software, and we also publish the databases, the script files used to query those databases, and the resulting JSON files that populate part of the charts. All of them can be found on the report page.
Regarding your comments: you are right, selecting the 16 projects is perhaps one way to measure OpenStack, but not the best one. Indeed, if you check one of our previous analyses (http://blog.bitergia.com/2012/09/22/preview-of-the-analysis-of-the-upcoming-openstack-release/) you will see that Sina, for example, ranks higher. In that analysis we covered the seven core projects plus openstack-manuals, and the results differ from the ones you can see in this post. This is mainly due to the number of analyzed repositories.
Regarding the gitdm matches between developers and companies, we also used that information to improve our databases, so it was already taken into account. However, it is true that there can be (and there are) differences between the two tools. Our impression is that the methodologies are probably a bit different (for instance, taking committers instead of authors from Git may produce different results).
We are more than open to improving the dataset if needed and to comparing methodologies. Please feel free to contact us so we can have a productive exchange of ideas on how to improve the analysis. We totally understand your concerns about the mass media, and this is something we always keep in mind when preparing these datasets. We want to be as neutral as possible, providing only objective data that can be easily traced, using open source tools, datasets and even the visualization platform.
Please do not hesitate to contact us again, and thank you once more for your comments.
I do not care much whether you counted the 7 core projects or more; what I care about is the accuracy of the statistics.
It is widely recognized that the openstack-gitdm project has the most up-to-date information about contributors and the mapping of people/emails/domains/companies. For the section “Top changeset contributors by employer”, the changesets and committers are also taken from the commit log rather than the Authors file in each project, so I think our reference standard is the same.
Please check our original data calculated by openstack-gitdm: https://docs.google.com/document/d/1-2qNTaVzuX4wmAcIn1GSn2ddMX44uA0hxQUU_aUQjKg/edit
Even though it covers only the 7 core projects, some of its numbers are higher than those in your report, which covers more than 7 projects (for example, the number of Sina’s changesets during the Folsom release cycle differs).
Thanks a lot for your comments. We have added a new box in the main (summary) page of the report, with the analysis of companies for the 7 core projects. We analyzed all the projects we could because we wanted to understand what was happening in the whole of OpenStack during these six months, but I agree that the 7 core projects are of special interest, especially in the context of the release.
With respect to the discrepancies in the data, thanks for publishing the results obtained with openstack-gitdm. I guess that now, for the core projects, there are fewer discrepancies, but some still exist. Maybe they are due to differences in how cvsanaly and gitdm deal with commits, or maybe to differences in company attribution. We have tried to track down all committers with more than 14 commits, who together account for more than 95% of all the commits. Of course, that list could have some errors, but they are unlikely to be large enough to change the aggregated data significantly. In particular, we have double- or triple-checked the large committers.
Great work! I’d love to work with you guys to add information you can get from the Gerrit API, which could also include information about who is doing code reviews, and about code submitted but not merged (which is still activity). Feel free to ping jeblair or me any time if you’d like any help looking at that information.
I also personally would love it if you’d include things in the openstack-ci and openstack-dev orgs … but I can understand if that’s not interesting. 🙂
We will certainly contact you, thanks a lot for your offer. And glad you found the work useful. WRT openstack-ci and openstack-dev, that’s something we may consider in the future, indeed.
Mark McLoughlin has also published updated contribution stats for OpenStack Folsom: https://github.com/markmc/openstack-gitdm/tree/results/folsom
The content of git-stats.txt is the same as in the Google doc I shared.
That also includes other interesting results.
Anyway, thanks for your excellent work, which makes it easy for us to learn how OpenStack is built through these visualized graphs.
Thanks a lot for the pointer. We will try to find out where differences come from. Great that you found the charts worthwhile.
To everyone interested, we wrote another post with the methodology used to retrieve, parse and analyze data from OpenStack repositories.
You can find this at http://blog.bitergia.com/2012/10/02/methodology-used-to-analyze-openstack-repositories/
I truly believe that this is a good opportunity to start a discussion about results and the best way to measure activity from the OpenStack community.
Any comments are more than welcome :).