Companies contributing to OpenStack (Grizzly Analysis)

As an update to our analyses of previous OpenStack releases, today we present our analysis of company contributions to Grizzly, the new release of the project being published today. OpenStack is a well-known free / open source software project providing facilities for building private and public clouds. It is also a good example of a development community in which almost all participants are affiliated with some company. It is therefore interesting to study how companies contribute to the project through their employees. This is exactly what we did in our analysis.

Commits by company, all OpenStack projects, Grizzly release cycle

The results are revealing, even more so when put into context by looking at how companies behaved in previous release cycles. The data tells the story of how a project that was clearly dominated by a single company (Rackspace) has grown into a real community with many actors. With more than 20 different companies contributing every week, and more than 50 companies that contributed at some point, OpenStack is becoming one of the projects with the most corporate involvement. The number of people involved is also large, and growing. In September / October, when the release cycle was starting, about 40-50 people contributed each week. At the end of the cycle, in March, there were about 80-90.

Active developers by company, in OpenStack git, all projects, Grizzly release cycle

The top corporate contributors are no surprise: Red Hat, Rackspace, IBM, Nebula, HP and others are well known for their support of the project. It is interesting to see how Rackspace is slowly handing over the leadership of the project to other companies, and how quickly some of them are getting involved.

[Now, go on reading this post, or have a look at the complete analytics dashboard for Grizzly]

[Note (Apr 6th 00:10 CEST): Thanks to those of you who are reporting what seem to be errors or inaccuracies. We’re taking those reports into account and will produce a new run of the data soon. However, up to now they don’t seem to change things significantly, except for specific developers or maybe some company going one position up or down in some list. Please keep those reports coming.]

The data shows that no single company controls the project today. The main contributor by number of commits (Red Hat) is responsible for much less than 20% of all contributions, and other metrics give similar results. The community shows the usual combination of some more involved companies and a long tail of more casual contributors. The summary charts for all companies tell the different stories of their involvement: many are growing in contributions, some are stable, and a minority show decreasing participation (after factoring in the usual ups and downs).

Activity of Rackspace during the Grizzly release cycle

When looking at the larger contributors by commits, those stories are apparent. Red Hat has become the first contributor, with about 15-20 people involved in development. IBM is the rising third (or second, depending on the metric), with a clear increase during the period, reaching a team of 15-20 during the last weeks. Rackspace is the other main contributor, stable during the period with a team of about 10-15 active developers.

Commit activity of Rackspace during the Grizzly release cycle

The different charts show that companies are not only involved in shaping the source code, but are also contributing by working to close bugs and participating in the mailing lists. All in all, the history of these six months of OpenStack development is one of growth, increased involvement by most of the participating companies, and a growing number of companies taking part.

Tickets closed by company, all OpenStack projects, Grizzly release cycle

The raw data about several metrics of activity in the git repository for all identified companies can be downloaded as a spreadsheet [in ODF or Excel format, or rendered as PDF].

Methodology notes:

The analysis has been performed by retrieving data from the development repositories of the project: git for changes to the source code (official repositories are hosted on GitHub), Launchpad for tickets, including bug reports, and Mailman for mailing lists. For the retrieval, MetricsGrimoire was used, and vizGrimoire is the basis of the analysis and visualization. We have identified affiliations for developers based on data provided by the OpenStack Foundation and on our own research. For source code changes (git), we have considered authors of commits (as opposed to committers of commits, which for OpenStack are in many cases bots such as Gerrit). For tickets, we have considered those closed by authors of commits (using their identifiers in git and Launchpad to do the matching), and for email messages we have considered messages sent by authors of commits as well (in this case matched using their address).
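
As an illustration of the author-based counting described above, here is a minimal sketch in Python. It is not the MetricsGrimoire or vizGrimoire code: the function name, the record fields, and the sample data are all hypothetical. It assumes each commit record carries an author email, that several identities (email addresses) can map to one person, and that each person has a single fixed affiliation (in practice, affiliations can change over time).

```python
from collections import Counter, defaultdict


def commits_by_company(commits, identity_to_person, person_to_company):
    """Count commits and unique authors per company.

    Only the *author* of each commit is considered; the committer
    (often a bot in this kind of workflow) is ignored.
    """
    commit_counts = Counter()
    people = defaultdict(set)
    for commit in commits:
        email = commit["author_email"].lower()
        # Merge several identities into one person; fall back to the
        # email itself when the identity is unknown.
        person = identity_to_person.get(email, email)
        company = person_to_company.get(person, "Unknown")
        commit_counts[company] += 1
        people[company].add(person)
    # Count people, not identities, so a developer with two email
    # addresses is counted once.
    author_counts = {company: len(members) for company, members in people.items()}
    return commit_counts, author_counts


# Hypothetical sample data, for illustration only.
commits = [
    {"author_email": "alice@example.com", "committer_email": "bot@example.org"},
    {"author_email": "alice@old.example.net", "committer_email": "bot@example.org"},
    {"author_email": "bob@example.com", "committer_email": "bot@example.org"},
]
identity_to_person = {
    "alice@example.com": "alice",
    "alice@old.example.net": "alice",
    "bob@example.com": "bob",
}
person_to_company = {"alice": "Company A", "bob": "Company B"}

commit_counts, author_counts = commits_by_company(
    commits, identity_to_person, person_to_company
)
print(dict(commit_counts))
print(author_counts)
```

In this sketch, the two addresses for the same developer yield two commits but only one author for "Company A", which is the distinction between counting identities and counting real people.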

No single metric can show all aspects of a company's participation in the project. In particular, the number of commits or the number of tickets closed are just proxies for activity and contributions, and cannot be considered exact measures of either: commits and tickets may be very different from each other, and may have very different value for the project. However, the metrics used in this report are usually considered a good approximation of the overall activity.

31 thoughts on “Companies contributing to OpenStack (Grizzly Analysis)”


    1. Hello Joseph,

      First, thanks a lot for your comments! Indeed, the data was correct, at least for Vish Ishaya and Soren; the issue was in the database queries, and only in those parsing information from the mailing lists.

      So, this is why you saw both developers with repeated values in all of the companies that they were part of at some point (as was probably happening with other developers in the mailing list analysis).

      Now, looking at the charts and tables, this is fixed.

      http://bitergia.com/public/reports/openstack/2013_04_grizzly/company.html?company=Nebula
      http://bitergia.com/public/reports/openstack/2013_04_grizzly/company.html?company=Rackspace
      http://bitergia.com/public/reports/openstack/2013_04_grizzly/company.html?company=Cisco%20Systems

      Just to let you know a bit more about the methodology: we have already made a couple of analyses of OpenStack, so this is a mix of data from the OpenStack Gitdm project on GitHub, plus our own dataset, plus polished data from the community and feedback in the blog.

      Thanks for the comments! Please let us know about any other concerns you may have; it’s great to have feedback.

      Regards,
      Daniel.

    2. BTW, in case you’re interested in a more in-depth analysis of the matching between developers and affiliations, I recommend having a look at the databases that can be found at:

      http://bitergia.com/public/reports/openstack/2013_04_grizzly/data/db/

      Our purpose is to be as transparent and neutral as possible. This is one of the reasons to release all of our software (retrieval, filtering and visualization) as free software.

      So if you have any doubts about the methodology, find incoherent results, or would like to see new features, let us know!

      Regards

  1. The number of authors per company seems inconsistent. I’ve seen variances between what is in your spreadsheet and what is in the JSON files.

    Also, the commits-per-author statistic is interesting, but it should not be a mean, it should be a median.

    1. Hi Eric,

      We’re working on it.

      The issue is that we were not counting unique identities correctly in a couple of queries.

      Give us some minutes and this should be fixed.

      Thanks for your comment, and let us know about any other incoherent data you may find!

      Regards

      1. This took longer than expected, so we decided to keep the current version and wait until Monday to update the results, to be sure that the new dataset is coherent.

        Thanks again Eric!

    2. The charts and tables are finally updated. The issues came from queries that, in some cases, were counting identities and not real people (in some cases you may find more than one identity for a given developer).

      This was happening in the static info per company on each company’s page, but not in the tables on the main page.

      Thanks for your patience!

      Regards

  2. Hehe.. great article guys.. love to see the momentum, but I’m a little surprised that in every one of your graphs, the graphing library you’re using does not remove the last (zero) data point. According to your metrics, as of March, there are zero commits, zero developers, and all of your spikes and surges are all trending to zero.

    Great article, but the zero trends are a bit distracting and cheat your trending.

    Just sayin..

    Tweeks

    1. Hi Tweeks,

      Thanks a lot for the comment; as you said, this is an issue. In any case, please note that after the removal of such zeros there will again be a decreasing trend. This is because the last week is not complete. The methodology takes sets of seven days, and the last set contains fewer than seven days.

      We’re working now on a more general approach where the whole history of OpenStack could be visualized together with specific releases, so please keep in touch!

      Regards

  3. Though there are some minor issues that need to be fixed, I would focus on the great value this work generated. I’m wondering if the stats in this work can be automated, generated periodically, and posted here or somewhere known!

    Thanks for the great work!
