Happy Birthday Wikipedia!

Next January 15th, Wikipedia turns 15. At Bitergia, we don’t want to miss the opportunity to wish it a happy birthday and to congratulate the Wikimedia Foundation for “encouraging the growth, development and distribution of free, multilingual, educational content, and to providing the full content of these wiki-based projects to the public free of charge”.

Wikipedia is one of Wikimedia’s projects. In fact, it is the oldest and largest Wikimedia project, predating the Wikimedia Foundation itself. Wikipedia is often described as a wiki, but it is actually a collection of over 200 wikis, one for each language, all running on the MediaWiki software. MediaWiki is a free and open source wiki engine, originally developed for and used by Wikipedia, that is now also used on other projects of the non-profit Wikimedia Foundation. MediaWiki is freely available for others to use (and improve), and it is used by several organizations around the world.

After 15 years, many contributors have participated in the MediaWiki project. At Bitergia, we collaborate with the Wikimedia Foundation by analyzing its development community and providing up-to-date metrics about it. The project is characterized by being very inclusive when accepting code. Let’s see some of the numbers behind the development of the Foundation’s projects over these years (all data is available in the Wikimedia Foundation Dashboard, so feel free to play with it and find the data you are interested in):

Wikimedia Foundation Development Community report (Jan. 2016)


These are some of the big numbers, but there is much more interesting data in the project. Some examples: who is contributing code, which regions carry more weight in development, and how have merged commits evolved? Take a look at the evolution of the Wikimedia project:


Kilo: the new OpenStack release

[Updated results based on methodological changes]

Kilo, the new OpenStack release, shows a continued increase in activity compared with Juno. From Icehouse to Juno, the number of commits grew by 6.22% and the number of unique authors by 17.07%. From Juno to Kilo, the jump in commits was larger (11.23%) and the increase in authors smaller (11.16%). With this increase, the number of unique authors contributing to the OpenStack Foundation projects reaches a new peak, with close to 1,600 different people participating in its development.
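The release-over-release percentages above follow the usual relative-change formula, (current - previous) / previous × 100. A minimal Python sketch of that arithmetic is shown below; the function names are hypothetical and the author counts are placeholders for illustration, not the real OpenStack figures.

```python
from __future__ import annotations


def percent_change(previous: float, current: float) -> float:
    """Relative change from one release to the next, in percent."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous * 100.0


def release_growth(series: dict[str, float]) -> dict[str, float]:
    """Percent change for each pair of consecutive releases.

    `series` maps release name -> metric value and must be given in
    release order (dicts keep insertion order in Python 3.7+).
    """
    names = list(series)
    return {
        f"{a} -> {b}": round(percent_change(series[a], series[b]), 2)
        for a, b in zip(names, names[1:])
    }


# Hypothetical author counts, for illustration only (not real OpenStack data).
authors = {"Icehouse": 1000, "Juno": 1200, "Kilo": 1320}
print(release_growth(authors))
# {'Icehouse -> Juno': 20.0, 'Juno -> Kilo': 10.0}
```

The same helper works for commits, authors or organizations, as long as each series is listed in release order.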

After the continuous increase in activity from release to release that we observed in the past, Kilo, the latest release of OpenStack, is showing some stabilization. The differences between Juno (the previous release) and Kilo are the lowest in the history of the analyses we’ve performed for the OpenStack Foundation. Although this release has reached a new peak in contributors, close to 1,500 different people, the increase from Juno to Kilo was around 900 commits and 200 authors, while from Icehouse to Juno it was 700 commits and 70 developers.

The list of organizations participating in the development of OpenStack keeps growing as well: close to 170 different organizations contributed at least one commit to the development of Kilo.

Among the top ten contributors, we find the following organizations:


Regarding the community itself, the timezone analysis shows activity spread around the world. OpenStack is truly a 24-hours-a-day development community. There are three main groups of activity: America, on the left side of the chart, Europe/Africa in the center, and Asia, on the right.

Total commits by timezone as detected in Git repositories


Ignoring the UTC 0 activity, which may be biased by developers who use UTC 0 as their timezone regardless of where they live, the rest of the activity shows the East and West coasts of North America as the main contributors in number of commits. Europe/Africa is quite close to this activity (mostly thanks to Europe), although biased by the UTC peak. India could be represented by the small peak at UTC+5, and finally comes the rest of Asia, led by China and Japan, which is consistent with the location of some contributing companies.
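A chart like this can be approximated from the UTC offset that Git stores with each commit date. The sketch below is a minimal illustration, not Bitergia’s actual implementation: it assumes the offset of the author date (as printed by `git log --format=%ad`) reflects the developer’s local timezone, which, as noted above, does not hold for everyone who leaves their clock at UTC 0. The helper names are hypothetical.

```python
from __future__ import annotations

import subprocess
from collections import Counter


def utc_offset_hours(date_line: str) -> float:
    """Turn the trailing '+HHMM' / '-HHMM' of a git date into hours."""
    tz = date_line.strip().split()[-1]
    sign = -1 if tz.startswith("-") else 1
    return sign * (int(tz[1:3]) + int(tz[3:5]) / 60)


def commits_by_timezone(date_lines: list[str]) -> Counter:
    """Histogram of UTC offset (in hours) -> number of commits.

    Offset 0 mixes people living in UTC 0 with people whose machines
    are simply configured to UTC.
    """
    return Counter(utc_offset_hours(line) for line in date_lines if line.strip())


def git_author_dates(repo_path: str) -> list[str]:
    """Author dates of all non-merge commits, in git's default date format."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--no-merges", "--format=%ad"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()


# Example with literal dates (no repository needed):
sample = [
    "Fri Mar 7 13:32:25 2014 +0100",
    "Mon Mar 10 09:00:00 2014 -0800",
    "Tue Mar 11 10:00:00 2014 +0530",
    "Wed Mar 12 11:00:00 2014 +0100",
]
print(sorted(commits_by_timezone(sample).items()))
# [(-8.0, 1), (1.0, 2), (5.5, 1)]
```

Against a real clone, `commits_by_timezone(git_author_dates(path))` would produce the same kind of histogram, one bar per UTC offset.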

Methodological notes:

  • Some of the repositories under the OpenStack project have been removed from the analysis. For example, specification projects are not counted. The full list of repositories is available in the latest quarterly report sponsored by the OpenStack Foundation.
  • Developers are counted as the actual authors of the piece of code merged upstream.
  • The time of a commit is the time when that piece of code was merged upstream.
  • With each release, new repositories are added to the list of analyzed projects. This partially explains the continuously increasing activity in the OpenStack Foundation projects.
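The notes above amount to a filter over commits: count actual authors, date each commit by its merge time, and skip excluded repositories. A minimal sketch under those rules follows. It assumes a merge timestamp is already known for each commit (how it is obtained, for example from the code review system, is left open here), and the `*-specs` naming rule for specification repositories is a hypothetical stand-in for the real repository list.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Commit:
    repo: str
    author: str          # actual author of the change, not whoever merged it
    merged_at: datetime  # when the change landed upstream (assumed to be known)


def is_spec_repo(repo: str) -> bool:
    """Hypothetical exclusion rule: treat '*-specs' repositories as specifications."""
    return repo.endswith("-specs")


def release_activity(commits, start: datetime, end: datetime, excluded=is_spec_repo):
    """Return (commits, unique authors) merged in [start, end), skipping excluded repos."""
    in_window = [
        c for c in commits
        if start <= c.merged_at < end and not excluded(c.repo)
    ]
    return len(in_window), len({c.author for c in in_window})
```

Passing the release start and end dates as `start` and `end` gives per-release totals that can then be compared across releases.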

Free / Libre Open Source Software Community Metrics meeting recap

After the Community Leadership Summit, our next big event in Portland was the FLOSS Community Metrics meeting, which we organized together with Puppet Labs, who hosted the meeting in their offices. Special thanks to Dawn Foster and Kara Sowles for all their help and support.

The room was crowded, with people from organizations like Eclipse Foundation, Red Hat, Google, Twitter, PayPal, Open Source Initiative, LibreOffice, Kaltura, Cloudera, etc. There were many interesting topics and talks, and almost everything is already available on the 2014 edition website.

Here is a brief summary of how it went…

James Falker, from Liferay, talking about bullshit metrics like downloads, etc.


Measuring demographics: OpenStack as case study

Turnover is inevitable. Developers leave a project and others join it. This effect may be more harmful in open source communities than in companies: depending on the community, it can be hard to find new people willing to participate. Moreover, those who give up developing leave a knowledge gap behind. So the problem is twofold: people leave, and they leave a knowledge gap that in some cases is hard to fill.

However, is it possible to analyze this regeneration of developers? How good is my community at retaining developers? Is it possible to measure the number of newcomers joining the community? This kind of information is clearly essential to define policies to attract new members, retain current ones, and check whether the current situation is leading the community in a good direction.

This post is an example of the kind of analysis that we at Bitergia are building on top of the CVSAnalY tool. In previous posts we introduced the concept of commit, its peculiarities as a metric, and several ways to calculate it, adding filters for bots, merges or branches.

The demographics of open source communities allow us to understand how a community has evolved and, potentially, how it will evolve over time. Demographics in open source communities can be read like the typical population pyramids of countries or cities. Typically, the oldest people are found at the top of the chart, and age decreases toward the bottom. These charts are called pyramids because of their typical triangular shape. However, over the last decades and in developed countries, this shape has been moving toward an inverted pyramid, although that is another discussion :).

Thanks to the study of the demographics of developers, it is possible to learn a bit more about the community. We already introduced the demographics of the Linux Kernel, and this post focuses on the OpenStack community as a case study. The following figure shows the demographics of the OpenStack community (updated daily in the OpenStack activity dashboard). The x-axis indicates the number of developers, while the y-axis shows the timeframe of activity.

Demographics of the OpenStack developers community


Green bars show, for each period, the number of developers who started contributing with at least one commit. Blue bars show how many of those developers still contribute to the community. By definition, a developer is still contributing if at least one of their commits has been detected during the last six months; otherwise, the developer is considered to have left the community. A developer may also return after more than six months and submit another change to the source code; in that case, the developer would appear as not having left the community.
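Both bar series can be derived from each developer’s first and last commit dates. The following Python sketch is one way to compute them, assuming commits are given as (author, date) pairs. The 182-day period length and the 182-day approximation of “six months” are assumptions for illustration, not necessarily what the dashboard uses, and the function name is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime, timedelta

ACTIVE_WINDOW = timedelta(days=182)  # "last six months", approximated in days


def demographics(commits, now: datetime, period_days: int = 182):
    """Attracted vs. still-active developers, per period of first commit.

    commits: iterable of (author, datetime) pairs.
    Returns {period_index: (joined, still_active)}, where period 0 starts at
    the earliest commit in the data set.
    """
    first: dict[str, datetime] = {}
    last: dict[str, datetime] = {}
    for author, when in commits:
        first[author] = min(first.get(author, when), when)
        last[author] = max(last.get(author, when), when)
    if not first:
        return {}

    origin = min(first.values())
    table = defaultdict(lambda: [0, 0])
    for author, joined in first.items():
        period = (joined - origin).days // period_days
        table[period][0] += 1                  # green bar: newcomers in this period
        if now - last[author] <= ACTIVE_WINDOW:
            table[period][1] += 1              # blue bar: still contributing
    return {p: tuple(counts) for p, counts in sorted(table.items())}
```

Because only the last commit date matters for the blue bars, a developer who returns after a long pause is automatically counted as still active, matching the caveat above.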


How to measure commits: merges, branches, repositories and bots

In a previous post (Commits: that metric), we discussed all the flavors we should take into account when measuring commits.

We provided an example showing that, in some cases and depending on the development policy of the project, commits ignoring merges represented around 50% of the total activity found.
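Filters of this kind can be sketched in a few lines. The example below is a hypothetical illustration, not CVSAnalY’s code: it drops merge commits (more than one parent), drops commits authored by a configurable list of bot identities, and counts each commit hash once, so a commit that shows up in several branches or mirrored repositories is not double-counted. The bot address is a made-up placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommitRecord:
    sha: str
    author: str    # author e-mail
    parents: int   # number of parent commits; more than one means a merge
    repo: str


# Hypothetical bot identities; each project needs its own list.
BOTS = {"ci-bot@example.org"}


def count_commits(records, skip_merges: bool = True, skip_bots: bool = True) -> int:
    """Count distinct commits after applying merge and bot filters.

    A commit reachable from several branches, or mirrored in several
    repositories, keeps the same SHA, so it is counted only once.
    """
    seen = set()
    for r in records:
        if skip_merges and r.parents > 1:
            continue
        if skip_bots and r.author in BOTS:
            continue
        seen.add(r.sha)
    return len(seen)
```

Comparing `count_commits(records)` with `count_commits(records, skip_merges=False)` shows how much of the raw activity comes from merges alone.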

CVSAnalY is one of the tools that feed data into our dashboards. It specializes in versioning systems and parses the logs of some of the most widely used ones in the open source world. It does this with the invaluable help of Repository Handler, which is in charge of adding a transparency layer over them.

Its procedure is simple: CVSAnalY reads a log from SVN, CVS or Git and builds and feeds a relational database. For other distributed versioning systems, such as Mercurial or Bazaar, there are hooks to migrate them to Git.
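To make the log-to-database idea concrete, here is a minimal sketch for Git only, using Python’s built-in `sqlite3`. The two-table schema is a deliberately simplified, hypothetical one and does not reproduce CVSAnalY’s real schema; it stores one row per person and one per commit, including the number of parents so that merges can be filtered later.

```python
import sqlite3
import subprocess

# Hypothetical, heavily simplified schema (CVSAnalY's real schema is richer).
SCHEMA = """
CREATE TABLE IF NOT EXISTS people (
    id INTEGER PRIMARY KEY, name TEXT, email TEXT UNIQUE);
CREATE TABLE IF NOT EXISTS commits (
    sha TEXT PRIMARY KEY, author_id INTEGER REFERENCES people(id),
    date TEXT, n_parents INTEGER);
"""

# Tab-separated: hash, author name, author e-mail, ISO 8601 author date, parents.
LOG_FORMAT = "%H%x09%an%x09%ae%x09%aI%x09%P"


def git_log(repo_path: str) -> str:
    """Raw log of every commit reachable from any ref."""
    return subprocess.run(
        ["git", "-C", repo_path, "log", "--all", f"--format={LOG_FORMAT}"],
        check=True, capture_output=True, text=True,
    ).stdout


def parse_log(text: str):
    """Yield (sha, name, email, date, n_parents) for each log line."""
    for line in text.splitlines():
        if not line:
            continue
        sha, name, email, date, parents = line.split("\t")
        yield sha, name, email, date, len(parents.split())


def load(conn: sqlite3.Connection, rows) -> None:
    """Create the tables if needed and insert people and commits."""
    conn.executescript(SCHEMA)
    for sha, name, email, date, n_parents in rows:
        conn.execute("INSERT OR IGNORE INTO people (name, email) VALUES (?, ?)",
                     (name, email))
        (author_id,) = conn.execute(
            "SELECT id FROM people WHERE email = ?", (email,)).fetchone()
        conn.execute("INSERT OR IGNORE INTO commits VALUES (?, ?, ?, ?)",
                     (sha, author_id, date, n_parents))
    conn.commit()
```

A typical run would be `load(sqlite3.connect("scm.db"), parse_log(git_log("path/to/repo")))`, after which a query such as `SELECT COUNT(*) FROM commits WHERE n_parents <= 1` counts non-merge commits.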

To illustrate this post, we use the publicly available database for the OpenStack project. This database is the basis of the dashboard available at the OpenStack Activity Dashboard page. Bitergia provides this database and updates it daily, so this analysis uses data up to today.


Commits: that metric

Source code versioning systems are tools that make developers’ lives easier. Basically, they keep a list of all the changes made to the source code and allow developers to navigate and recover old versions of the project. Each of those changes to the source code is called a commit, and it may be considered the basic unit of information in these systems.

Commits are nowadays considered a “good” metric to get an initial idea of the total effort invested in a project. However, this is not as simple as it seems: each versioning system, and even each project with its own particularities, may distort this metric. So we all need to be a bit careful when promoting this metric as “the most wonderful, marvelous and incredible metric in the world”.

So, first of all, what kind of information can we find in a commit? Typically, commits provide information about the time when the change took place, the files affected by that change, the added, removed or modified lines, the author of the commit, and maybe extra information such as the reviewer, specific acknowledgements and others. The following example shows information that can be found in a specific commit (using the git log command):

commit 160ae59a76e2ce3fb6589137d90bb9e80f056fa0
Author: Daniel Izquierdo <dizquierdo@bitergia.com>
Date:   Fri Mar 7 13:32:25 2014 +0100

Add turnover in ITS and SCR

diff --git a/vizGrimoireJS/alerts.py b/vizGrimoireJS/alerts.py
index ff5a703..12b1de6 100755
--- a/vizGrimoireJS/alerts.py
+++ b/vizGrimoireJS/alerts.py
@@ -82,15 +82,29 @@ if __name__ == '__main__':
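Most of the fields listed above can be extracted programmatically. The sketch below parses the output of `git log --numstat` with a custom header line; the `@@` marker, the unit-separator delimiter and the function names are arbitrary choices made for this example. Binary files, which Git reports as `-` instead of line counts, contribute to the file list but not to the line totals.

```python
import subprocess

# "@@" marks a commit header; "%x1f" makes git emit the ASCII unit separator.
HEADER = "@@%H%x1f%an%x1f%ad"


def git_numstat(repo_path: str) -> str:
    """Raw `git log --numstat` output using the header format above."""
    return subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat", f"--format={HEADER}"],
        check=True, capture_output=True, text=True,
    ).stdout


def parse_numstat(text: str) -> list:
    """Split `git log --numstat` output into one dict per commit."""
    commits = []
    for line in text.splitlines():
        if line.startswith("@@"):
            sha, author, date = line[2:].split("\x1f")
            commits.append({"sha": sha, "author": author, "date": date,
                            "files": [], "added": 0, "removed": 0})
        elif line.strip() and commits:
            added, removed, path = line.split("\t", 2)
            commits[-1]["files"].append(path)
            if added != "-":  # binary files report "-" instead of line counts
                commits[-1]["added"] += int(added)
                commits[-1]["removed"] += int(removed)
    return commits
```

Each resulting dict holds the hash, author, date, touched files and added/removed line totals of one commit.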


However, the definition of a commit is specific to each versioning system. For example, a commit in CVS is a modification to a single file, so N modified files imply N commits. Subversion or Git, on the other hand, may have several “touched” files in the same commit. Are projects using different versioning systems comparable at the level of commits? The answer is probably that they are not comparable by simply counting commits. A somewhat more advanced way of counting them is needed.
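One common way to make CVS histories comparable is to rebuild changesets from per-file revisions: revisions with the same author and log message committed within a short time of each other are treated as one logical commit. The sketch below illustrates that heuristic with a hypothetical helper; the five-minute window is an assumption, and dedicated conversion tools may apply additional rules.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)  # hypothetical tolerance between file revisions


def group_cvs_revisions(revisions, window=WINDOW):
    """Group per-file CVS revisions into changesets.

    revisions: iterable of (author, message, timestamp, path) tuples.
    A revision joins the latest changeset with the same author and log
    message if it comes within `window` of that changeset's last revision.
    """
    changesets = []
    open_sets = {}  # (author, message) -> most recent changeset for that key
    for author, message, ts, path in sorted(revisions, key=lambda r: r[2]):
        current = open_sets.get((author, message))
        if current is not None and ts - current["last"] <= window:
            current["files"].append(path)
            current["last"] = ts
        else:
            current = {"author": author, "message": message,
                       "first": ts, "last": ts, "files": [path]}
            changesets.append(current)
            open_sets[(author, message)] = current
    return changesets
```

Counting `len(group_cvs_revisions(revs))` instead of `len(revs)` gives a number closer to what Subversion or Git would report for the same work.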


The OpenStack Havana release

The Havana release is scheduled for October 17th. In just a few hours, the new version of OpenStack will be ready. As we did for other releases, we at Bitergia have prepared the Havana development dashboard to show and explore the main development parameters of the project during this cycle. The first headline that becomes apparent when browsing it is that during these last six months the OpenStack community has experienced the most active period in its history, and it still keeps growing and growing.


Figure 1: Organizations participating in the development of the OpenStack Havana release

Havana may well be called the 900-developer release. In approximately six months of work, between April 4th and October 17th, this community received contributions from 900 different people, affiliated with more than 150 organizations. And we are only talking about source code activity.

In fact, the aggregated numbers are impressive:

  • Commits: 13,624
  • Developers: 923
  • Messages sent: 14,426
  • Opened tickets: 9,455
  • Source code reviews: 21,228

However, the comparison with previous releases is even more interesting.
