Looking back at the work done on diversity, I’ve realized that it has been quite a trip! My first approach to the topic was in an informal meeting with Nithya Ruff, currently at Comcast. She mentioned that women made up (as far as I remember!) 13% of the attendees at the OpenStack Summit in Tokyo. That was a great number compared to previous summits, as the percentage kept growing. But she also mentioned that they had received a tweet asking about the current number of technical contributions. That is when we decided to have a look at the issue: get numbers, and try to produce some of them from a quantitative point of view.
At Bitergia we have a list of happy customers using the Metrics Grimoire toolset (fork them at GitHub!) to produce metrics about their communities. Tracking tech communities is not that simple, and it needs some infrastructure. One of the main issues is usually aggregating all of the information.
- How to have aggregated information for a given project from several data sources?
- How to aggregate information from a specific developer from several data sources?
- How to aggregate information for a given company from several data sources?
- How to manage a developer’s several identities (IRC nickname, Jira user name, …) across data sources?
- And what about managing the several affiliations of a developer?
And beyond that, is there a place where I can get a quick glimpse of how my community is doing?
The following is an example from the OPNFV community, where the Git repositories, Gerrit projects, tickets from Jira, mailing lists, IRC channels and the Askbot instance are summarized in the entry page of the OPNFV dashboard.
The Bitergia toolset covers all of these issues with the retrieval of raw information, the cleaning and massaging of the data, and visualization. These steps are fully independent of each other, which lets you plug your favourite tools into any of them.
Let’s imagine that you’re interested in using your favourite visualization tool to play with the data. You can have direct access to the databases or to the post-processed data. It’s your data, and Bitergia takes care of providing a trustworthy service where all of the tools and data are open source.
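To make the identity question from the list above more concrete, here is a minimal sketch of one way to merge a developer's identities across data sources. It is not the toolset's actual implementation: the record fields (`source`, `name`, `email`), the sample people and the matching rule (same email or same name, case-insensitive) are all assumptions made for illustration.

```python
from collections import defaultdict


def merge_identities(identities):
    """Group identity records that share an email or a name.

    Hypothetical matching rule: two records belong to the same person
    if their emails or their names match, ignoring case. Groups are
    built with a small union-find structure.
    """
    parent = list(range(len(identities)))

    def find(i):
        # Follow parent links to the group representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    first_seen = {}
    for idx, ident in enumerate(identities):
        for key in ("email", "name"):
            value = (ident.get(key) or "").strip().lower()
            if not value:
                continue  # missing fields never match anything
            if (key, value) in first_seen:
                union(idx, first_seen[(key, value)])
            else:
                first_seen[(key, value)] = idx

    groups = defaultdict(list)
    for idx, ident in enumerate(identities):
        groups[find(idx)].append(ident)
    return list(groups.values())


# Made-up sample records, one per data source.
identities = [
    {"source": "git", "name": "Jane Doe", "email": "jane@example.com"},
    {"source": "jira", "name": "jdoe", "email": "jane@example.com"},
    {"source": "irc", "name": "jdoe", "email": None},
    {"source": "git", "name": "John Roe", "email": "john@example.org"},
]
profiles = merge_identities(identities)
for profile in profiles:
    print([(i["source"], i["name"]) for i in profile])
```

Matching on names alone can wrongly merge two different people, so in practice a rule like this would need manual review of the resulting groups.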
Where do the developers in my FOSS community live? For large open source communities where personal contact with developers is impossible, answering this simple question may be difficult. Fortunately, a simple technique, time zone analysis, can be applied to Git and mailing list repositories to at least partially answer this question. Read our blog post “Using Git and mailing lists time zones to find out where developers live” on OpenSource.com to learn more about it.
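As a rough illustration of the idea, the sketch below counts commits per UTC offset. It assumes the author dates have already been exported as strings like the ones `git log --format=%ai` prints; the sample dates are made up, and the function name is ours, not part of any Bitergia tool.

```python
from collections import Counter
from datetime import datetime


def commits_per_utc_offset(author_dates):
    """Count commits by the UTC offset (in hours) recorded in each date."""
    counts = Counter()
    for raw in author_dates:
        # Example input: "2015-04-30 10:15:00 +0200"
        dt = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S %z")
        offset_hours = dt.utcoffset().total_seconds() / 3600
        counts[offset_hours] += 1
    return counts


# Hypothetical sample; real input would come from the repository log.
sample_dates = [
    "2015-04-30 10:15:00 +0200",
    "2015-04-30 09:00:00 -0700",
    "2015-05-01 18:30:00 +0200",
    "2015-05-02 03:45:00 +0000",
    "2015-05-02 11:20:00 +0530",
]
counts = commits_per_utc_offset(sample_dates)
for offset, n in sorted(counts.items()):
    print(f"UTC{offset:+.1f}: {n} commit(s)")
```

An offset only narrows a developer down to a band of longitudes, and daylight saving time shifts it during the year, which is why the answer is only partial.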
As a part of our tests with Kibana and Elasticsearch as frontends for our MetricsGrimoire databases, we’ve set up a dashboard for understanding the code review process in OpenStack (be sure to visit it with a large screen and a reasonable CPU, otherwise your experience may be a bit frustrating).
This dashboard includes information about all review processes (changesets) in OpenStack, using information obtained from their Gerrit instance. For each review, we have information such as the submitter (owner), the time it was first uploaded and accepted or abandoned, the number of patchsets (iterations) needed until it was accepted, and the time until it was merged or abandoned. With all of this we have prepared an active visualization that lets you both understand the big picture and drill down into the details. Read on to learn about some of these details.
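To give an idea of the kind of numbers behind such a dashboard, here is a small sketch that summarizes a handful of review records. The record layout (`owner`, `status`, `opened`, `closed`, `patchsets`) and the sample values are assumptions for illustration; only the status names follow Gerrit's own `NEW`, `MERGED` and `ABANDONED`.

```python
from datetime import datetime
from statistics import median


def review_summary(reviews):
    """Summarize closed reviews: counts, time to merge, iterations."""
    closed = [r for r in reviews if r["status"] in ("MERGED", "ABANDONED")]
    merged = [r for r in closed if r["status"] == "MERGED"]
    days_to_merge = [
        (r["closed"] - r["opened"]).total_seconds() / 86400 for r in merged
    ]
    return {
        "merged": len(merged),
        "abandoned": len(closed) - len(merged),
        "median_days_to_merge": median(days_to_merge) if days_to_merge else None,
        "median_patchsets": median(r["patchsets"] for r in merged) if merged else None,
    }


# Made-up review records.
reviews = [
    {"owner": "alice", "status": "MERGED", "patchsets": 3,
     "opened": datetime(2015, 3, 1, 9, 0), "closed": datetime(2015, 3, 3, 9, 0)},
    {"owner": "bob", "status": "MERGED", "patchsets": 5,
     "opened": datetime(2015, 3, 2, 12, 0), "closed": datetime(2015, 3, 6, 12, 0)},
    {"owner": "carol", "status": "ABANDONED", "patchsets": 1,
     "opened": datetime(2015, 3, 4, 8, 0), "closed": datetime(2015, 3, 5, 8, 0)},
    {"owner": "dave", "status": "NEW", "patchsets": 2,
     "opened": datetime(2015, 3, 7, 8, 0), "closed": None},
]
summary = review_summary(reviews)
print(summary)
```

Medians are used here rather than means because a few very long-lived reviews would otherwise dominate the figures; that is a design choice of this sketch, not a statement about how the dashboard aggregates its data.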
[Note: this is our second post about our dashboards based on Kibana. If you’re interested, have a look at the first one, about OpenStack code contributions.]
[Updated results based on methodological changes]
Kilo, the new OpenStack release, shows a continued increase in activity compared to Juno. From Icehouse to Juno, there was an increase of 6.22% in the number of commits and 17.07% in the number of unique authors. From Juno to Kilo, there’s a higher jump in terms of commits (11.23%) and a lower increase in terms of authors (11.16%). However, with this increase, there is a new peak in the number of unique authors contributing to the OpenStack Foundation projects, with close to 1,600 different people participating in its development.
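For reference, release-to-release percentages like these are plain relative changes. The sketch below shows the arithmetic with made-up counts; they are not the actual OpenStack figures.

```python
def percent_change(previous, current):
    """Relative change from previous to current, as a percentage."""
    if previous == 0:
        raise ValueError("previous must be non-zero")
    return (current - previous) / previous * 100.0


# Hypothetical counts for two consecutive releases.
growth = percent_change(200, 250)
print(f"{growth:.2f}%")
```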
After the continuous increase of activity from release to release that we observed in the past, Kilo, the latest release of OpenStack, is showing some stabilization. The differences between Juno (the previous release) and Kilo are the lowest in the history of the analysis we’ve performed for the OpenStack Foundation. Although this release has reached a new peak in contributors, close to 1,500 different people, the increase from Juno to Kilo was around 900 commits and 200 authors, while from Icehouse to Juno it was 700 commits and 70 developers.
The list of organizations participating in the development of OpenStack keeps growing as well: close to 170 different organizations have contributed at least one commit to the development of Kilo.
As the top ten contributors, we find the following organizations:
Regarding the community itself, the time zone analysis shows widespread activity around the world. OpenStack is a truly 24-hours-a-day continuous development community. There are three main groups of activity: America, on the left side of the chart, Europe/Africa in the center and Asia, on the right.
Ignoring the UTC 0 activity, which may be biased by developers using UTC 0 as their time zone regardless of where they live, the rest of the activity shows the North American East and West coasts as the main contributors in number of commits. Europe/Africa is quite close to this activity (most of it due to Europe), although biased by the UTC peak of activity. India could be represented by the small peak in UTC+5, and finally comes the rest of Asia, with China and Japan in first place, which is consistent with the location of some contributing companies.
- Some of the repositories under the OpenStack project have been removed from the analysis. As an example, specification projects are not counted for this analysis. The full list of repositories is available in the last quarterly report sponsored by the OpenStack Foundation.
- Developers are counted as the actual authors of the piece of code merged into upstream.
- The time of commit takes into account the time when that piece of code is merged into upstream.
- With each release, new repositories are added to the list of analyzed projects. This partially explains the continuously increasing activity in the OpenStack Foundation projects.
Having a dashboard usually opens new paths to understand software development communities. It can be seen as the entry point that helps to understand the basics of a community. On top of this, new questions may appear, related to those basics or to more advanced issues. This is the case of the new work we are doing with the Wikimedia community metrics analytics team: Core Reviewers and Participants.
- Core reviewers are defined as those developers that can exercise a +2/-2 review in Gerrit. In addition, the community is interested in excluding auto merges: although this is undesired behaviour, it does take place, and those cases should be removed.
- On the other hand, Participants in Gerrit are defined as any member leaving any type of trace in the system. In this set we can find reviews (-2,-1,+1,+2), uploads, comments and others.
It is interesting to notice that requirements differ slightly depending on the community. In the case of the OpenStack community, there is an extra requirement for the Core Reviewer definition: reviews must be found in the master branch. This specific measure can be found in the OpenStack quarterly reports for each of the projects of the Foundation.
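The two definitions above can be sketched in a few lines. Note the assumptions: the event layout (`actor`, `change_owner`, `type`, `value`, `branch`) is hypothetical, and "auto merge" is interpreted here as a +2/-2 vote cast by the owner of the change, which is our reading rather than something the definitions spell out.

```python
def classify(events, branch=None):
    """Return (core_reviewers, participants) from a list of Gerrit-like events.

    If branch is given, only votes on that branch count towards the
    core reviewer set (the extra OpenStack requirement). Participants
    are always counted from every event.
    """
    core, participants = set(), set()
    for event in events:
        participants.add(event["actor"])  # any trace makes a participant
        if branch is not None and event["branch"] != branch:
            continue
        is_core_vote = event["type"] == "review" and event.get("value") in (2, -2)
        # Assumption: an auto merge is a +2/-2 on one's own change.
        is_self_review = event["actor"] == event["change_owner"]
        if is_core_vote and not is_self_review:
            core.add(event["actor"])
    return core, participants


# Made-up events.
events = [
    {"actor": "alice", "change_owner": "bob", "type": "review", "value": 2, "branch": "master"},
    {"actor": "bob", "change_owner": "bob", "type": "upload", "value": None, "branch": "master"},
    {"actor": "carol", "change_owner": "bob", "type": "comment", "value": None, "branch": "master"},
    {"actor": "dave", "change_owner": "bob", "type": "review", "value": 1, "branch": "master"},
    {"actor": "erin", "change_owner": "erin", "type": "review", "value": 2, "branch": "master"},
    {"actor": "frank", "change_owner": "bob", "type": "review", "value": -2, "branch": "stable/kilo"},
]
core, participants = classify(events)
core_master, _ = classify(events, branch="master")
print(sorted(core), sorted(core_master), sorted(participants))
```

Passing `branch="master"` drops frank from the core reviewer set, since his only +2/-2 vote is on a stable branch, while erin is excluded in both cases because she voted on her own change.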
[This post is based on the Executive Summary and other sections of the full report about OpenStack and the Icehouse release (part of OpenStack reports) and data retrieved from the OpenStack Activity Board, both developed by Bitergia]
At the moment of this analysis, OpenStack projects are close to reaching 74,000 commits since their start, as observed in the Activity Board. That activity was carried out by more than 2,000 different contributors, who at some point started 68,000 code review processes and sent and reviewed close to 270,000 different patches. There are more than 33,600 reports in the ticketing system, opened by 3,303 different participants. High activity is also registered in the discussion forums, with close to 52,000 email messages posted by 2,800 participants and more than 6,200 questions in the OpenStack question and answer tool.
Focusing on the development activity, developers can be divided into 246 core developers, 461 with regular activity and 1,214 occasional ones who at some point submitted some patches and contributed to the code.
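This summary does not say how the three groups are derived. One common way to make such a split, shown here purely as an assumption, is to rank authors by number of commits and cut the ranking where the cumulative share of commits crosses two thresholds. The 80% and 95% cut-offs and the sample counts below are illustrative, not taken from the report.

```python
def split_by_activity(commits_per_author, core_share=0.80, regular_share=0.95):
    """Split authors into core, regular and occasional groups.

    Authors are ranked by commit count. An author is core while the
    commits of higher-ranked authors cover less than core_share of the
    total, regular while they cover less than regular_share, and
    occasional otherwise. Both thresholds are assumptions.
    """
    total = sum(commits_per_author.values())
    groups = {"core": [], "regular": [], "occasional": []}
    covered = 0
    ranked = sorted(commits_per_author.items(), key=lambda kv: (-kv[1], kv[0]))
    for author, commits in ranked:
        share_before = covered / total if total else 0.0
        if share_before < core_share:
            groups["core"].append(author)
        elif share_before < regular_share:
            groups["regular"].append(author)
        else:
            groups["occasional"].append(author)
        covered += commits
    return groups


# Made-up commit counts.
sample = {"ana": 50, "ben": 30, "cho": 10, "dee": 5, "eli": 3, "fay": 2}
groups = split_by_activity(sample)
print(groups)
```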