Commits: that metric

Source code versioning systems are tools that make developers' lives easier. Essentially, they keep a record of every change made to the source code and let developers navigate through that history and recover old versions of the project. Each change to the source code is called a commit, and it can be considered the atomic unit of information in these systems.

Commits are nowadays considered a “good” metric for getting an initial idea of the total effort invested in a project. However, this is not as simple as it seems: each versioning system, and even each project with its own particularities, may distort this metric. So we all need to be a bit careful before hailing it as “the most wonderful, marvelous and incredible metric in the world”.

So, first of all, what kind of information can we find in a commit? Typically, a commit records when the change took place, the files affected by the change, the lines added, removed or modified, the author of the commit, and possibly extra information such as the reviewer, specific acknowledgements and so on. The following example shows the information found in a specific commit (as printed by the git log command with the -p option, which also displays the diff):

commit 160ae59a76e2ce3fb6589137d90bb9e80f056fa0
Author: Daniel Izquierdo <dizquierdo@bitergia.com>
Date:   Fri Mar 7 13:32:25 2014 +0100

Add turnover in ITS and SCR

diff --git a/vizGrimoireJS/alerts.py b/vizGrimoireJS/alerts.py
index ff5a703..12b1de6 100755
--- a/vizGrimoireJS/alerts.py
+++ b/vizGrimoireJS/alerts.py
@@ -82,15 +82,29 @@ if __name__ == '__main__':

[…]

However, what a commit is depends heavily on the versioning system. For example, a commit in CVS is a modification of a single file, so modifying N files implies N commits. Subversion or Git, on the other hand, may group several “touched” files into a single commit. Are projects that use different versioning systems comparable in terms of commits? Probably not if we simply count commits: a more careful way of counting them is needed.
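One way to make CVS histories comparable with Subversion or Git is to rebuild logical changesets from the per-file commits. The Python sketch below uses a common heuristic, similar in spirit to what CVS-to-changeset converters do: revisions by the same author with the same log message, each made within a short time window of the previous one, are merged into one changeset. The `FileRevision` record, the function name and the five-minute window are illustrative assumptions, not a method described in this post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRevision:
    """One CVS-style commit: a single file changed by one author."""
    path: str
    author: str
    message: str
    date: datetime


def group_into_changesets(revisions, window=timedelta(minutes=5)):
    """Merge per-file revisions into logical changesets.

    Two revisions belong to the same changeset when they share author and
    log message and the later one is at most `window` after the previous one.
    """
    # Sorting puts revisions of the same author/message next to each other,
    # in chronological order.
    ordered = sorted(revisions, key=lambda r: (r.author, r.message, r.date))
    changesets = []
    for rev in ordered:
        last = changesets[-1] if changesets else None
        if (last is not None
                and last[-1].author == rev.author
                and last[-1].message == rev.message
                and rev.date - last[-1].date <= window):
            last.append(rev)
        else:
            changesets.append([rev])
    return changesets
```

Counting `len(group_into_changesets(revisions))` instead of `len(revisions)` gives a number closer to what Git or Subversion would report for the same work. The window size is a tunable guess: too small splits real changesets, too large merges unrelated ones that happen to share a message such as “fix”.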


First steps mining Gerrit

Gerrit is becoming more and more popular in open source communities; it is an essential part of the workflow at the Wikimedia and OpenStack foundations, among others.

We at Bitergia have started to include the information provided by the Gerrit API in our toolset, with the goal of producing dashboards with specific information about the review process. This way, project managers can handle large amounts of data from this repository more easily: they get aggregated numbers, but can also drill down into the details when required.

With this in mind, a new backend has been added to Bicho [1] (still a work in progress), and a new library to analyze the resulting database can be found in VizGrimoireR [2] (also a work in progress).
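For readers who want to explore Gerrit data on their own, the following Python sketch queries the `/changes/` endpoint of Gerrit's REST API. It is only an illustration and is not how the Bicho backend is implemented. One detail worth knowing: Gerrit prefixes its JSON responses with `)]}'` to prevent cross-site script inclusion, so that prefix has to be stripped before decoding. The server URL, the default query and the helper names are placeholders.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

XSSI_PREFIX = ")]}'"  # Gerrit prepends this line to its JSON responses


def parse_gerrit_json(body):
    """Strip Gerrit's anti-XSSI prefix (if present) and decode the JSON."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def fetch_changes(base_url, query="status:merged", limit=100):
    """Query the /changes/ endpoint of a Gerrit server (needs network access).

    `q` holds the search query and `n` limits the number of results.
    """
    url = f"{base_url.rstrip('/')}/changes/?q={quote(query)}&n={limit}"
    with urlopen(url) as resp:
        return parse_gerrit_json(resp.read().decode("utf-8"))
```

For example, `fetch_changes("https://gerrit.example.org", "status:open")` would return a list of change records whose `_number` and `status` fields can then be stored and aggregated. Servers that require authentication expose the same endpoints under an `/a/` prefix, which this sketch does not handle.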

There are five functions so far:

  • EvolReviews: evolution of reviews per type (merged, new, workinprogress, abandoned, …) and per period of time (month, year, week, …)
  • EvolReviewers: evolution of the reviewers per period of time
  • EvolEvaluations: type of evaluation per patchset (verified, submitted, …) and per period of time
  • Waiting4Review: number of patches waiting for a reviewer response (those that got a positive review)
  • Waiting4Submitter: number of patches waiting for a submitter response (typically those that got a negative review)
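The first and the last two functions above boil down to simple aggregations. The following Python sketch shows the idea; it is not a port of the VizGrimoireR code, which is written in R. It assumes a hypothetical list of change records with `created` (a datetime), `status` (Gerrit's NEW, MERGED or ABANDONED) and `scores` (review votes in chronological order), and it classifies an open change only by the sign of its latest vote.

```python
from collections import Counter
from datetime import datetime


def period_key(date, period):
    """Label a date by year ("2014"), month ("2014-01") or ISO week ("2014-W01")."""
    if period == "week":
        year, week, _ = date.isocalendar()
        return f"{year}-W{week:02d}"
    return date.strftime("%Y" if period == "year" else "%Y-%m")


def evol_reviews(changes, period="month"):
    """Count changes per (period, status), in the spirit of EvolReviews."""
    counts = Counter()
    for change in changes:
        counts[(period_key(change["created"], period), change["status"])] += 1
    return dict(sorted(counts.items()))


def waiting_counts(changes):
    """Split open changes by the sign of their latest review score.

    Positive latest score -> waiting for a reviewer (Waiting4Review);
    negative latest score -> waiting for the submitter (Waiting4Submitter).
    """
    waiting = Counter()
    for change in changes:
        if change["status"] != "NEW" or not change["scores"]:
            continue  # closed, or not reviewed yet
        last = change["scores"][-1]
        if last > 0:
            waiting["reviewer"] += 1
        elif last < 0:
            waiting["submitter"] += 1
    return waiting
```

Real review data is richer than this (several reviewers, several patchsets, different labels such as Verified and Code-Review), so a production version would first decide which vote counts as “the latest review”.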

As usual, feedback is welcome!

Cheers

[1] https://github.com/MetricsGrimoire/Bicho/blob/master/Bicho/backends/gerrit.py

[2] https://github.com/VizGrimoire/VizGrimoireR/blob/newperiod/vizgrimoire/R/SCR.R
