How to measure commits: merges, branches, repositories and bots

In a previous post (Commits: that metric), we discussed the different flavors of commits that should be taken into account when measuring them.

An example was provided there: in some cases, depending on the development policy of the project, commits, once merges are ignored, represented around 50% of the total activity that can be found.
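To make the distinction concrete, here is a minimal sketch of counting commits with and without merges. It assumes each commit is already available as a record listing its parent hashes (the `Commit` type, the `count_commits` function and the sample history are hypothetical, for illustration only), and it treats any commit with more than one parent as a merge.

```python
from typing import NamedTuple, Tuple


class Commit(NamedTuple):
    sha: str                  # commit identifier (made-up values below)
    parents: Tuple[str, ...]  # hashes of the parent commits


def count_commits(commits):
    """Return total, merge and non-merge commit counts."""
    total = len(commits)
    # A merge commit is one that has more than one parent.
    merges = sum(1 for c in commits if len(c.parents) > 1)
    return {"total": total, "merges": merges, "non_merges": total - merges}


# Hypothetical history: two regular commits and two merges.
history = [
    Commit("a1", ()),
    Commit("b2", ("a1",)),
    Commit("c3", ("a1", "b2")),
    Commit("d4", ("b2", "c3")),
]

stats = count_commits(history)
print(stats)
```

In a real Git repository, a similar filter is available through the `--no-merges` option of `git log`.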

CVSAnalY is one of the tools that feed our dashboards. It specializes in versioning systems and parses the logs produced by some of the most widely used ones in the open source world. It does this with the invaluable help of Repository Handler, which is in charge of adding a transparency layer.

Its procedure is simple: CVSAnalY reads a log from SVN, CVS or Git and uses it to build and populate a relational database. For other distributed versioning systems, such as Mercurial or Bazaar, there are hooks to migrate them to Git.
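The general idea of turning parsed log entries into a queryable relational database can be sketched as follows. This is not CVSAnalY's actual schema or code: the `commits` table, its columns, the `load_commits` helper and the sample rows are all assumptions made for illustration, using Python's built-in sqlite3 module.

```python
import sqlite3


def load_commits(conn, commits):
    """Store (sha, author, date, is_merge) tuples in a hypothetical table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS commits ("
        "sha TEXT PRIMARY KEY, author TEXT, date TEXT, is_merge INTEGER)"
    )
    # INSERT OR IGNORE makes reloading the same log harmless.
    conn.executemany(
        "INSERT OR IGNORE INTO commits VALUES (?, ?, ?, ?)", commits
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_commits(conn, [
    ("a1", "alice", "2014-03-07", 0),
    ("b2", "bob", "2014-03-08", 1),
    ("c3", "alice", "2014-03-09", 0),
])

# Once the log is in a database, metrics become plain SQL queries,
# for example non-merge commits per author.
rows = conn.execute(
    "SELECT author, COUNT(*) FROM commits WHERE is_merge = 0 "
    "GROUP BY author ORDER BY author"
).fetchall()
print(rows)
```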

To illustrate this post, we use the publicly available database for the OpenStack project. This database is the basis of the dashboard that can be seen at the OpenStack Activity Dashboard page. Bitergia provides this database and updates it daily, so this analysis is done with a dataset that is current as of today.


Commits: that metric

Source code versioning systems are tools that make developers' lives easier. Basically, they keep a list of all of the changes made to the source code and make it possible to navigate and recover old versions of the project. Each of those changes to the source code is defined as a commit, which may be considered the basic piece of information in these systems.

Nowadays, commits are considered a “good” metric for getting an initial idea of the total effort invested in a project. However, this is not as simple as it seems: each versioning system, and even each project with its particularities, may distort this metric. So we all need to be a bit careful before hailing it as “the most wonderful, marvelous and incredible metric in the world”.

So, in the first place, what kind of information can we find in a commit? Typically, commits provide information about the time when the change took place, the files affected by that change, the lines added, removed or modified, the author of the commit, and maybe extra information such as the reviewer, specific acknowledgements and others. The following example shows the information that can be found in a specific commit (using the git log command):

commit 160ae59a76e2ce3fb6589137d90bb9e80f056fa0
Author: Daniel Izquierdo <dizquierdo@bitergia.com>
Date:   Fri Mar 7 13:32:25 2014 +0100

Add turnover in ITS and SCR

diff --git a/vizGrimoireJS/alerts.py b/vizGrimoireJS/alerts.py
index ff5a703..12b1de6 100755
--- a/vizGrimoireJS/alerts.py
+++ b/vizGrimoireJS/alerts.py
@@ -82,15 +82,29 @@ if __name__ == '__main__':

[…]
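As a rough illustration of how those fields can be extracted, the following sketch parses the header of the log entry above. It assumes the default `git log` layout, where the commit message is indented by four spaces; the `parse_log_entry` function and its field names are hypothetical, and the date is kept as a plain string rather than parsed.

```python
import re

LOG_ENTRY = """commit 160ae59a76e2ce3fb6589137d90bb9e80f056fa0
Author: Daniel Izquierdo <dizquierdo@bitergia.com>
Date:   Fri Mar 7 13:32:25 2014 +0100

    Add turnover in ITS and SCR
"""


def parse_log_entry(text):
    """Extract sha, author, email, date and message from one log entry."""
    fields = {}
    message_lines = []
    for line in text.splitlines():
        if line.startswith("commit "):
            fields["sha"] = line.split()[1]
        elif line.startswith("Author:"):
            match = re.match(r"Author:\s*(.*?)\s*<(.*)>", line)
            if match:
                fields["author"] = match.group(1)
                fields["email"] = match.group(2)
        elif line.startswith("Date:"):
            fields["date"] = line[len("Date:"):].strip()
        elif line.startswith("    "):
            # Message lines are indented by four spaces in this layout.
            message_lines.append(line.strip())
    fields["message"] = "\n".join(message_lines)
    return fields


entry = parse_log_entry(LOG_ENTRY)
print(entry["author"], "|", entry["date"], "|", entry["message"])
```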

However, the definition of a commit is specific to each versioning system. As an example, a commit in CVS is a modification to one file, so N modified files imply N commits. Subversion or Git, on the other hand, may have several “touched” files in the same commit. Are projects that use different versioning systems comparable at the level of commits? The answer is that they are probably not comparable by simply counting commits. You need a somewhat more advanced way to count them.
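One possible way to see the problem, and not necessarily the approach used in the rest of this post, is to count file-level changes instead of commits. The sketch below assumes each history is a mapping from a commit identifier to the list of files it touched; the `file_touches` function and both sample histories are made up for illustration.

```python
def file_touches(history):
    """Count (commit, file) pairs: one unit per file modified in a commit."""
    return sum(len(files) for files in history.values())


# The same change to three files, as each system would record it.
cvs_history = {            # CVS: one commit per modified file
    "1.1": ["a.c"],
    "1.2": ["b.c"],
    "1.3": ["c.c"],
}
git_history = {            # Git: one commit touching all three files
    "9f2c": ["a.c", "b.c", "c.c"],
}

# Raw commit counts differ (3 vs 1) ...
print(len(cvs_history), len(git_history))
# ... while file-level counts agree (3 vs 3).
print(file_touches(cvs_history), file_touches(git_history))
```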

