Plotting the development activity of a software project

These days we are busy preparing some new technologies to show how software projects are performing. One of them combines automatic extraction of data from software development repositories, semi-automatic filtering and analysis of that data, and visualizations that make the progress and performance of the project as a whole easy to see.

Preview of a part of the dashboard for a project

Above you can see the idea, which you can also explore in more detail in the preview page we have set up with real data from the GNOME Shell project (we already talked about it some days ago). In short, it is a set of interactive plots that present the evolution of some interesting parameters related to the source code management system (git in this case) and to the issue tracking (aka bug reporting) system (Bugzilla).

Comparison of commits, committers and files touched per month. Both similarities and differences are easy to spot.

In the case of the git repository, the evolution of commits, committers, touched files, touched branches and touched repositories (when there is more than one) gives us different views of what is happening in the project. Commits and committers are of course good proxies for the overall activity of the project: the number of commits is related to how fast the project is changing, while the number of committers lets us glimpse how large the community of people actually responsible for committing those changes is. The number of files touched tells a complementary story, since it is more related to how wide the changes are: many files touched means that a large part of the project is undergoing change, even if the change to each file is small. For example, the largest peak in this plot is in Nov 2009, when the other tracked parameters also show peaks, but much smaller ones. That month therefore shows a different kind of activity, one of ‘widespread’ change. The number of branches and repositories touched is usually related to the way the project is organized, and to how parallel lines of development are used during certain periods.
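To make the three monthly series more concrete, here is a minimal sketch of how they could be computed from plain `git log` output. It is an illustration under stated assumptions, not the code behind the dashboard: the function name `monthly_activity`, the `@hash|email|month` header layout and the sample data are all made up for this example, and "files touched" is taken to mean distinct paths changed during the month.

```python
from collections import defaultdict

# Assumed input: the text produced by a command along the lines of
#   git log --date=format:%Y-%m --pretty=format:'@%H|%ce|%cd' --name-only
# i.e. one "@hash|committer email|YYYY-MM" header per commit, followed by
# the paths that commit touched. The "@" prefix is our own convention.


def monthly_activity(log_text):
    """Return {month: {"commits": n, "committers": n, "files": n}}."""
    commits = defaultdict(int)
    committers = defaultdict(set)
    files = defaultdict(set)
    month = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate commits
        if line.startswith("@"):
            # Header line: a new commit starts here.
            _commit_hash, email, month = line[1:].split("|", 2)
            commits[month] += 1
            committers[month].add(email)
        elif month is not None:
            # Any other line is a path touched by the current commit.
            files[month].add(line)
    return {
        m: {
            "commits": commits[m],
            "committers": len(committers[m]),
            "files": len(files[m]),
        }
        for m in sorted(commits)
    }


# Invented sample data, only to show the shape of the input.
SAMPLE = """\
@a1|alice@example.org|2009-10
js/ui/panel.js

@b2|bob@example.org|2009-11
js/ui/panel.js
js/ui/main.js
src/shell-global.c

@c3|alice@example.org|2009-11
js/ui/main.js
"""

if __name__ == "__main__":
    for month, row in monthly_activity(SAMPLE).items():
        print(month, row)
```

Counting distinct paths rather than every path occurrence is a design choice: it keeps a file that is edited many times in one month from inflating the measure of how wide the changes were.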

In the case of the Bugzilla repository, the evolution of how tickets are opened and closed, and of the number of people involved in both activities, tends to represent different views of the ‘reporters’ (for ticket openings) and ‘fixers’ (for ticket closings) populations that help to maintain the quality assurance activities of the project (although of course the issue tracking system could be used for much more than QA). The number of changes (and the population of changers) tends to be a good proxy for all the people involved in the daily coding-testing-fixing process.
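The ‘reporters’ and ‘fixers’ views can be sketched in the same spirit. The example below assumes the ticket history has already been extracted into simple `(day, action, person)` tuples, where the action is either `"opened"` or `"closed"`. That record shape, the function name `ticket_activity` and the sample events are hypothetical, and nothing here uses or mirrors the Bugzilla API.

```python
from collections import defaultdict
from datetime import date


def ticket_activity(events):
    """Summarize ticket openings and closings per month.

    events: iterable of (day, action, person) tuples, where day is a
    datetime.date and action is "opened" or "closed" (assumed shape).
    Any other action raises KeyError.
    """
    counts = defaultdict(lambda: {"opened": 0, "closed": 0})
    people = defaultdict(lambda: {"opened": set(), "closed": set()})
    for day, action, person in events:
        month = day.strftime("%Y-%m")
        counts[month][action] += 1
        people[month][action].add(person)
    return {
        m: {
            "opened": counts[m]["opened"],
            "closed": counts[m]["closed"],
            # People who opened at least one ticket that month.
            "reporters": len(people[m]["opened"]),
            # People who closed at least one ticket that month.
            "fixers": len(people[m]["closed"]),
        }
        for m in sorted(counts)
    }


# Invented events, only to show the shape of the input.
EVENTS = [
    (date(2010, 1, 5), "opened", "ana"),
    (date(2010, 1, 9), "opened", "ana"),
    (date(2010, 1, 20), "closed", "luis"),
    (date(2010, 2, 2), "opened", "mei"),
    (date(2010, 2, 3), "closed", "luis"),
    (date(2010, 2, 15), "closed", "ana"),
]

if __name__ == "__main__":
    for month, row in ticket_activity(EVENTS).items():
        print(month, row)
```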

In both cases, a quick look at the plots shows the general pattern of activity of the project. But by looking at the details and the differences between them, much more can be learned about what is happening in it. After all, free / open source projects that decide to be transparent enough to provide public access to their code and ticket repositories are allowing us to learn about their internal history. These plots just try to summarize a part of it.

[This is just a technology preview, but we are very interested in detecting errors and in learning whether you find it useful. Any comments and feedback are more than appreciated.]
