The adoption of Python notebooks for data analysis has grown considerably, and they have become a de facto standard in data science communities. But which Python libraries are used in them?
We at Bitergia are busy testing new stacks for analyzing and visualizing the software development data we collect. Some of our latest tests involve using Kibana for visualization. In this case, we have prepared a dashboard showing the latest contribution data for OpenStack.
One of the nice things about these new dashboards is the level of filtering and drill-down they allow. For example, in the above dashboard, you can click on any sector of a pie chart, any entry of a table, or any bar in a bar chart, and the corresponding filter will be applied. This makes it very easy to obtain specialized dashboards, such as this one with the contributions by RedHat (produced by clicking on RedHat in the list of top organizations), or the contributions to Liberty, the latest release cycle of OpenStack, obtained by selecting the corresponding period (last bar) in the “OpenStack ten top organizations by release” chart.
If you’re interested in some tips and tricks on what can be done with these dashboards, read on…
[This post is part of the lightning talk presented at FOSDEM 2015, titled “Data, data and data about your favourite community”. The slides are available on Bitergia’s Speakerdeck page. The IPython notebook used for visualization is accessible through nbviewer and can be downloaded from GitHub. This is a basic introduction to GrimoireLib.]
GrimoireLib aims to provide a transparency layer between the database and the user. This avoids direct access to the databases while providing a list of available metrics.
It is a Python-based library that expects a database already generated by one of the Metrics Grimoire tools. CVSAnalY, MailingListStats, Bicho and most of the other tools are already supported.
In a previous post (Commits: that metric), we discussed the different flavors to take into account when measuring commits.
An example was provided showing that in some cases, depending on the development policy of the project, commits excluding merges represented around 50% of the total activity.
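To make the "with or without merges" distinction concrete, here is a minimal sketch in Python. The commit list is made up for illustration and does not come from any real project; the only assumption is the usual Git convention that a merge commit has two or more parents.

```python
# Hypothetical commits: each entry records how many parents the commit has.
# A merge commit has two or more parents.
commits = [
    {"id": "a1", "parents": 1},
    {"id": "b2", "parents": 1},
    {"id": "c3", "parents": 2},  # merge
    {"id": "d4", "parents": 1},
    {"id": "e5", "parents": 2},  # merge
    {"id": "f6", "parents": 0},  # root commit
]


def non_merge_share(commits):
    """Fraction of commits that are not merges (fewer than two parents)."""
    if not commits:
        return 0.0
    non_merges = sum(1 for c in commits if c["parents"] < 2)
    return non_merges / len(commits)


share = non_merge_share(commits)
print(f"{share:.0%} of commits are not merges")
```

Depending on which of the two counts a dashboard reports, the same project can look considerably more or less active.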
CVSAnalY is one of the tools used as input for our dashboards. It specializes in versioning systems, and parses the logs produced by some of the most widely used ones in the open source world. It does this with the priceless help of Repository Handler, which is in charge of adding a transparency layer.
Its procedure is simple: CVSAnalY reads a log from SVN, CVS or Git, then builds and feeds a relational database. For other distributed versioning systems, such as Mercurial or Bazaar, there are hooks to migrate them to Git.
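The general idea of turning a log into a relational database can be sketched in a few lines of Python. This is not CVSAnalY's code or schema: the table layout, the function names and the sample log are assumptions made for illustration. The sample mimics the output of `git log --pretty=format:'%H|%P|%an|%ad' --date=short`, with shortened hashes.

```python
import sqlite3

# Hypothetical sample, in the shape produced by:
#   git log --pretty=format:'%H|%P|%an|%ad' --date=short
# Fields: commit hash | parent hashes | author name | author date
SAMPLE_LOG = """\
c3|c1 c2|Alice|2015-01-12
c2|c0|Bob|2015-01-11
c1|c0|Alice|2015-01-10
c0||Alice|2015-01-05
"""


def load_log(log_text, conn):
    """Parse the log text and store one row per commit."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS commits ("
        "hash TEXT PRIMARY KEY, author TEXT, date TEXT, n_parents INTEGER)"
    )
    for line in log_text.splitlines():
        if not line.strip():
            continue
        # Assumes author names contain no '|' character.
        commit_hash, parents, author, date = line.split("|")
        conn.execute(
            "INSERT INTO commits VALUES (?, ?, ?, ?)",
            (commit_hash, author, date, len(parents.split())),
        )
    conn.commit()


def commits_per_author(conn):
    """Return (author, number of commits) pairs, sorted by author."""
    rows = conn.execute(
        "SELECT author, COUNT(*) FROM commits GROUP BY author ORDER BY author"
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
load_log(SAMPLE_LOG, conn)
print(commits_per_author(conn))  # [('Alice', 3), ('Bob', 1)]
```

Once the log is in a database, metrics become plain SQL queries, which is what makes this approach convenient for dashboards.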
To illustrate this post, the publicly available database for the OpenStack project is used. This database is the basis of the dashboard that can be visualized at the Openstack Activity Dashboard page. Bitergia provides this database and updates it daily, so this analysis uses a dataset that is current as of today.
The MetricsGrimoire toolset, a key component for the software development analytics services provided by Bitergia, has been improved in the context of the ALERT project. The improvements have been contributed back, and incorporated in the MetricsGrimoire code base. Bitergia is using these tools for analyzing software development repositories, and has decided to provide commercial services specifically targeted at supporting and customizing the tools for interested customers.
ALERT is an R&D project funded by the European Commission under the 7th Framework Programme, aimed at improving the development process in open source collaborative environments. The ALERT system provides methods and tools to improve the coordination among collaborative as well as distributed virtual teams developing software in open source communities, and in software development companies. The project, which has already delivered its final results, has been evaluated by its reviewers as “excellent”.
Gerrit is becoming more and more popular in open source communities, and is an essential part of the Wikimedia and OpenStack foundations, among others.
We at Bitergia have started to include the information provided by the Gerrit API in our toolset, with the goal of obtaining dashboards with specific information about the review process. This way, project managers can deal with the large amount of data in this repository more easily, with aggregated numbers but also the possibility of drilling down into the details if required.
There are five functions so far:
- EvolReviews: evolution of reviews per type (merged, new, workinprogress, abandoned,…) and per period of time (month, year, week, …)
- EvolReviewers: evolution of the reviewers per period of time
- EvolEvaluations: type of evaluation per patchset (verified, submitted, …) and period of time
- Waiting4Review: number of patches waiting for a reviewer response (those that got a positive review)
- Waiting4Submitter: number of patches waiting for a submitter response (typically those that had a negative review)
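As a rough illustration of what a function like EvolReviews computes, the following Python sketch counts reviews per status and per month. It is not the actual implementation of any of the functions listed above: the `evol_reviews` name, the record layout and the sample data are all hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical review records: (status, date of last update).
reviews = [
    ("merged", date(2013, 1, 15)),
    ("merged", date(2013, 1, 28)),
    ("abandoned", date(2013, 1, 30)),
    ("new", date(2013, 2, 2)),
    ("merged", date(2013, 2, 10)),
]


def evol_reviews(reviews):
    """Count reviews per (month, status), with months as 'YYYY-MM' strings."""
    counts = Counter()
    for status, updated in reviews:
        counts[(updated.strftime("%Y-%m"), status)] += 1
    # Sort by month, then status, so the result reads as a time series.
    return dict(sorted(counts.items()))


for (month, status), n in evol_reviews(reviews).items():
    print(month, status, n)
```

Switching the period to years or weeks would only require changing the format string used to build the key.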
As usual, feedback is welcome!
We’re improving the dashboards created with vizGrimoire by using Bootstrap to offer a better and more complete look and feel. The data remains the same and the charts are the same, but the overall appearance has changed. Using some data from OpenStack, we have prepared a preview of the new dashboard (still a work in progress).
On February 3rd, I delivered a lightning talk at FOSDEM, presenting MetricsGrimoire and vizGrimoire as free software tools to get some analytics from the software repositories of your favourite project. The talk was titled “Do you want to measure your project?”, as it focused on explaining the capabilities of these tools for analyzing a project, and on how easily they can be used for that.
[Update (2013.03.01): New post in the series: Reviewers and companies in the WebKit project]
Today Bitergia presents the first of a series on analytics for the WebKit project. After the preview we published some weeks ago, we finally have more detailed and accurate numbers about the evolution of the project. In this case, we’re presenting a report on the activity of the companies contributing to WebKit based on the analysis of reviewed commits.
Some interesting results are the share of contributions by the two main companies behind the project (Apple and Google), and how it has evolved from a project clearly driven by Apple, before 2009, to the current situation, with Google leading the top contributors table, and both Apple and Google being almost equal in contribution share over the whole history of the project. It is also noteworthy how the diversity of the project has been increasing in recent years, with new players starting to show significant activity.