[This post is part of the lightning talk presented at FOSDEM 2015. The talk was titled “Data, data and data about your favourite community”, and its slides are available on Bitergia’s Speakerdeck page. The IPython notebook used for the visualizations is accessible through nbviewer and can be downloaded from GitHub. This is a basic introduction to GrimoireLib.]
GrimoireLib aims to provide a transparency layer between the database and the user. It removes the need to access the databases directly and exposes a list of available metrics.
It is a Python library that expects a database already generated by one of the Metrics Grimoire tools. CVSAnalY, MailingListStats, Bicho and most of the other tools are already supported.
The following piece of code imports the modules needed to start playing with GrimoireLib. Every metric or study is instantiated in the same way: with a database connection object and a set of predefined filters. In this example, SCMQuery is the module used to access the database, MetricFilters contains the definitions needed to express conditions, and the source code metrics module is imported as scm.
[code language="python"]
# Database access
from vizgrimoire.metrics.query_builder import SCMQuery
# Filters to apply
from vizgrimoire.metrics.metrics_filter import MetricFilters
# Let's start playing with git activity metrics
import vizgrimoire.metrics.scm_metrics as scm
[/code]
This part of the code instantiates the database access, using a predefined database. In this case, it is taken from the publicly available OpenStack activity board. As the options indicate, two databases have to be defined: the source code database and the identities database. Although this example specifies the same database for both, at some point the two should differ, and the identities and affiliations information will be separated from the rest of the schemas.
[code language="python"]
# Instantiate database access
# Playing with OpenStack source code database (MySQL) at
# http://activity.openstack.org/dash/browser/data/db/source_code.mysql.7z
# Database named as openstack_source_code_fosdem2015
user = "root"
password = ""
source_code_db = "openstack_source_code_fosdem2015"
identities_db = "openstack_source_code_fosdem2015"
dbcon = SCMQuery(user, password, source_code_db, identities_db)
[/code]
Filters can be combined in different ways. At least three parameters are needed: the period of analysis (monthly, daily, weekly, etc.), and the initial and final dates of analysis. On top of that, two extra filters are defined: one that filters data by organization, and another that filters by both organization and repository.
[code language="python"]
# Instantiate some filters to play with
period = MetricFilters.PERIOD_MONTH
# Note the inner single quotes: they are part of the string value
startdate = "'2014-01-01'"
enddate = "'2015-01-01'"
# basic filter
filters = MetricFilters(period, startdate, enddate)
# company filter
filters_company = MetricFilters(period, startdate, enddate)
filters_company.add_filter(MetricFilters.COMPANY, "Red Hat")
# company and repo filter
filters_repo_com = MetricFilters(period, startdate, enddate)
filters_repo_com.add_filter(MetricFilters.COMPANY, "Red Hat")
filters_repo_com.add_filter(MetricFilters.REPOSITORY, "nova.git")
[/code]
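To make the idea of stacking conditions more concrete, here is a small self-contained sketch that does not use GrimoireLib at all. ToyFilters, the sample records and the field names "company" and "repo" are all hypothetical, and treating the end date as exclusive is an assumption of this sketch, not a statement about how GrimoireLib behaves. It only illustrates that each added condition narrows the set of commits selected by the date window.

```python
from datetime import date

# Hypothetical commit records; names and values are made up for illustration.
commits = [
    {"date": date(2014, 3, 1), "company": "Red Hat", "repo": "nova.git"},
    {"date": date(2014, 3, 9), "company": "Red Hat", "repo": "neutron.git"},
    {"date": date(2014, 6, 2), "company": "Example Corp", "repo": "nova.git"},
    {"date": date(2015, 2, 1), "company": "Red Hat", "repo": "nova.git"},
]


class ToyFilters:
    """Stand-in for a filter set: a date window plus optional extra conditions."""

    def __init__(self, startdate, enddate):
        self.startdate = startdate
        self.enddate = enddate  # assumed exclusive in this sketch
        self.conditions = {}    # field name -> required value

    def add_filter(self, field, value):
        self.conditions[field] = value

    def matches(self, record):
        in_window = self.startdate <= record["date"] < self.enddate
        return in_window and all(
            record[field] == value for field, value in self.conditions.items()
        )


basic = ToyFilters(date(2014, 1, 1), date(2015, 1, 1))

company = ToyFilters(date(2014, 1, 1), date(2015, 1, 1))
company.add_filter("company", "Red Hat")

repo_company = ToyFilters(date(2014, 1, 1), date(2015, 1, 1))
repo_company.add_filter("company", "Red Hat")
repo_company.add_filter("repo", "nova.git")

# Number of sample commits selected by each filter set
counts = [sum(f.matches(c) for c in commits) for f in (basic, company, repo_company)]
print(counts)
```

With this sample data, the basic filter keeps every commit inside 2014, the company filter keeps only the Red Hat ones, and the combined filter keeps only Red Hat commits to nova.git.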
So, let’s start! First, the metric API provides four methods:
- get_agg: provides aggregated information. E.g. the number of commits between two dates.
- get_ts: provides a time series with date information. E.g. the number of commits between two dates on a monthly basis.
- get_trends: provides trend information. E.g. the difference in the number of authors between this year and the previous one.
- get_list: provides a list of elements of the selected metric. E.g. the top contributors for the last year.
[code language="python"]
# Retrieving data for each filter.
# Let's start with commits
commits = scm.Commits(dbcon, filters)
commits.get_agg()
commits.get_ts()
[/code]
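The difference between the aggregated view and the time series view can be sketched without a database. The snippet below is only an illustration: toy_agg, toy_ts and the sample dates are hypothetical, and the dictionary shape (a "commits" key) is an assumption modelled on how the time series is indexed in this post, not GrimoireLib's actual implementation.

```python
from collections import Counter
from datetime import date

# Hypothetical sample data: one entry per commit (not taken from OpenStack).
commit_dates = [
    date(2014, 1, 5), date(2014, 1, 20), date(2014, 2, 3),
    date(2014, 2, 14), date(2014, 2, 28), date(2014, 4, 1),
]


def toy_agg(dates, start, end):
    """Aggregated view: a single count for the whole [start, end) period."""
    return {"commits": sum(1 for d in dates if start <= d < end)}


def toy_ts(dates, start, end):
    """Time series view: one count per month in [start, end), zeros included."""
    per_month = Counter((d.year, d.month) for d in dates if start <= d < end)
    months, counts = [], []
    year, month = start.year, start.month
    while (year, month) < (end.year, end.month):
        months.append("%04d-%02d" % (year, month))
        counts.append(per_month.get((year, month), 0))
        # Move to the next month, rolling over at the end of the year
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return {"month": months, "commits": counts}


start, end = date(2014, 1, 1), date(2014, 5, 1)
agg = toy_agg(commit_dates, start, end)
ts = toy_ts(commit_dates, start, end)
print(agg)
print(ts)
```

The aggregated value is simply the sum of the monthly values, and months without activity still show up in the series with a zero, which matters when the series is plotted.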
Thus, the total activity in the OpenStack Foundation in 2014 can be visualized with the following piece of code:
[code language="python"]
from matplotlib.pyplot import plot  # needed when plot() is not already in the namespace
plot(commits.get_ts()["commits"])
[/code]
In addition, it is possible to filter such data to check the activity of a given organization only. Let’s use Red Hat as the organization for this example.