[This post is part of the lightning talk presented at FOSDEM 2015. The talk was titled “Data, data and data about your favourite community” and its slides are available on Bitergia’s Speaker Deck page. The IPython notebook used for visualization purposes is accessible through nbviewer and can be downloaded from GitHub. This is a basic introduction to GrimoireLib.]
GrimoireLib aims to provide a transparency layer between the database and the user. It avoids direct access to the databases while exposing a list of available metrics.
It is a Python library and expects an already generated database produced by one of the Metrics Grimoire tools. CVSAnalY, MailingListStats, Bicho and most of the other tools are already supported by this library.
The following piece of code imports the modules needed to start playing with GrimoireLib. Every metric or study is instantiated in the same way: with a database connection object and a set of predefined filters. In this example, SCMQuery provides access to the database, MetricFilters contains the definitions needed to express conditions, and the source code metrics module is imported as scm.
    # Database access
    from vizgrimoire.metrics.query_builder import SCMQuery
    # Filters to apply
    from vizgrimoire.metrics.metrics_filter import MetricFilters
    # Let's start playing with git activity metrics
    import vizgrimoire.metrics.scm_metrics as scm
The next piece of code instantiates the database access using a predefined database, taken from the publicly available OpenStack activity board. As the options show, two databases must be defined: the source code database and the identities database. Although the same database is used for both in this example, at some point the two should be different, with the identities and affiliations information separated from the rest of the schemas.
    # Instantiate database access
    # Playing with OpenStack source code database (MySQL) at
    # http://activity.openstack.org/dash/browser/data/db/source_code.mysql.7z
    # Database named as openstack_source_code_fosdem2015
    user = "root"
    password = ""
    source_code_db = "openstack_source_code_fosdem2015"
    identities_db = "openstack_source_code_fosdem2015"
    dbcon = SCMQuery(user, password, source_code_db, identities_db)
Filters can be combined in different ways. At least three parameters must be defined: the period of analysis (monthly, daily, weekly, etc.), the initial date, and the final date of analysis. On top of that, two extra filters are defined here: the first restricts the data to an organization, and the second restricts it to an organization and a repository.
    # Instantiate some filters to play with
    period = MetricFilters.PERIOD_MONTH
    startdate = "'2014-01-01'"
    enddate = "'2015-01-01'"

    # basic filter
    filters = MetricFilters(period, startdate, enddate)

    # company filter
    filters_company = MetricFilters(period, startdate, enddate)
    filters_company.add_filter(MetricFilters.COMPANY, "Red Hat")

    # company and repo filter
    filters_repo_com = MetricFilters(period, startdate, enddate)
    filters_repo_com.add_filter(MetricFilters.COMPANY, "Red Hat")
    filters_repo_com.add_filter(MetricFilters.REPOSITORY, "nova.git")
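To get an intuition of what such a filter object carries, here is a minimal toy sketch of how a period, a date range and some extra conditions could end up as query conditions. This is not how MetricFilters is implemented: the ToyFilters class, its where() method and the column names (date, company_name, repository_name) are all hypothetical, and the %s placeholders simply follow the style commonly used by MySQL drivers.

```python
class ToyFilters:
    """Toy model of a filter set. NOT the real MetricFilters class."""

    # Hypothetical filter types, named after the ones used in the post.
    COMPANY = "company"
    REPOSITORY = "repository"

    def __init__(self, period, startdate, enddate):
        self.period = period
        self.startdate = startdate
        self.enddate = enddate
        self.extra = []  # list of (filter type, value) pairs

    def add_filter(self, kind, value):
        """Register one more condition, e.g. (COMPANY, "Red Hat")."""
        self.extra.append((kind, value))

    def where(self):
        """Return a (sql_fragment, params) pair using %s placeholders.

        The column names are assumptions made up for this sketch.
        """
        clauses = ["date >= %s", "date < %s"]
        params = [self.startdate, self.enddate]
        for kind, value in self.extra:
            clauses.append(kind + "_name = %s")
            params.append(value)
        return " AND ".join(clauses), params


toy = ToyFilters("month", "2014-01-01", "2015-01-01")
toy.add_filter(ToyFilters.COMPANY, "Red Hat")
toy.add_filter(ToyFilters.REPOSITORY, "nova.git")
sql, params = toy.where()
print(sql)
print(params)
```

The point of the sketch is only that each add_filter call narrows the same base date range, which is why the three filter objects above can share the same period and dates.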
So, let’s start! First, the metric API provides four methods:
- get_agg: provides aggregated information. E.g.: number of commits between two dates.
- get_ts: provides a time series with date information. E.g.: number of commits between two dates on a monthly basis.
- get_trends: provides trends information. E.g.: difference in the number of authors between this year and the previous one.
- get_list: provides a list of elements of the selected metric. E.g.: top contributors for the last year.
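The difference between the four methods is easier to see on a tiny in-memory example. The sketch below is a toy stand-in, not GrimoireLib code: the ToyCommits class, the sample data and the shapes of the returned values are assumptions made for illustration, and the real library computes its results from the database.

```python
from collections import Counter
from datetime import date

# Hypothetical sample data: (commit date, author). Not real OpenStack data.
COMMITS = [
    (date(2014, 1, 10), "alice"),
    (date(2014, 1, 22), "bob"),
    (date(2014, 2, 3), "alice"),
    (date(2014, 3, 15), "carol"),
    (date(2014, 3, 18), "alice"),
    (date(2014, 3, 30), "bob"),
]


class ToyCommits:
    """Toy stand-in for a metric class. NOT GrimoireLib's implementation."""

    def __init__(self, commits, start, end):
        # Keep only the commits in the half-open interval [start, end).
        self.commits = [(d, a) for d, a in commits if start <= d < end]

    def get_agg(self):
        """One aggregated number for the whole period."""
        return {"commits": len(self.commits)}

    def get_ts(self):
        """Monthly time series (months without commits are skipped here)."""
        per_month = Counter((d.year, d.month) for d, _ in self.commits)
        months = sorted(per_month)
        return {
            "month": ["%d-%02d" % m for m in months],
            "commits": [per_month[m] for m in months],
        }

    def get_trends(self, split):
        """Commits on or after `split` minus commits before it."""
        before = sum(1 for d, _ in self.commits if d < split)
        after = len(self.commits) - before
        return {"diff_commits": after - before}

    def get_list(self, limit=10):
        """Top authors by number of commits."""
        return Counter(a for _, a in self.commits).most_common(limit)


metric = ToyCommits(COMMITS, date(2014, 1, 1), date(2015, 1, 1))
print(metric.get_agg())
print(metric.get_ts())
print(metric.get_trends(date(2014, 3, 1)))
print(metric.get_list(2))
```

The same object answers four different questions about the same filtered data, which is the idea behind the metric API.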
    # Retrieving data for each filter.
    # Let's start with commits
    commits = scm.Commits(dbcon, filters)
    commits.get_agg()
    commits.get_ts()
Thus, a simple way to visualize the total activity in the OpenStack Foundation in 2014 is to plot the monthly time series, for instance with plot(commits.get_ts()["commits"]), the same call used in the filtered examples that follow.
In addition, it is possible to filter such data to check the activity of a given organization only. Let’s use Red Hat as the organization for this example.
    # Let's use another filter
    commits_redhat = scm.Commits(dbcon, filters_company)
    commits_redhat.get_agg()
    plot(commits_redhat.get_ts()["commits"])
Or we can go a step further and check activity for a given organization in a specific repository.
    # Let's focus on an organization and a repository
    commits_redhat_nova = scm.Commits(dbcon, filters_repo_com)
    commits_redhat_nova.get_agg()
    plot(commits_redhat_nova.get_ts()["commits"])
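Once the filtered and unfiltered series are available, they can be combined, for instance to see what share of the activity comes from one organization in each period. The sketch below assumes that get_ts()["commits"] yields one number per period and that both series cover the same periods in the same order. The share_per_period helper is hypothetical and the numbers are made up for illustration, they are not OpenStack results.

```python
def share_per_period(part, total):
    """Percentage of `total` represented by `part`, period by period."""
    if len(part) != len(total):
        raise ValueError("both series must cover the same periods")
    # Periods without any activity get a share of 0.0 instead of dividing by zero.
    return [round(100.0 * p / t, 1) if t else 0.0 for p, t in zip(part, total)]


# Made-up values standing in for commits.get_ts()["commits"]
# and commits_redhat.get_ts()["commits"].
total_commits = [200, 150, 0, 400]
redhat_commits = [50, 30, 0, 100]

print(share_per_period(redhat_commits, total_commits))
# prints [25.0, 20.0, 0.0, 25.0]
```

With real data, the two lists would come from the metric objects built with the basic filter and with the company filter.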
Although this post has focused only on commits, there are dozens of metrics and studies that can be used in the same way with several data sources: source code, issue tracking systems, mailing lists, IRC channels and others. More information is available at the GrimoireLib repository. If you are interested in specific training about these tools, just let us know 😉