King Arthur: Driving Quests on Software Project Data

King Arthur, or simply Arthur, is an open source tool designed to schedule Perceval executions at scale through distributed Redis queues. It can also store the retrieved data in an ElasticSearch database, which makes it possible to connect the results to analysis and visualization tools such as the Bitergia analytics dashboards.

 

Executing Perceval through Arthur

[Figure: overall view of Arthur]

The figure above gives an overall view of Arthur. At its heart are two components: the server and one or more workers, which are in charge of running Perceval executions, defined as a list of tasks. The server waits for HTTP requests, which allow adding, removing or listing tasks through REST API commands (i.e., add, remove, tasks). The listing below shows how to send commands to the Arthur server.

# Adding tasks
$ curl -H "Content-Type: application/json" \
    --data @to_add.json http://127.0.0.1:8080/add
# Removing tasks
$ curl -H "Content-Type: application/json" \
    --data @to_remove.json http://127.0.0.1:8080/remove
# Listing tasks
$ curl http://127.0.0.1:8080/tasks

As can be seen, adding and removing tasks requires specific parameters, sent as JSON data within the request. Adding a task requires a JSON object that contains a task id (useful for removing and listing operations), the parameters needed to execute a Perceval backend, and optional parameters that control the scheduling (e.g., delayed start, maximum number of retries upon failure) and the archiving of the fetched data. Removing a task, in contrast, only requires a JSON object with the identifier of that task.

The listing below shows two examples of JSON objects to add and remove Perceval tasks. The former extracts commit data from the Perceval repository, while the latter removes the task perceval.git from Arthur.

# Adding task
{
  "task_id": "perceval.git",
  "backend": "git",
  "backend_args": {
      "gitpath": "/tmp/git/perceval.git/",
      "uri": "https://github.com/chaoss-grimoirelab/perceval.git",
      "from_date": "2015-03-01"
  },
  "archive": {},
  "category": "commit",
  "scheduler": {
     "delay": 10,
     "max_retries": 5
  }
}
# Removing task
{"task_id": "perceval.git"}
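The same commands can also be sent from Python instead of curl. The following is a minimal sketch using only the standard library; the helper names (build_request, send) are hypothetical and not part of Arthur, and it assumes the server accepts the JSON objects exactly as shown in the listings above, which may differ between Arthur versions.

```python
import json
import urllib.request

# Server address taken from the curl listing above
ARTHUR_URL = "http://127.0.0.1:8080"


def build_request(command, payload=None, base_url=ARTHUR_URL):
    """Build (but do not send) an HTTP request for an Arthur command.

    Hypothetical helper: `command` is one of "add", "remove" or "tasks".
    """
    url = f"{base_url}/{command}"
    if payload is None:
        # No body: urllib issues a GET, like `curl http://.../tasks`
        return urllib.request.Request(url)
    body = json.dumps(payload).encode("utf-8")
    # With a body urllib issues a POST, like `curl --data`
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )


def send(request, timeout=10):
    """Send a request to a running Arthur server and return the response text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Payloads mirroring the JSON listing above
add_task = {
    "task_id": "perceval.git",
    "backend": "git",
    "backend_args": {
        "gitpath": "/tmp/git/perceval.git/",
        "uri": "https://github.com/chaoss-grimoirelab/perceval.git",
        "from_date": "2015-03-01",
    },
    "archive": {},
    "category": "commit",
    "scheduler": {"delay": 10, "max_retries": 5},
}
remove_task = {"task_id": "perceval.git"}

add_request = build_request("add", add_task)
remove_request = build_request("remove", remove_task)
list_request = build_request("tasks")
```

With an Arthur server listening on 127.0.0.1:8080, calling send(add_request) would submit the task and send(list_request) would return the task list. Building the requests separately from sending them keeps the payloads easy to inspect before anything reaches the server.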

After receiving a task, the server initializes a job with the task parameters, which links the job to the task, and sends the job to the scheduler. The scheduler manages two in-memory queues: one for first-time jobs and one for already finished jobs that will be rescheduled. The former are Perceval executions that perform the initial gathering from a data source, while the latter are executions launched in incremental mode (e.g., from a given date, which is by default the date when the previous execution ended). If an execution fails, the job is rescheduled as many times as defined in the scheduling parameters of the task.
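To make the retry behaviour concrete, here is a toy model of one scheduling cycle. It is only a sketch under stated assumptions, not Arthur's scheduler: the function names, the job dictionary and the fake backend are all hypothetical, plain deques stand in for the real queues, and the delay parameter is ignored.

```python
from collections import deque


def run_cycle(task, run_backend):
    """Toy model of one scheduling cycle; returns (status, attempts, rescheduled).

    `run_backend` is a hypothetical callable that either returns the date
    at which the execution ended or raises an exception on failure.
    """
    first_time = deque()   # jobs that have not completed a run yet
    rescheduled = deque()  # finished jobs waiting for an incremental run
    max_retries = task["scheduler"]["max_retries"]

    first_time.append({"task_id": task["task_id"], "failures": 0})
    attempts = 0
    while first_time:
        job = first_time.popleft()
        attempts += 1
        try:
            ended_at = run_backend(task)
        except Exception:
            job["failures"] += 1
            if job["failures"] > max_retries:
                # Retry budget exhausted: give up on this job
                return "failed", attempts, rescheduled
            first_time.append(job)  # put the job back for another try
            continue
        # Success: the next (incremental) run starts from the end date
        job["from_date"] = ended_at
        rescheduled.append(job)
    return "finished", attempts, rescheduled


def make_flaky_backend(failures_before_success):
    """Build a fake backend that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def run_backend(task):
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise RuntimeError("simulated fetch failure")
        return "2018-08-01T00:00:00"  # stand-in for the execution end date

    return run_backend


task = {"task_id": "perceval.git", "scheduler": {"delay": 10, "max_retries": 5}}
status, attempts, rescheduled = run_cycle(task, make_flaky_backend(2))
```

In this model, max_retries set to 5 allows one initial attempt plus up to five retries, so a backend that never succeeds is abandoned after six attempts.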

Workers give Arthur its scalability. They listen to the queues, pick up jobs and run Perceval backends. Once a backend has finished, the worker notifies the scheduler of the result of the execution and, in case of success, sends the JSON documents to the server storage queue. These documents are consumed by writers, which make it possible to live-stream the data or serialize it to database management systems. In the current implementation, Arthur stores the JSON documents in an ElasticSearch database.
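The flow from workers to writers can be illustrated with a small in-process sketch. It is an assumption-laden toy rather than Arthur's code: the standard library queue.Queue stands in for the Redis queues, the fetch callables stand in for Perceval backends, and the writer serializes to a text buffer instead of ElasticSearch. All names are hypothetical.

```python
import io
import json
import queue


def worker(jobs, storage, results):
    """Pick up jobs, run their stand-in backend and forward items on success."""
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return  # no more jobs to pick up
        try:
            items = list(job["fetch"]())  # run the stand-in backend
        except Exception as error:
            # Report the failure; nothing is sent to the storage queue
            results.append((job["task_id"], "failed", str(error)))
            continue
        for item in items:
            storage.put(item)
        results.append((job["task_id"], "finished", len(items)))


def writer(storage, out):
    """Drain the storage queue, writing each document as one JSON line."""
    written = 0
    while not storage.empty():
        out.write(json.dumps(storage.get()) + "\n")
        written += 1
    return written


def fetch_commits():
    """Stand-in backend yielding two fake documents."""
    yield {"category": "commit", "data": {"message": "first"}}
    yield {"category": "commit", "data": {"message": "second"}}


def fetch_broken():
    """Stand-in backend that always fails."""
    raise RuntimeError("simulated backend error")


jobs, storage, results = queue.Queue(), queue.Queue(), []
jobs.put({"task_id": "perceval.git", "fetch": fetch_commits})
jobs.put({"task_id": "broken.git", "fetch": fetch_broken})

worker(jobs, storage, results)
output = io.StringIO()
written = writer(storage, output)
```

Here only the documents of the successful job reach the writer, while the failed job is reported back with its error message.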

Enhancing Arthur with Graal

Recently, Arthur has been extended to handle Graal tasks, which extract source code information by leveraging existing code analysis tools. As with Perceval tasks, adding and removing Graal tasks is done by sending JSON objects to Arthur.

The listing below shows two examples of JSON objects to add and remove a Graal task. As can be seen, adding a task to analyze the code complexity of a repository consists of sending an add command to the Arthur server with a JSON object that includes a task id (cocom_graal), the parameters needed to execute an instance of the CoCom backend, such as its category (i.e., code_complexity), the URI of the target repository and the local path where it will be mirrored (i.e., uri and gitpath), plus the scheduler settings. Removing the cocom_graal task requires less effort: it suffices to send a remove command to the Arthur server with a JSON object that identifies the target task.

# Adding task
{
  "task_id": "cocom_graal",
  "backend": "cocom",
  "backend_args": {
      "gitpath": "/tmp/git/perceval.git/",
      "uri": "https://github.com/chaoss-grimoirelab/perceval.git"
  },
  "archive": {},
  "category": "code_complexity",
  "scheduler": {
     "delay": 10,
     "max_retries": 5
  }
}
# Removing task
{"task_id": "cocom_graal"}

Try it out and Join the gang!

King Arthur is 100% open source and part of GrimoireLab, a set of tools used to fetch software project data. Feel free to try it, fork it, and submit issues, pull requests and ideas!

If you are attending CHAOSSCon or OSSNA, come visit us in Vancouver to get more details.

