Welcome to the first chapter of the Metric of the Month! We’re glad to start this series talking about metrics. We expect to show you a complete guide for a different metric each month so you can understand more about them. We start this series with our favorite one, the Elephant Factor.
This term was coined by Bitergia and presented at OSCON 2015. It was inspired by the ‘Pony Factor’ created by Daniel Gruno for the Apache Software Foundation. Gruno’s metric identifies the lowest number of committers whose total contribution constitutes the majority of the codebase. Instead of looking at the contributors who account for the majority of the codebase, we were interested in showing the companies behind them. We already had a pony in the Open Source metrics family, and in 2015 the elephant was born. Later that year, the metric was included in the eBook Applications & Microservices With Docker & Containers (by The New Stack) as one of what they called the “animal factors”, whose aim was to represent a holistic view of open source projects.
What is the distribution of work in the community?
The Elephant Factor is determined as the lowest number of companies whose total contribution (in commits by their employees) constitutes the majority of the commits.
It is common to consider 50% as the threshold for constituting a majority. We then look for the most active companies that, combined, made 50% of the commits.
For example, a project with 8 contributing organizations who each contributed 12.5% of the commits in a project would, if the Elephant Factor is parameterized at 50% to be the majority, have an Elephant Factor of “4” because 4 times 12.5% is 50%. If one of those organizations was responsible for 50% of commits in the same scenario, then the Elephant Factor would be “1”.
In the pie chart example, six companies, from company A to company F, together account for 50% of the commits, that is, half of the pie. So here, the Elephant Factor is 6.
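The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any particular platform: the function name `elephant_factor`, its parameters, and the organization names are made up for this example. Following the 8-organization example, reaching exactly 50% counts as reaching the threshold.

```python
def elephant_factor(commits_by_org, threshold=0.5):
    """Return the smallest number of organizations whose combined commits
    reach `threshold` (a fraction of all commits).

    `commits_by_org` maps an organization name to its commit count.
    """
    total = sum(commits_by_org.values())
    if total == 0:
        return 0  # no commits, so there is nothing to measure
    target = threshold * total
    running = 0
    # Walk the organizations from most to least active.
    ranked = sorted(commits_by_org.values(), reverse=True)
    for rank, commits in enumerate(ranked, start=1):
        running += commits
        if running >= target:
            return rank
    return len(ranked)


# Eight organizations with 12.5% of the commits each.
equal_shares = {f"org{i}": 125 for i in range(8)}

# One organization with 50% of the commits, seven sharing the rest.
one_dominant = {"big": 700, **{f"org{i}": 100 for i in range(7)}}

print(elephant_factor(equal_shares))   # 4
print(elephant_factor(one_dominant))   # 1
```

The `threshold` parameter is kept explicit because 50% is a convention, not part of the definition.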
Goals of this Metric
The analysis of organizational diversity in open source projects and communities is not possible without the Elephant Factor. Using this factor, it is possible to distinguish between projects and communities led by a small set of companies and those with a more diverse structure.
Projects with a high Elephant Factor will be more resilient to any type of change in the strategy of the companies involved, as none of them is doing the majority of the contributions. In the opposite scenario, projects with a low Elephant Factor (where 1 is the minimum) are very dependent on a very small set of companies.
As an example, let’s imagine that your favorite company is the top contributor to two different projects. In the first project, it provides the majority of the contributions, so the Elephant Factor for that project is 1. In the second project, the same company does not contribute the majority of the changes by itself: it needs the second and third companies to reach the majority, so the Elephant Factor for that project is 3. If your favorite company changes its strategy and stops contributing to those projects, the first project would be much more affected than the second one.
As we always emphasize, metrics always need interpretation. It is even possible to think of scenarios where a low Elephant Factor could be interesting for some organizations. The takeaway from the example above is that metrics can help you assess the organizational diversity of a community. Now it is your turn to think about your business goals: if any of them requires knowing how diverse a project is, start using our beloved Elephant Factor.
The calculation of the metric itself is simple. The complexity is hidden by the platform that collects and manages the information about the contributions and the companies they belong to. If you are interested in calculating this metric yourself, all you need is the percentage that will serve as the threshold. If you are looking at a pie chart of how much each company has contributed, then the Elephant Factor is the minimum number of slices that cover 50% of the pie.
Suppose 8 organizations contribute the following numbers of commits to a project: 1000, 433, 343, 332, 202, 90, 42, 33. We can determine the Elephant Factor by first computing 50% of the total commits across all the companies.
Summary: the total is 2,475 commits, so 50% of total contributions = 1,237.5. The top company alone (1,000) falls short of that, while the top two combined (1,000 + 433 = 1,433) exceed it, so the Elephant Factor is 2.
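To make the arithmetic behind that summary visible, here is a short Python sketch that prints the running total company by company. The commit counts are the ones from the example; the variable names are only illustrative.

```python
from itertools import accumulate

commits = [1000, 433, 343, 332, 202, 90, 42, 33]  # figures from the example

total = sum(commits)   # all commits in the project
half = total / 2       # the 50% threshold

# Rank companies from most to least active and keep a running total.
ranked = sorted(commits, reverse=True)
running_totals = list(accumulate(ranked))

# The Elephant Factor is the position of the first running total
# that reaches the threshold.
factor = next(
    position
    for position, running in enumerate(running_totals, start=1)
    if running >= half
)

for position, (count, running) in enumerate(zip(ranked, running_totals), start=1):
    marker = "  <- threshold reached" if position == factor else ""
    print(f"{position}: {count:>5} commits, running total {running:>5}{marker}")

print(f"total={total}, 50%={half}, Elephant Factor={factor}")
```

The last line printed is `total=2475, 50%=1237.5, Elephant Factor=2`.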
The value of the Elephant Factor changes depending on the time period selected. It will be different if you compare the full history of a 10-year project with its last year. Our recommendation is to calculate it over a medium-sized period such as “last year” and use the time selector to see its evolution over the past years.
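The effect of the time period can be illustrated with a small Python sketch. Everything here is hypothetical: the function `elephant_factor_between`, the (date, organization) input format, and the commit history are invented for the example and do not come from a real project.

```python
from collections import Counter
from datetime import date


def elephant_factor_between(commits, start, end, threshold=0.5):
    """Elephant Factor over commits dated in the half-open range [start, end).

    `commits` is an iterable of (commit_date, organization) pairs.
    """
    counts = Counter(org for day, org in commits if start <= day < end)
    total = sum(counts.values())
    if total == 0:
        return 0  # no commits in the selected period
    running = 0
    # most_common() yields organizations from most to least active.
    for rank, (_org, n) in enumerate(counts.most_common(), start=1):
        running += n
        if running >= threshold * total:
            return rank
    return len(counts)


# Made-up history: company A dominated early on, and the work is now
# spread evenly across four companies.
history = (
    [(date(2015, 3, 1), "A")] * 6
    + [(date(2015, 6, 1), "B")] * 2
    + [(date(2022, 5, 1), "A")] * 2
    + [(date(2022, 7, 1), "B")] * 2
    + [(date(2022, 9, 1), "C")] * 2
    + [(date(2022, 11, 1), "D")] * 2
)

whole_history = elephant_factor_between(history, date(2013, 1, 1), date(2023, 1, 1))
last_year = elephant_factor_between(history, date(2022, 1, 1), date(2023, 1, 1))
print(whole_history, last_year)  # 1 2
```

In this invented history the full period gives an Elephant Factor of 1 because of company A's early dominance, while the last year alone gives 2.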
Where can I find this metric?
GrimoireLab and Bitergia Analytics provide this metric out of the box, not as a single number but as a visualization: a pie chart, as shown in the example above.
- View an example on the CHAOSS instance of Bitergia Analytics.
- Download and import a ready-to-go dashboard containing examples for this metric visualization from the GrimoireLab Sigils panel collection.
- Add a sample visualization to a dashboard following these instructions:
  - Create a new Pie chart
  - Select the git index
  - Metrics, Slice Size: Count aggregation
  - Buckets, Split Slices: Terms aggregation on the author_org_name field, ordered by metric: Count, in descending order, with size 500
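For readers who prefer a number to a chart, the pie chart settings above roughly correspond to an Elasticsearch terms aggregation. The sketch below is an assumption-laden illustration, not something taken from the GrimoireLab code: it assumes each document in the git index represents one commit, the aggregation name `orgs` and the helper `elephant_factor_from_response` are invented here, and `sample_response` is hand-written using the commit counts from the earlier example rather than real output.

```python
# Search request body mirroring the pie chart settings
# (terms aggregation on author_org_name, ordered by count, size 500).
query_body = {
    "size": 0,  # only the aggregation is needed, not the documents
    "aggs": {
        "orgs": {  # arbitrary aggregation name chosen for this sketch
            "terms": {
                "field": "author_org_name",
                "size": 500,
                "order": {"_count": "desc"},
            }
        }
    },
}


def elephant_factor_from_response(response, agg_name="orgs", threshold=0.5):
    """Compute the Elephant Factor from a terms-aggregation response."""
    agg = response["aggregations"][agg_name]
    counts = sorted((b["doc_count"] for b in agg["buckets"]), reverse=True)
    # Organizations beyond the requested size are not listed individually;
    # their documents are reported in sum_other_doc_count.
    total = sum(counts) + agg.get("sum_other_doc_count", 0)
    if total == 0:
        return 0
    running = 0
    for rank, count in enumerate(counts, start=1):
        running += count
        if running >= threshold * total:
            return rank
    return None  # threshold not reached within the returned buckets


# Hand-written sample shaped like an Elasticsearch response.
sample_response = {
    "aggregations": {
        "orgs": {
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "Company A", "doc_count": 1000},
                {"key": "Company B", "doc_count": 433},
                {"key": "Company C", "doc_count": 343},
                {"key": "Company D", "doc_count": 332},
                {"key": "Company E", "doc_count": 202},
                {"key": "Company F", "doc_count": 90},
                {"key": "Company G", "doc_count": 42},
                {"key": "Company H", "doc_count": 33},
            ],
        }
    }
}

print(elephant_factor_from_response(sample_response))  # 2
```

Sending `query_body` to a real instance is left out on purpose, since the client, endpoint, and authentication depend on your deployment.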
Want to know more about Bitergia Analytics Platform?
- Ebook: Applications & Microservices With Docker & Containers (by The New Stack)
- CHAOSS Metrics: https://chaoss.community/metric-elephant-factor/
- OSCON 2015 presentation
Next metric chapter: did you say something about a Pony?
In the next Metric of the Month chapter, we’ll continue talking about contributions to projects, but this time looking at individual contributors and their diversity within the project. This is called the Pony Factor 🐴 and is defined as “The lowest number of contributors whose total contribution makes up the majority”. Don’t forget to subscribe to the exclusive Metric Newsletter to get your next edition first!
Did you like this first edition? Give us your opinion in the comments or share it on social media. Every suggestion or comment is welcome; the Owl wants to share their knowledge with you!