Measuring the Health of Open Source Communities
Isabella Ferreira, Mark Shan
Published at 07/19/2021

If you manage an open source project or want to join one, you have probably wondered how to tell whether the project is healthy. You could analyze different aspects of the project: its technical health (for example, the number of forks on GitHub, the number of contributors over time, and the number of bugs reported over time), its financial health (donations and revenue over time), its social aspects (social media mentions, post shares, and sentiment across social media channels), and its diversity and inclusion aspects (having a code of conduct, running inclusion activities at events, and using color-blind-accessible materials in presentations and in project front-end designs). The question is how to measure these aspects. To know whether a project is healthy, metrics should be computed and analyzed over time. It also helps to put those metrics in a dashboard to support analysis and decision making, and that is what this article is about.

Why do metrics matter?

"The goal here is not to construct an enormous vacuum cleaner to suck every tiny detail of your community into a graph. The goal is instead to identify what we don't know about our community and to use measurements as a means to understand those things better."
The Art of Community - Jono Bacon

Open source software needs community. By learning more about the community through different metrics, stakeholders can make informed decisions. For example, developers can select the best project to join, maintainers can decide which governance measures are effective, end users can select the project that is more robust and more likely to last and prosper, and investors can select the best project to invest in [1]. Furthermore, Open Source Program Offices (OSPOs), i.e., offices inside companies that manage the open source ecosystems the company depends on [5], can assess a project's health and sustainability by analyzing different metrics. OSPOs are becoming very popular because around 90% of the components of modern applications are open source [6]. Thus, measuring the risks of consuming, contributing to, and releasing open source software is very important to an OSPO [5].

How to define which metrics to evaluate?

  1. Set your goals: Measuring without a goal is pointless. Goals are concrete targets that state what the community wants to achieve [3].
  2. Find reliable statistical sources: After defining your goals, identify the sources that will help you track progress toward them. It is important to find ways to get statistics on the goals that matter most to you [4]. Some statistics are obvious: on GitHub, for example, you can collect the number of stars, forks, and contributors for a repository. You can also get the number of mailing list subscribers and project website visits. Other statistics are less obvious, and you might need tools to extract them.
  3. Interpret the statistics: Interpret the statistics in terms of people, product, process, and partners [4]. First, look at the numbers related to the people in the community, such as contributors' productivity and which channels have the most impact. Then, look at the velocity and maturity of your product, such as the number of PRs and the number of issues. After that, look at the maturity of your process: what is your review process, and how long does it take to resolve an issue? Finally, look at the ecosystem in terms of your partners, that is, statistics on the projects you depend on and the projects that depend on you.
  4. Use dashboards to evaluate your metrics: Many existing tools help to create dashboards for analyzing and measuring the health of open source communities, such as LFX Insights, Bitergia, and GrimoireLab.
  5. Make changes: After measuring, act on what the measurements show.
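
As a rough illustration of steps 2 and 3, the sketch below computes two process statistics from a small set of issue records: the median number of days it takes to close an issue, and the number of issues opened per month. The `ISSUES` records, their field names, and the function names are all hypothetical; in practice the data would come from whatever issue tracker or export your project uses.

```python
from collections import Counter
from datetime import datetime
from statistics import median

# Hypothetical issue records (made-up sample data, not from a real project).
ISSUES = [
    {"opened": "2021-01-04", "closed": "2021-01-06"},
    {"opened": "2021-01-10", "closed": "2021-01-20"},
    {"opened": "2021-02-01", "closed": "2021-02-04"},
    {"opened": "2021-02-15", "closed": None},  # still open
]


def days_to_close(issue):
    """Return the number of days an issue stayed open, or None if it is still open."""
    if issue["closed"] is None:
        return None
    opened = datetime.strptime(issue["opened"], "%Y-%m-%d")
    closed = datetime.strptime(issue["closed"], "%Y-%m-%d")
    return (closed - opened).days


def median_days_to_close(issues):
    """Median time to close, computed over closed issues only."""
    durations = [d for d in map(days_to_close, issues) if d is not None]
    return median(durations) if durations else None


def opened_per_month(issues):
    """Count issues opened in each month, keyed by 'YYYY-MM' in chronological order."""
    counts = Counter(issue["opened"][:7] for issue in issues)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print("Median days to close:", median_days_to_close(ISSUES))
    print("Issues opened per month:", opened_per_month(ISSUES))
```

The median is used here instead of the mean so that a single long-lived issue does not dominate the result; that is a design choice for this sketch, not a recommendation from the cited sources.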

Learning from examples

Different projects use different strategies to measure the project's health.

The CHAOSS Community creates analytics and metrics to help understand project health. It has many working groups, each focusing on a specific kind of metric. For example:

  • The Diversity and Inclusion working group focuses on the diversity and inclusion in events, how diverse and inclusive the governance of a community is, and how healthy the community leadership is.
  • The Evolution working group creates metrics for analyzing the type and frequency of activities involved in software development, the processes to improve the project quality, and the community growth.
  • The Value working group creates metrics for identifying the degree to which a project improves people's lives beyond the software project, the degree to which the project is valuable to a user or contributor, and the degree to which the project is monetarily valuable from an organization's point of view.
  • The Risk working group creates metrics for understanding the quality of a specific software package, its potential intellectual property issues, and how transparent the package is with respect to licenses, dependencies, etc.

The Mozilla project collaborated with Bitergia and Analyse & Tal to build an interactive network visualization of Mozilla's contributor communities. By visualizing different metrics, they found that Mozilla has not one community but many, which differ in their areas of contribution, motivations, engagement levels, etc. Based on that, they built a report to visualize how these different communities are interconnected.

Many projects, such as Kubernetes and TARS, use the LFX Insights tool to analyze their community.
LFX Insights is a dashboard that helps project communities evaluate different metrics concerning open source development in order to grow a sustainable open source ecosystem. The tool has different features to support different stakeholders [2]:

  • Maintainers and project leads can get multi-dimensional reporting on the project, avoid maintainer burnout, and ensure the project's health, security, and sustainability.
  • Project marketers and community evangelists can use the metrics to attract new members and engage the community, as well as to identify opportunities to increase awareness.
  • Members and corporate sponsors can know which community and software to engage in, communicate their impact within the community, and evaluate their employees' open source contributions.
  • Open source developers can know where to focus their efforts, showcase their leadership and expertise, and manage their affiliations and their impact.

Furthermore, a variety of metrics can be extracted from LFX Insights. From the source code repository, it reports metrics such as the number of commits in total and by contributor, the number of contributors, the top contributors by commits, and the companies that contribute the most to the project. Pull requests (PRs) can be extracted from many tools, such as Gerrit and GitHub. As with commits, the tool reports the number of PRs in total, by contributor, and by company. It also calculates the average time to review a PR and lists the PRs that are yet to be merged. You can also extract metrics for issues and continuous integration tools. Besides that, LFX Insights allows projects to collect communication and collaboration information from different channels, such as mailing lists, Slack, and Twitter.
Projects might have different goals when using LFX Insights. The TARS project, part of the TARS Foundation, uses LFX Insights to get a big picture of each sub-project (such as TARSFramework, TARSGo, etc.). Through the dashboards created by LFX Insights, the TARS community can see the statistics of each individual project as well as of the community as a whole (see Figures 1 and 2). Using LFX Insights, the TARS community analyzes how many people are contributing to each project and which organizations contribute to TARS. Additionally, they extract the number of commits and lines of code contributed by each contributor. The TARS community believes that analyzing such metrics helps them find ways to attract and retain more contributors.
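
To make the commit and PR metrics described above more concrete, here is a minimal Python sketch that computes them from plain records. It does not use LFX Insights or any of its APIs: the `COMMITS` and `PULL_REQUESTS` data, the field names, and the function names are hypothetical, and the time from opening a PR to merging it is used as a stand-in for review time, which is an assumption of this sketch.

```python
from collections import Counter
from datetime import datetime

# Hypothetical commit and pull request records (made-up sample data).
COMMITS = [
    {"author": "alice", "company": "ExampleCorp"},
    {"author": "bob", "company": "ExampleCorp"},
    {"author": "alice", "company": "ExampleCorp"},
    {"author": "carol", "company": "OtherOrg"},
    {"author": "alice", "company": "ExampleCorp"},
]

PULL_REQUESTS = [
    {"opened": "2021-03-01T09:00", "merged": "2021-03-01T15:00"},
    {"opened": "2021-03-02T10:00", "merged": "2021-03-03T10:00"},
    {"opened": "2021-03-05T08:00", "merged": None},  # not merged yet
]


def commits_by(commits, key):
    """Count commits grouped by a record field, e.g. 'author' or 'company'."""
    return Counter(commit[key] for commit in commits)


def top_contributors(commits, n=1):
    """Return the n authors with the most commits as (author, count) pairs."""
    return commits_by(commits, "author").most_common(n)


def average_hours_to_merge(pull_requests):
    """Average hours between opening and merging, over merged PRs only."""
    hours = [
        (datetime.fromisoformat(pr["merged"])
         - datetime.fromisoformat(pr["opened"])).total_seconds() / 3600
        for pr in pull_requests
        if pr["merged"] is not None
    ]
    return sum(hours) / len(hours) if hours else None


def unmerged_pull_requests(pull_requests):
    """Return the PRs that have not been merged yet."""
    return [pr for pr in pull_requests if pr["merged"] is None]


if __name__ == "__main__":
    print("Commits by company:", dict(commits_by(COMMITS, "company")))
    print("Top contributor:", top_contributors(COMMITS))
    print("Average hours to merge:", average_hours_to_merge(PULL_REQUESTS))
    print("Unmerged PRs:", len(unmerged_pull_requests(PULL_REQUESTS)))
```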

Figure 1: Example of the TARS project using the LFX Insights tool. The numbers are aggregated according to a snapshot in time.

Figure 2: Example of the dashboard of the LFX Insights tool used by the TARS project.

TL;DR summary

Tracking different types of metrics is extremely important for free and open source communities. Metrics give projects insight into specific efforts and help them get a feel for how the community is perceived. Tools that pull data from a variety of sources and visualize it therefore help projects make informed decisions.

About the authors

Isabella Ferreira is an Ambassador at TARS Foundation, a cloud-native open-source microservice foundation under the Linux Foundation.

Mark Shan is the Chair of the Tencent Open Source Alliance and the Board Chair of the TARS Foundation Governing Board.

REFERENCES

[1] Jansen, Slinger. "Measuring the health of open source software ecosystems: Beyond the scope of project health." Information and Software Technology 56.11 (2014): 1508-1519.
[2] https://www.youtube.com/watch?v=hwTOrDg3LsI
[3] https://opensource.com/bus/16/8/measuring-community-health
[4] https://dzone.com/articles/-measuring-metrics-in-open-source-projects
[5] https://opensource.com/article/20/5/open-source-program-office
[6] https://fossa.com/blog/building-open-source-program-office-ospo/