Online / 6 & 7 February 2021


Metrics in Context: A Data Specification For Scholarly Metrics

Google Scholar, Web of Science, Scopus, Dimensions, Crossref, ... What used to be the home turf of for-profit publishers has become a buzzing field of technological innovation. Scholarly metrics, by no means limited to citations and altmetrics, come from a host of data providers using an even wider range of technologies to capture and disseminate their data. Citations come as closed or open data, extracted with traditional text processing or AI methods, and are curated by private corporations, research projects, or NGOs. What is missing is a language and standard to talk about the provenance of scholarly metrics.

In this lightning talk, I will present an argument for why we need to pay more attention to the processes of tracing and patterning that go into the creation of the precious data that determine our academic profiles, influence hiring and promotion decisions, and even shape national funding strategies. Furthermore, I present an early prototype of Metrics in Context, a data specification for scholarly metrics implemented in Frictionless Data. Additionally, the benefits and applications of Metrics in Context are demonstrated using both traditional citation data and a selection of common altmetrics, such as the number of tweets or Facebook shares.

In this lightning talk, I want to present Metrics in Context, a data specification implemented using Frictionless Data. It addresses a common theme within the critique of modern technology in our data-driven world: the lack of context for data and the often related biases in databases. Algorithmic and database biases have moved into the spotlight of critical thought on how technology exacerbates systemic inequalities. Following these insights, I want to address the need for different (rather than simply more) context and metadata for scholarly metrics in the face of the racial, gender, and geographic biases which plague modern academia.

It isn’t controversial to say that scholarly metrics have become an integral part of scholarship and are probably here to stay. Controversy usually comes into play once we discuss how and for which purposes metrics are used. This typically refers to the (mis)use of citation counts and citation-based indicators for research assessment and governance, which has also led to a considerable number of initiatives and movements calling for the responsible use of metrics. However, I would like to take a step back and redirect attention to the origin of the data underlying citation counts.

These conversations about the inherent biases of citation databases are not entirely new, and scholars across disciplines have highlighted the resulting systemic issues. However, in this project I am not proposing a solution to overcome or abolish these biases per se; rather, I want to shed light on the opaque mechanisms of capturing metrics that lead to the aforementioned inequalities. In other words, I propose to develop an open data standard for scholarly metrics which documents the context in which the data was captured. This metadata describes the properties of the apparatus capturing a scholarly event (e.g., a citation, news mention, or tweet of an article), such as the limitations of document coverage (what kinds of articles are indexed?), the kinds of events captured (tweets, retweets, or both?), or other technicalities (is Facebook considered as a whole, or only a subset of public pages?).
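To make this more concrete, the following is a minimal sketch, in Python, of what such provenance metadata might look like when serialized as a JSON descriptor alongside a metric, loosely modeled on the Frictionless Data style. Every field name and value below is an illustrative assumption of mine, not part of the actual Metrics in Context specification.

```python
# Hypothetical sketch: provenance descriptors that document the "capturing
# apparatus" behind two scholarly metrics. All keys and values are
# illustrative assumptions, not the real specification.
import json

citation_count_descriptor = {
    "name": "citation-counts",
    "provenance": {
        "provider": "a commercial citation index",   # who captured the data
        "access": "closed",                          # closed vs. open data
        "extraction_method": "reference matching",   # text processing vs. AI
        "document_coverage": "indexed journals only" # what is (not) covered
    },
    "events": {
        "type": "citation",
        "included": ["journal-article references"],
        "excluded": ["book citations", "preprint citations"],
    },
}

tweet_count_descriptor = {
    "name": "tweet-counts",
    "provenance": {
        "provider": "a social media data aggregator",
        "access": "restricted",
        "extraction_method": "link and DOI matching",
        "document_coverage": "public posts only",
    },
    "events": {
        "type": "tweet",
        "included": ["original tweets", "retweets"],
        "excluded": ["deleted tweets", "protected accounts"],
    },
}

# Shipping such a descriptor next to the raw counts makes the limitations
# of the capturing apparatus explicit and machine-readable.
print(json.dumps(citation_count_descriptor, indent=2))
```

The point of the sketch is the design choice, not the particular fields: two metrics with the same numeric value can rest on very different coverage and extraction decisions, and a shared descriptor format lets those differences travel with the data.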

While metrics in context don’t remove systemic inequality, they make the usually hidden and inaccessible biases visible and explicit. In doing so, they facilitate conversations about structural issues in academia and eventually contribute to the development of better infrastructures for the future.


Asura Enkhbayar