Brussels / 1 & 2 February 2025


Active Tigger: Accelerating Collaborative Text Annotation for Social Sciences and Beyond


This presentation introduces Active Tigger, an open-source research tool designed to accelerate collaborative text annotation in the social sciences.

The increasing use of text-as-data in social science research has created a pressing need for efficient annotation tools. While small datasets can be manually annotated, the exponential growth in available textual data (e.g., from newspapers and social media) demands solutions that enable collaborative annotation and automation. Moreover, the emergence of generative AI and large language models (LLMs) has highlighted the importance of robust corpus annotation practices, particularly for evaluating prompt-engineered outputs from LLM-as-a-service platforms like OpenAI or Hugging Face.

To address these challenges, we created an annotation platform, Active Tigger. A first version was developed in 2022 using R and RShiny (J. Boelaert, GitLab Repository). This tool embeds several annotation heuristics, including active learning (iteratively predicting and selecting annotations to maximize training quality), to help researchers build training datasets for fine-tuning encoder models. The tool quickly became integral to the research team's activities and beyond, which prompted us to develop a second, more robust version.
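To make the selection step of active learning concrete, here is a minimal sketch of uncertainty sampling, one common selection strategy. It is not taken from Active Tigger's code base: the function names and the example probabilities are hypothetical, and it assumes the current model already outputs class probabilities for each unlabelled text.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(predictions: Sequence[Sequence[float]], k: int) -> list[int]:
    """Return the indices of the k predictions with the highest entropy."""
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical class probabilities from the current model for four unlabelled texts.
predictions = [
    [0.98, 0.02],  # model is confident
    [0.55, 0.45],  # model is unsure
    [0.80, 0.20],
    [0.50, 0.50],  # model is maximally unsure
]

# Texts at these indices would be shown to annotators next.
to_annotate = select_most_uncertain(predictions, k=2)
print(to_annotate)
```

In a full loop, the newly annotated texts would be added to the training set, the model retrained, and the selection repeated, so that annotation effort goes to the examples the model finds hardest.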

The current iteration of Active Tigger, built with a Python-based API and a React frontend, introduces enhanced flexibility and scalability. It supports collaborative workflows, accommodates a broader range of use cases, and is now in beta testing, with early adopters exploring its potential.

This presentation will cover three key aspects:

The journey of Active Tigger: From addressing specific social science needs to adapting to the evolving landscape of LLMs.

Showcase: Demonstrating the annotation workflow using active learning and BERT fine-tuning.

Future directions: Exploring the tool's evolution in the context of widespread LLM availability, discussing the trade-offs between focusing on specialized tasks and enabling broader applications.

GitHub repository of Active Tigger: https://github.com/emilienschultz/activetigger

Speakers

Emilien SCHULTZ
