Online / 6 & 7 February 2021


Gazouilloire: a command line tool for long-term tweet collection

Many open-source libraries provide an interface to the Twitter API. However, most people use these tools in throwaway scripts for a one-off tweet collection. Moving to a robust application that collects and indexes tweets over long periods of time requires programming skills that most social science researchers do not have. To meet this need, the medialab has developed gazouilloire, a tool whose collection parameters (searched keywords, language of tweets, location of tweets, etc.) are easy to configure and which can then be launched from the command line.

Gazouilloire combines two methods of collecting tweets from the Twitter API ("search" and "filter") in order to maximize the number of tweets collected, and it automatically fills gaps in the collection after connection errors or reboots. It also provides a wide range of features that are not directly available from the free Twitter API: collecting during specific periods of time, resolving redirected URLs, downloading only certain types of media content (only photos and no videos, for example), and unfolding Twitter conversations. The user can then export the tweets in CSV format and select the fields that will form the columns of the table.
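The idea of combining two sources and exporting selected fields can be illustrated with a minimal Python sketch. This is not gazouilloire's own code or API: the helper names merge_tweets and export_csv, the sample tweets, and the assumption that each tweet is a dictionary with a unique "id" key are all hypothetical and chosen only for illustration.

```python
import csv
import io


def merge_tweets(search_results, filter_results):
    """Merge two lists of tweet dicts, keeping one copy per tweet id (hypothetical helper)."""
    merged = {}
    for tweet in list(search_results) + list(filter_results):
        # setdefault keeps the first copy seen and ignores later duplicates
        merged.setdefault(tweet["id"], tweet)
    return sorted(merged.values(), key=lambda t: t["id"])


def export_csv(tweets, fields):
    """Return CSV text containing only the selected fields as columns (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(tweets)
    return buffer.getvalue()


# Made-up sample data standing in for the results of the two collection methods
search_results = [
    {"id": 1, "text": "first", "lang": "en"},
    {"id": 2, "text": "second", "lang": "fr"},
]
filter_results = [
    {"id": 2, "text": "second", "lang": "fr"},
    {"id": 3, "text": "third", "lang": "en"},
]

tweets = merge_tweets(search_results, filter_results)
csv_text = export_csv(tweets, ["id", "text"])
print(csv_text)
```

Deduplicating on the tweet identifier is what lets two overlapping sources be merged safely: a tweet returned by both methods is stored only once, while a tweet missed by one method can still be recovered from the other.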

Social science researchers have already used gazouilloire for a wide variety of studies: measuring online activity during the COVID-19 lockdown, studying the public discourse of anti-vaxxers, and monitoring urban nature policies, among many others.


Béatrice Mazoyer