How did you get involved with the project?
My colleague and project coordinator Maud Ehrmann asked me to join the project in the summer of 2017, when an unexpected change in the project team opened up the possibility of having another post-doc researcher to support her in the tasks that the DHLAB was leading. At that time, I was working on Linked Books, another SNF-funded project on citation mining of scholarly literature about the history of Venice. The work on named entity processing and disambiguation that we are carrying out in impresso is at the core of my research interests. There is also a continuity with Linked Books and my previous research on information extraction from large-scale digital archives in the Humanities, with citations (and more generally named entities) being one of my main areas of interest.
What is the importance of newspaper datasets for historical research?
Historical newspapers are invaluable primary sources for humanities scholars at large, not only historians. In fact, they contain and preserve a kind of fossilised trace of our current and past societies. They record all kinds of events, from war declarations to Saturday evening dancing balls in the countryside, and they document many aspects of day-to-day life and culture. The information they contain is extremely rich and dense, and it is also continuous, since in many cases these newspapers ran for a long time and were published on a very regular basis.
A crucial challenge that we are addressing in impresso is how to devise a tool that helps researchers work with large archives of digitised newspapers. The tool integrates natural language processing technologies (e.g. named entity processing or topic modelling) to capture the semantics of newspaper contents, in order to make these (enhanced) sources usable for research. An important principle we are following in its design is transparency, meaning we strive to make explicit and visible to users all aspects of the data - or of the processing we perform on the data - that often risk remaining hidden in search interfaces. Aspects we want to make more transparent include, for example, OCR quality, as well as gaps in the data due to damaged digital archives.
How are impresso tools being used?
Despite the fact that the impresso project is still in the making, its corpus and tools are actively being used both for research and teaching.
On the research side, Dr. Estelle Bunout (C2DH) - one of the (digital) historians in our project - is working on a case study entitled ‘Resistance to Europe’, which analyses debates on the European idea in digitised newspapers from Luxembourg, Switzerland and beyond, with the aim of identifying tensions around that idea from the late 19th century to 1945. In addition, researchers from our associated partners, the Infoclio association and the University of Lausanne’s History Department, are contributing to the reflection on how to apply impresso tools to historical research questions in the context of concrete use cases.
Finally, we issued a Call for Associated researchers during the first year of the project in order to extend the circle of historians affiliated with the project. As a result, about 20 historians, mainly from Benelux, France, Germany and Switzerland, expressed their interest in both the tools and the collections brought together by impresso and have become involved in the project. Their association entails not only the use of the project’s output but also a regular dialogue with the impresso team, via workshops and a final conference aimed at collecting feedback on their use of impresso tools and their research, and at discussing epistemological issues raised by digitised newspapers.
The diversity of topics and methods of the associated researchers reflects the appeal of Swiss and Luxembourgish (digitised) newspapers as historical sources. Their work includes prosopographical research on experts and female war correspondents, as well as research on the ‘history of thoughts’, such as the rise of liberal internationalism at the end of the 19th century, or banking history. Each of these research topics requires a particular use of the newspapers and a particular way of querying them, which in turn helps shape how interaction with the impresso collection is designed. These diverse uses are nevertheless made available to all researchers in the same interface, in an effort to diversify the possible interactions and to enrich every type of research practice, including teaching practices, in the spirit of generous interfaces.
On the teaching side, Martin Grandjean and Sandra Bott have been using part of the impresso corpus in teaching a Digital Humanities/Digital History course, part of the EPFL’s Social and Human Sciences programme. The course focuses on how the big events of the 20th century were covered in the press; digital archives of newspapers provide the students with a rich source of materials on which a range of digital methods and tools can be tested. The same course is planned for next year and will be based on the impresso interface and tools, thus allowing us to test the strengths and weaknesses of these tools specifically in a teaching (rather than research) context.
As part of Ranke2, the platform prepared at the C2DH that offers teaching materials on how to practise digital source criticism, the impresso project is contributing a module dedicated to the use of digitised newspapers. This module draws on the lessons learned from preparing a transparent interface and is adapted to bachelor-level and secondary school teaching, bringing the latest trends in research practice to the classroom.