Hi Clemens! Can you tell us about your day job and your role in Europeana Newspapers?
Clemens: I work at the Berlin State Library (Staatsbibliothek zu Berlin - Preußischer Kulturbesitz), where I advise the Directorate on research strategy and also participate in several research projects, mainly in the areas of Optical Character Recognition (OCR), Machine Learning and the Digital Humanities.
I have been involved with Europeana Newspapers from the start. When the Europeana Newspapers project received funding in 2012, I worked at the National Library of the Netherlands and led the OCR processing of 10 million newspaper pages for the project. In the summer of 2014, I moved to the Berlin State Library to coordinate the overall project until its end in 2015. Since then, I have been working with Europeana to make the project outputs available as a thematic collection.
What and who is Europeana Newspapers for?
Newspapers capture the details of daily life in the past - a lot of what does not make it into history textbooks can be discovered in historic newspapers. There is great potential in having a common point of access for newspapers from different European countries, for example, to compare and study how media in the past reflected public perceptions of major events such as the murder of Franz Ferdinand or the 1917 Revolution.
There are already a number of research projects that work with the Europeana Newspapers collection data in various ways, from text analysis to image-based methods or the study of historical stock market indices. But newspapers are also a very interesting source for creative coders, citizen scientists, genealogists, or just anybody interested in the many details of life in the past.
Europeana Newspapers began life as a project of its own and now it's a thematic collection. Can you tell us a bit about its development?
When we began the Europeana Newspapers project, Europeana couldn't provide full-text search in addition to metadata about cultural heritage objects. So a prototype portal was developed and served by The European Library (TEL). Unfortunately, CENL, the organisation that funded TEL, decided to close the service by the end of 2016. Since then, we have been working with Europeana to save the newspaper data and access features by migrating to the main Europeana Collections platform.
However, TEL and Europeana use different technology to serve data, so most of the development had to start from scratch. Since the newspapers require specific functionality not previously available in Europeana, and since the volume of data is immense, this proved to be quite difficult and time-consuming. Furthermore, the newspaper collection needed to blend in with the overall presentation of cultural heritage objects on Europeana Collections, which brought additional challenges for design and development.