An interactive visualisation of the digitised newspapers

This is a guest blog by Sven Charleer, KU Leuven

Through The European Library website, Digital Humanities researchers are now given access to 10 million European digitised newspaper pages. While the availability and accessibility of this rich material are a great addition to the researcher corpus, the large amount of data can make it hard to find the specifics a researcher is looking for.

We gathered a group of Digital Humanities researchers in Amsterdam for a one-day workshop to collect ideas on how we could improve access to the data and which tools could improve the research workflow. A similar, shorter session was organised during the Europeana Cloud Plenary Meeting in Edinburgh. Ideas that came up included visualising sentiment analysis, showing how news moves through time and space, comparing queries, moving search results to personal digital spaces and continuing to work with them there, sharing results with other researchers, dealing with language issues and spelling changes through time, visualising the precision of OCR, and entity recognition. The list is quite long.

Our current prototype focuses on creating a faceted search environment through an interactive visualisation that concentrates on the time and space aspects. Following an "overview + details on demand" approach, the visualisation provides an overview that lets researchers find patterns in the data and gain insights across time and space, while also giving access to each individual newspaper image.

The prototype consists of the following modules:

a text search widget: supports search on words and sentences in the OCR’d newspaper text;
a newspaper title widget: in order to restrict searches to specific newspapers;
a timeline widget: in order to restrict searches to a specific time frame while also visualising the number of newspapers in the search results per year;
a map widget: enables a researcher to explore the distribution across Europe while also providing the ability to restrict a search to a specific country (note: due to the metadata lacking country information, language is currently used as a country indicator);
a search history widget: visualises the history of search terms/facet selections of the user;
a newspaper edition result widget: shows all results within the selection of the widgets above;
a newspaper view: shows the actual newspaper scan.
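
To make the interplay of these facets concrete, here is a minimal sketch in TypeScript of how a title, time frame and country selection could narrow a result set, and how the timeline could count results per year. It is not the prototype's code: the `Edition` and `Facets` shapes and the function names `applyFacets` and `countPerYear` are assumptions made for illustration, and the sample data is made up. As in the map widget, language stands in for country.

```typescript
// Hypothetical record shape; the real metadata schema is not described in this post.
interface Edition {
  title: string;
  year: number;
  language: string; // used as a country indicator, as in the map widget
}

// Hypothetical facet selection; every field is optional (no restriction when absent).
interface Facets {
  titles?: string[];
  fromYear?: number;
  toYear?: number;
  language?: string;
}

// Keep only the editions that satisfy every facet that is currently set.
function applyFacets(editions: Edition[], f: Facets): Edition[] {
  return editions.filter((e) =>
    (f.titles === undefined || f.titles.indexOf(e.title) !== -1) &&
    (f.fromYear === undefined || e.year >= f.fromYear) &&
    (f.toYear === undefined || e.year <= f.toYear) &&
    (f.language === undefined || e.language === f.language));
}

// Count results per year, which is the data a timeline histogram would draw.
function countPerYear(editions: Edition[]): Record<number, number> {
  const counts: Record<number, number> = {};
  for (const e of editions) {
    counts[e.year] = (counts[e.year] || 0) + 1;
  }
  return counts;
}

// Made-up sample data, for illustration only.
const sampleEditions: Edition[] = [
  { title: "Paper A", year: 1900, language: "nl" },
  { title: "Paper A", year: 1900, language: "nl" },
  { title: "Paper B", year: 1901, language: "de" },
  { title: "Paper A", year: 1914, language: "nl" },
];

// Restrict to Dutch-language editions between 1900 and 1910, then count per year.
const selected = applyFacets(sampleEditions, { language: "nl", fromYear: 1900, toYear: 1910 });
const timelineCounts = countPerYear(selected);
```

Treating an unset facet as "no restriction" keeps each widget independent: a module only needs to set or clear its own field.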

The visualisations are built with Processing.js, and Socket.IO provides live communication with a Node.js server. Any interaction with a single module updates all other modules: selecting a country adjusts the timeline to overlay the results for that country, and selecting a specific time frame shows only the countries and newspapers that are relevant to that selection. The whole application can run across multiple devices at the same time, enabling set-ups from a single tabletop device to multiple displays on mobile devices. All updates happen cross-device, wherever the devices are located.
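
The "one interaction updates every module" behaviour can be sketched as a small shared-state hub. The TypeScript below is an in-memory stand-in, not the prototype's code and not the Socket.IO API: in the prototype this role is played by the Node.js server relaying messages between devices. The names `FacetHub`, `Selection`, `subscribe` and `update` are hypothetical.

```typescript
// Hypothetical shape of the shared facet selection.
type Selection = { country?: string; fromYear?: number; toYear?: number; query?: string };
type Listener = (state: Selection) => void;

// In-memory stand-in for the server: it holds the current selection and
// pushes every change to all connected modules.
class FacetHub {
  private state: Selection = {};
  private listeners: Listener[] = [];

  // A module (on any device) joins and immediately receives the current
  // state, so a screen opened later starts in sync with the others.
  subscribe(listener: Listener): void {
    this.listeners.push(listener);
    listener({ ...this.state });
  }

  // A module reports a change; the merged state is sent to every module,
  // including the one that made the change.
  update(change: Selection): void {
    this.state = { ...this.state, ...change };
    for (const listener of this.listeners) {
      listener({ ...this.state });
    }
  }
}

// Usage: a map module and a timeline module sharing one selection.
const hub = new FacetHub();
const mapLog: Selection[] = [];
const timelineLog: Selection[] = [];
hub.subscribe((s) => { mapLog.push(s); });
hub.subscribe((s) => { timelineLog.push(s); });

// Selecting a country on the map reaches the timeline as well.
hub.update({ country: "nl" });
```

Sending the full merged state rather than only the change is a design choice made here for simplicity: a module that missed an earlier message can still redraw correctly from the next one.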

Such a set-up is very flexible: researchers can decide not only which modules they wish to use, but also how they wish to access them. Large displays can visualise all modules simultaneously, while multiple screens (e.g. multiple computer screens, large TVs, interactive tabletops, tablets and phones) can each provide access to one or more modules.

A researcher can decide to open multiple tabs in a browser to access the data on smaller screens. Researchers can share live searches, creating a co-located or even remotely shared faceted search environment. This also means the visualisation can be deployed in other settings such as a public library, using a public display where visitors can interact with the visualisation using personal devices.

We are currently in the usability testing phase, in which we evaluate both the usability of the modules and the viability of the multiple-screen set-up. Deploying the visualisation on a large interactive tabletop as well as spreading it out over multiple tablets has already shown that, from a user point of view, faceted search performs equally well on both set-ups.

The results of these evaluations will let us improve the visualisation even further, after which we shall ask Digital Humanities researchers to join in and provide us with expert feedback. If you wish to be part of this process, do let us know!