HAICu, a project to access, link and analyse digital heritage collections using artificial intelligence, received a €10.3 million grant from the Dutch Research Council in 2023 and kicked off in February 2024. Jelle Posthuma, Impact/Science journalist for the Jantina Tammes School of Digital Society, Technology and AI, speaks about the project to Rosemarie Van der Veen-Oei, head of the Research Department at consortium partner the National Library of the Netherlands (KB).
A legacy from ‘CATCH’
HAICu’s origins stem largely from Continuous Access to Cultural Heritage, or CATCH, a 15-year research programme funded by the Dutch Research Council and the Dutch Ministry of Education, Culture and Science.
'HAICu is not based on CATCH, but many people from the same community are involved,' says Van der Veen-Oei. 'In the Netherlands, CATCH made an important contribution to research at the intersection of IT and heritage institutions. Later, the humanities were added. CATCH's goal was to make digital collections accessible.'
Building on a unique partnership
The PhD students funded by CATCH spent two days a week at heritage institutions, says Van der Veen-Oei. This meant they also worked outside an academic environment. Over 15 years, the programme produced PhDs and postdocs with knowledge of and experience in the heritage sector, digital collections and the academic world. Through this collaboration, heritage institutions took their first steps into academia. 'It was a unique partnership between academia and heritage institutions, and it produced a new kind of knowledge and expertise for both sides.'
CATCH marked the first time that many heritage institutions cooperated closely with IT researchers. Van der Veen-Oei continues: 'IT researchers, in turn, had access to the digital collections of heritage institutions for the first time. They suddenly had large amounts of data at their disposal to train their tools and algorithms.'
A follow-up project, CATCHPlus, looked for ways to turn the prototypes and demos into practical tools. In the end, the heritage institutions did not put all of these prototypes and demos into use, and the work of linking digital collections was still unfinished. 'With HAICu, we want to go one step further, this time by applying AI techniques.'
Bringing in artificial intelligence
Researchers and professionals from the heritage institutions involved in CATCH wanted to build on the community the programme had created and bring in new AI techniques. Van der Veen-Oei notes: 'AI has been developing for decades, but today AI can also give meaning to collections in a responsible way. We wanted to use these developments intelligently to access, link and analyse our collections.'
Heritage institutions like the KB are facing a surge in new data. The KB currently holds about three petabytes (three billion megabytes) of digital data, says Van der Veen-Oei, and by 2027 the library expects to host more than five petabytes. To illustrate, one petabyte of information corresponds to a tower of CD-ROMs, stacked without their cases, roughly 1.8 kilometres high. 'For that, we need new tools and techniques to make it easy and simple.' That's where the HAICu project comes in.
New perspectives
Artificial intelligence also brings new perspectives to collections. 'What used to be quite normal in the past is sometimes not acceptable nowadays. We can use AI to show multiple perspectives. Take the term Zwarte Piet (Black Pete), which appeared in book titles in the past but is the subject of debate today. At the KB, we are looking for insights and ways to automatically detect this kind of bias in metadata.'
At the same time, AI provides the data with context. 'It's about how we can use heritage data to provide a transparent and trustworthy reflection of reality. While searching, you are given suggestions: have a look in this collection, or this one. In addition, sources are placed in context. Innovation labs are used to test new developments within HAICu.'
Multimodal approaches
In the past two years, generative AI has grown in prominence through systems like ChatGPT. 'At HAICu, we want to bring this way of searching to the collections of heritage institutions as well.'
Multimodality, or combining different types of sources, plays a big role.
'Within HAICu, the collections of different heritage institutions are linked. It is not just about text, but also about video, audio and so on. Take Delpher, a website that provides full-text access to digitised historical Dutch-language newspapers, books and journals, as well as copy sheets for radio news broadcasts. How wonderful it would be if we could connect the newspaper scans with audiovisual material from the Netherlands Institute for Sound and Vision (Beeld & Geluid), for example. In one search, you could gather all the information. That is what HAICu is all about.'
Find out more
HAICu aims to go beyond simply developing and using AI techniques and tools. The project intends to promote interdisciplinary and inter-institutional collaboration through innovation labs and citizen science projects. These initiatives will engage people who are not currently involved in HAICu. Through these efforts, HAICu expects to provide fertile ground for input and curatorial services from all stakeholders. In addition, the consortium aims to ensure the long-term integration of HAICu outcomes into partner organisations and their networks.
Does this project pique your curiosity? Check the HAICu website for upcoming vacancies and updates, including an extensive interview with one of the HAICu project leaders.
To stay up to date with the latest developments in research and development in the cultural heritage sector, network with peers and collaborate, join the EuropeanaTech Community today!