Methods from the fields of Artificial Intelligence (AI) and Machine Learning (ML) have helped push technological boundaries in various domains, including the cultural heritage sector (the Interim Report of the EuropeanaTech AI in relation to GLAMs Task Force and the AI4LAM initiative provide some examples). To encourage innovation in this area, a few weeks ago EuropeanaTech announced its first Challenge for Europeana AI/ML Datasets. With this new activity, we wanted to stimulate the creation of datasets for the GLAM sector that can be used for AI/ML, drawing on the rich cultural heritage resources available in Europeana. We hope that such datasets will foster more engagement with digital cultural heritage data in AI/ML and support the transfer of recent advances in AI/ML to the digital curation and analysis of cultural heritage content.
We received a total of five proposals, which were carefully reviewed by members of the EuropeanaTech Steering Group and the AI in relation to GLAMs Task Force. They assessed the proposals on four weighted criteria: relevance for the GLAM sector (25%), relevance for AI/ML (25%), relation to Europeana (30%), and clarity of the description and work plan (20%).
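To illustrate how the four weights might combine into a single score, here is a minimal sketch. Only the weights come from the announcement. The 0 to 10 rating scale, the criterion keys, the function name `weighted_score` and the example ratings are assumptions made for illustration, and the reviewers' actual procedure may have differed.

```python
# Criterion weights as stated in the announcement (they sum to 100%).
# The dictionary keys are hypothetical names chosen for this sketch.
WEIGHTS = {
    "glam_relevance": 0.25,
    "ai_ml_relevance": 0.25,
    "relation_to_europeana": 0.30,
    "clarity_and_work_plan": 0.20,
}


def weighted_score(ratings):
    """Combine per-criterion ratings into one weighted score.

    `ratings` maps each criterion name to a number. The 0-10 scale used
    in the example below is an assumption, not part of the announcement.
    """
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)


# Hypothetical ratings for an imaginary proposal.
example = {
    "glam_relevance": 8,
    "ai_ml_relevance": 6,
    "relation_to_europeana": 9,
    "clarity_and_work_plan": 7,
}
print(round(weighted_score(example), 2))  # 7.6
```

Because the relation to Europeana carries the largest weight, a one-point change in that rating moves the total more than a one-point change in any other criterion.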
Announcing the winners
Named Entities in Archeological Texts
This proposal from a team based at the University of Naples 'L'Orientale' aims to create a dataset for Named Entity Recognition (NER) and Term Extraction for archeological terms in Italian and English in the Europeana Archeology collection. NER is the process of identifying proper names, such as person names or locations, in unstructured text. Term Extraction is similar, but focuses on finding specialised terms, in this case from the archeology domain. Reference resources such as the Getty vocabularies and CIDOC CRM will be considered. The final dataset could be used in the development and evaluation of AI/ML-based technologies for NER in the archeology domain.
Reviewers particularly appreciated the clear structure and maturity of the proposal, for which a mock dataset had already been built with Europeana’s APIs to test the proposed approach. The bilingual aspect and the scarcity of similar open resources for the archeology field were also seen as valuable.
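As an illustration of what token-level annotations for NER and Term Extraction can look like, the sketch below tags a sentence with the widely used BIO scheme (B = beginning of a span, I = inside, O = outside). Everything in it is hypothetical: the tiny gazetteer, the labels `LOC` and `TERM`, and the example sentence are invented here, and the simple dictionary lookup is not the team's method. The planned dataset's actual format is not described in the proposal summary.

```python
# Hypothetical mini-gazetteer: lower-cased token tuples -> label.
# Both the entries and the label names are assumptions for this sketch.
GAZETTEER = {
    ("pompeii",): "LOC",
    ("amphora",): "TERM",
    ("red-figure", "krater"): "TERM",
}


def bio_tag(tokens):
    """Tag tokens with B-/I-/O labels using the longest gazetteer match."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    max_len = max(len(key) for key in GAZETTEER)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest possible span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = GAZETTEER.get(tuple(lowered[i:i + n]))
            if label:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return list(zip(tokens, tags))


sentence = "A red-figure krater and an amphora were found in Pompeii".split()
for token, tag in bio_tag(sentence):
    print(f"{token}\t{tag}")
```

A manually annotated dataset of this kind lets developers train an NER system and then measure how many of the human-marked spans it recovers.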
Zac Grace
This proposal by a student of the Ecole Nationale d'Ingénieurs de Tarbes aims to create pixel masks for semantic segmentation through manual annotation of image data in the Europeana Fashion collection. This means that, for example, the relevant fashion elements in an image (shirt, trousers, shoes) are marked with their pixel outlines. Such data can be used to train an automated segmentation system.
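To make the idea of a pixel mask concrete, here is a minimal sketch of one common representation: a grid with the same dimensions as the image, holding one class id per pixel. The class ids, the toy 6x4 mask and the helper names `binary_mask` and `pixel_counts` are assumptions for illustration. The proposal summary does not specify the dataset's label set or file format.

```python
# Hypothetical class ids; the dataset's real label set is not specified.
CLASSES = {0: "background", 1: "shirt", 2: "trousers", 3: "shoes"}

# A toy label mask for a 6-row by 4-column image: one class id per pixel.
MASK = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 2, 2, 0],
    [0, 2, 2, 0],
    [0, 2, 2, 0],
    [3, 3, 3, 3],
]


def binary_mask(mask, class_id):
    """Return a 0/1 mask marking only the pixels of one class."""
    return [[1 if pixel == class_id else 0 for pixel in row] for row in mask]


def pixel_counts(mask):
    """Count how many pixels belong to each class name."""
    counts = {name: 0 for name in CLASSES.values()}
    for row in mask:
        for pixel in row:
            counts[CLASSES[pixel]] += 1
    return counts


print(pixel_counts(MASK))
for row in binary_mask(MASK, 2):  # pixels annotated as trousers
    print(row)
```

A segmentation model trained on such data predicts a mask of the same shape, which can then be compared pixel by pixel with the manual annotation.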