The EU Datathon is an annual competition which provides ‘a chance for open data enthusiasts and application developers from around the world to demonstrate the potential of open data, get international visibility for their innovative ideas and compete for their share of the total prize fund of €200,000 and the Public Choice Award.’ Participants are invited to make use of data.europa.eu, the official portal for European data, managed by the Publications Office of the European Union.
With the Europeana.eu dataset published on data.europa.eu earlier this year, aggregating metadata from the approximately 4,000 cultural heritage institutions that provide content to Europeana, entrants could also draw on it for their proposals and apps. As an official partner of the competition, Europeana invited researchers, university professors and students from Social Sciences and Humanities, and Computer and Information Science to take part in the EU Datathon.
After two rounds of pre-selections of 156 entries from 38 countries, a team developing an app based on the Europeana.eu dataset was one of the 12 finalists and was awarded a prize of 7,000 euros under Challenge Number 4: ‘A Europe Fit for the Digital Age’ at the award ceremony that took place in Brussels on 20 October 2022. The team is composed of Professor Johanna Monti; researcher Maria Pia di Buono; and two PhD students, Gennaro Nolano and Giulia Speranza. Johanna Monti tells us about the experience.
Can you tell us about the app that you developed and the process of creating it?
We developed Maggie, a real-time chatbot that functions as a virtual assistant to help people access and discover European cultural content. People can interact with Maggie through natural language questions and ask about European cultural heritage.
The main idea behind Maggie is to exploit Artificial Intelligence (AI) and Natural Language Processing (NLP) methodologies to develop a user-centric app which facilitates the access to and discovery of multilingual cultural content. The intended audience of Maggie is very diverse; the app tailors content to users’ knowledge and interests to satisfy different information needs, from students to experts.
Maggie is the result of more than a decade of research activities which began in 2012 with our very first experiments in Cross-Language Information Retrieval on Cultural Heritage. After that, several milestones marked our way to Maggie, including the establishment of the UNIOR NLP Research group of the University of Naples L'Orientale in 2016, and several projects from 2019 until 2021, including the SMACH Project (Semantic Multilingual Access to Cultural Heritage), the ArchaeoTerm project which offers a resource of archaeological terms available within the framework of the YourTerm CULT project, and the NEAT (Named Entities in Archaeological Texts) project.
Why did you decide to use the Europeana.eu dataset?
Our research group has always been committed to making cultural content easily accessible for everyone, by developing systems and applications for cultural heritage. In this sense, we have already exploited European open data (in the form of data from the Europeana website) in several works, all aimed at improving the current state of the art in Natural Language Processing tasks for better access to cultural heritage content.
In all these cases, the core of the data we used was open data scraped from the Europeana Search API, which makes it easy for aggregated data to be accessed and reused, while also ensuring the high quality of the data and their multilinguality. While in previous experiments much of the information described by the Europeana Data Model (such as data about localisation, authors and themes) was not used, to develop Maggie we fully exploited the rich source of information offered by Europeana, as we aimed to develop a more specific Natural Language Processing task.