About the call
Methods from the field of Artificial Intelligence and Machine Learning (AI/ML) have helped push technological boundaries in various domains, including in the cultural heritage sector (see examples in the Interim Report of the EuropeanaTech AI in relation to GLAMs Task Force or the AI4LAM initiative).
Many AI/ML methods of interest to applications in GLAMs are supervised: they work by training a predictor (such as a neural network) on ground truth (ideal and expected outputs) or labeled data, from which the method learns and infers a model. For the model to generalise well and make accurate predictions for a wide array of inputs, its training data need to be of sufficient volume and quality and representative of the domain from which they are sampled. Otherwise, there is a risk of overfitting (the model will only make good predictions for inputs that are very similar to the training data) or of introducing biases, which will not only reduce the model’s general applicability and performance, but can also entail ethically problematic or otherwise unintended side effects.
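To make the notion of overfitting concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not a recommended method: the one-dimensional feature values, the labels "a" and "b", and the two deliberately mislabeled training examples are all invented for this example. A 1-nearest-neighbour predictor memorises its training data, so it scores perfectly on the examples it has seen while doing much worse on held-out examples drawn from the same range.

```python
def nearest_label(x, training):
    """Return the label of the training example closest to x (1-nearest-neighbour)."""
    _, label = min(training, key=lambda pair: abs(pair[0] - x))
    return label


def accuracy(examples, training):
    """Fraction of examples whose predicted label matches the given label."""
    correct = sum(1 for x, y in examples if nearest_label(x, training) == y)
    return correct / len(examples)


# Hypothetical labeled data: the intended rule is "a" below 5 and "b" above.
# Two training labels (at 3.0 and 8.0) are wrong on purpose to mimic noisy ground truth.
train = [(1.0, "a"), (2.0, "a"), (3.0, "b"), (4.0, "a"),
         (6.0, "b"), (7.0, "b"), (8.0, "a"), (9.0, "b")]

# Held-out examples that follow the intended rule and were not used for training.
held_out = [(1.4, "a"), (2.9, "a"), (3.2, "a"), (4.4, "a"),
            (5.6, "b"), (7.6, "b"), (8.2, "b"), (9.5, "b")]

train_acc = accuracy(train, train)        # every point is its own nearest neighbour
held_out_acc = accuracy(held_out, train)  # noisy neighbours mislead nearby inputs

print(f"training accuracy: {train_acc:.2f}")
print(f"held-out accuracy: {held_out_acc:.2f}")
```

The gap between the two figures is the symptom to watch for. Evaluating only on the training data would hide the effect of the noisy labels entirely, which is one reason a dataset intended for AI/ML use typically ships with a separate held-out split.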
The GLAM sector is well positioned for the uptake of AI/ML in the sense that curated data of sufficient volume, quality and diversity, in the form of digital collections from GLAMs (such as those aggregated and provided by Europeana), are now widely available under open licenses. What is currently lacking is the wider availability of datasets from the GLAM sector that are appropriate for direct use in AI/ML research and development. The availability of such open datasets could not only help foster more engagement with digital cultural heritage data in AI/ML, but also support the transfer of recent advances in AI/ML to the field of digital curation and analysis of cultural heritage content. Conversely, further advances in AI/ML often go hand in hand with the release of new high-quality datasets.
EuropeanaTech therefore invites proposals for the assembly of suitable AI/ML datasets, drawing from the extensive collections on the Europeana website. We are seeking proposals for the creation of large, well-documented datasets that are shaped for direct uptake for AI/ML purposes (such as training a model) and that can be made publicly available on relevant online platforms under open licenses.
We will award the two winning proposals a financial stipend of €2,500 to support the production, documentation and publication of the datasets. Award winners will be invited to present their contributions at a future Europeana (online) event and provide a text for publication related to their outputs.
How to apply
To apply, please read the submission guidelines below and submit a proposal by 15 February 2021, 23:59 CET. Proposals should describe in fewer than 1,500 words:
The intended contents of the dataset (in terms of volume, types of assets, annotation, etc.)
The procedure you intend to follow for producing the dataset
How it is relevant for AI/ML
Proposals should also include a suggestion for a possible use case, supported by a pre-trained model with a demonstration or evaluation of its results. In case of acceptance, it must be feasible to produce and release the dataset and all necessary documentation and technical resources before 30 June 2021.
European cultural heritage collections are commonly subject to biases and entail ethical issues. While this can negatively impact AI and machine learning solutions, AI and machine learning could also be used to uncover these issues. These issues might not be overcome within the scope of this call, but we advise you to document and discuss them.
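One simple way to start documenting such issues is to report how the dataset's labels are distributed, so that users can see which categories dominate. The sketch below is only an illustration: the label names and counts are invented, and label frequency is just one of many possible indicators of bias.

```python
from collections import Counter


def label_shares(labels):
    """Return each label's share of the dataset, most frequent label first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.most_common()}


# Hypothetical annotation labels for a ten-item sample.
labels = ["painting"] * 6 + ["photograph"] * 3 + ["manuscript"] * 1

shares = label_shares(labels)
for label, share in shares.items():
    print(f"{label}: {share:.0%}")
```

A summary like this can be included in the dataset documentation alongside a discussion of why certain categories, periods or regions are over- or under-represented.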