About datasheets
A datasheet is a standardised publication format for documenting a dataset, providing the context needed to reuse the data. In 2018, Timnit Gebru et al. introduced 'Datasheets for Datasets' to the machine learning community. A datasheet encourages the creators of a dataset to reflect carefully on the provenance of the data. For potential users of a dataset, the datasheet provides the information they need to make informed decisions about using the data.
As the authors put it, 'Datasheets for datasets have the potential to increase transparency and accountability within the machine learning community, mitigate unwanted societal biases in machine learning models, facilitate greater reproducibility of machine learning results, and help researchers and practitioners to select more appropriate datasets for their chosen tasks.'
Cultural heritage data differs from contemporary, industrial data in a number of ways. Most importantly, digital cultural heritage collections are rarely created with the intention of being used as data. Most often they originate from non-digital objects that were created for very different (cultural) purposes and digitised at a later stage. Such collections may be protected by copyright, tend to be complex and heterogeneous, may grow over time, and can contain sensitive content. All of this needs to be communicated to potential users. To facilitate communication between the cultural institutions that manage digital collections and everyone interested in reusing cultural heritage datasets for academic and research purposes, datasheets for digital cultural heritage need to reflect these characteristics.
Scope of the Expert Group
The Europeana Research Community and the EuropeanaTech Community have made this a collaborative endeavour by launching a call for input to their communities in September 2022 and setting up this group of experts, who are currently working on the topic. The group brings together cultural heritage professionals, technical experts, and academic researchers. Its findings will be published in scientific articles and presented at conferences.
Milestones
The expert group published the first version of its Datasheet template in September 2023 and discussed the methodology and recommendations in: Alkemade, H., Claeyssens, S., Colavizza, G., Freire, N., Lehmann, J., Neudecker, C., Osti, G. and van Strien, D. (2023). Datasheets for Digital Cultural Heritage Datasets. Journal of Open Humanities Data, 9(1), p. 17. DOI: 10.5334/johd.124
After presenting the Datasheet template at conferences and workshops across Europe, the expert group focused in 2025 on refining it to increase interoperability and better support data reuse. A series of workshops was essential for collecting feedback within and beyond the Europeana Initiative and for strengthening the community-driven character of the work. One of these workshops, 'Datasheets for digital cultural heritage' (9 April 2025), was open to everyone and well attended. This phase led to the release of Version 2 of the Datasheet template in July 2025.
Learn more
Contextualising Collections with 'Datasheets for Digital Cultural Heritage Datasets': A Conversation with Steven Claeyssens and Beth Kanazook. Digital Repository of Ireland Blog, 21 May 2024.
Upcoming outcomes
- Overall alignment with DCAT-AP (the DCAT Application Profile for data portals in Europe) to make datasheets machine-readable
- An online tool to support the creation of datasheets
Get in touch!
If you would like to share your thoughts and experiences on the topic, or would like us to present our work at your institution, write to us at research@europeana.eu!