Addressing broken links to digital objects - our approach
The Europeana website provides access to digital representations of millions of cultural heritage objects from all over Europe, and we want to give our audiences the best possible experience when exploring them. In this post, we tell you how we are working with our aggregators to address the issue of broken links.
The digital representations of these objects are accessible via links provided with the metadata that we publish on the Europeana website. However, links sometimes break, which can be a frustrating experience for website visitors and users of our API.
Broken links have a variety of origins: from projects ending and image collections being taken offline, to institutions migrating their collections to a new platform internally without implementing redirections or using persistent identifiers. Broken links also vary in nature: some are temporarily broken, some permanently, and others may only appear to be broken for some users, depending on access conditions set by the data provider. This variety makes resolving the issue a challenging task.
A new process to address broken links
Europeana Foundation product teams are working on ways to effectively identify broken links and address them. In the spring of 2020, we developed a tool based on the Metis Media Service that, once a week, checks the links in a small sample of records from every dataset published on the Europeana website. The tool produces a report for every link it finds that has a problem, and specifies the problem with the link - for example, that it is not accessible at all (error code 404), that the object behind the link is only accessible after redirection, or that the link does not look safe for a browser to open. Every three months we will produce a consolidated report, to get a full overview of the datasets where problems with links were consistently reported during that three-month period.
This report will be the basis for a manual check to confirm that links are really broken and that the errors reported by the tool are correct. In this step we will exclude datasets where links have other issues but are not broken. Timeouts, SSL issues and temporary technical issues will not count as broken links; they will be removed from the report and addressed separately. After this clean-up step, the report should only include datasets where links are really broken and access to objects is not possible for people or machines.
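As a rough illustration of this clean-up step, the sketch below filters a list of reported link problems so that timeouts, SSL issues and temporary technical issues are set aside. It is not the Metis-based tool itself: the `LinkProblem` record, the problem labels and the dataset identifiers are all hypothetical, chosen only to mirror the categories named above.

```python
from dataclasses import dataclass

# Hypothetical labels for the problems that do not count as broken links
# (timeouts, SSL issues, temporary technical issues).
NOT_COUNTED_AS_BROKEN = {"timeout", "ssl_issue", "temporary_issue"}


@dataclass(frozen=True)
class LinkProblem:
    dataset_id: str  # hypothetical dataset identifier
    url: str
    problem: str     # hypothetical label, e.g. "not_found_404" or "timeout"


def clean_up(problems):
    """Drop problems that are addressed separately; keep the rest for manual confirmation."""
    return [p for p in problems if p.problem not in NOT_COUNTED_AS_BROKEN]


def affected_datasets(problems):
    """Return the sorted identifiers of datasets that still have reported problems."""
    return sorted({p.dataset_id for p in problems})


weekly_findings = [
    LinkProblem("dataset_a", "https://example.org/object/1", "not_found_404"),
    LinkProblem("dataset_b", "https://example.org/object/2", "timeout"),
    LinkProblem("dataset_c", "https://example.org/object/3", "ssl_issue"),
]

remaining = clean_up(weekly_findings)
print(affected_datasets(remaining))  # ['dataset_a']
```

A script like this could only pre-sort a report; confirming that a link is really broken remains a manual check.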
The first report of this kind has already been shared with our aggregators, and a second one will be generated at the end of March 2021. All datasets with broken links that appear in both the first and the second report will be depublished from the Europeana website in early April. This means that the datasets will no longer be accessible via the Europeana website, but will still be accessible via the preview environment. So if the broken links are later fixed, it will be possible to bring the datasets back to the website. This cycle will continue throughout 2021 to ensure that our links stay up to date.
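The rule that a dataset is depublished only if it appears in both reports amounts to a set intersection. The minimal sketch below shows that logic; the function name and the dataset identifiers are hypothetical, and it assumes each report has already been reduced to a list of dataset identifiers.

```python
def datasets_to_depublish(first_report, second_report):
    """Return datasets listed as having broken links in both consecutive reports."""
    return sorted(set(first_report) & set(second_report))


# Hypothetical dataset identifiers for two consecutive consolidated reports.
first = ["dataset_a", "dataset_b", "dataset_c"]
second = ["dataset_b", "dataset_c", "dataset_d"]

print(datasets_to_depublish(first, second))  # ['dataset_b', 'dataset_c']
```

Here `dataset_a`, which no longer appears in the second report, and `dataset_d`, which appears only in the second, are not selected.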
Working with our data partners
We have asked all of our aggregators to set aside time to check these reports and fix any issues that they can by the end of March. We are also planning to reach out to data partners that are inactive, or with whom we no longer have contact, to engage them in this work and ensure that we provide the best possible experience of cultural heritage objects.
If you are a cultural heritage institution that provides data to Europeana and you are concerned about broken links in your data, please reach out to your aggregator or to me (henning.scholz@europeana.eu) to discuss further!