Metadata = better data
Here at Europeana, we use the word 'metadata' a lot. But why is it quite so important to us? Well, because it's vital to our end-users. Europeana's Technical & Operations Director, Jan Molendijk, tells us why he's pushing hard for improvements to metadata quality in 2013.
Right from the start, Europeana has been concerned with the quality of the metadata we collect, index and publish. The metadata for an object in Europeana is the equivalent of the catalogue record of traditional library, archive and museum systems. It describes the object in a formal and structured way and provides links to the object and its presence on the website of the data provider. The metadata is the only thing we have to create a searchable index – Europeana does not itself have access to the described objects.
Here's an example of an item with good metadata - a public domain painting called 'Tobit en Anna met het bokje' by Rembrandt from the Rijksmuseum. Have a look at the full record.
The data that comes into Europeana has often been created over a long period, sometimes centuries, in a succession of systems that were not designed for remote, multi-lingual, cross-domain searching. That is why we have, from the start, imposed some quality criteria and guidelines for the metadata sent to Europeana. These have evolved over time, which is one of the reasons why some fields that are now mandatory for new datasets are still empty in older ones. By and large, these guidelines have resulted in ‘reasonably usable’ data, but there is still room for improvement. Over the coming months we will be focusing on several aspects of metadata quality, detailed below.
Rights information
The metadata in Europeana is now available under a CC0 waiver, which means that it can be re-used for any purpose. For the digital object itself, the situation varies. Some objects are in the public domain and therefore free to re-use, while others are, for example, free to access but not to re-use. To help the users of Europeana to ‘do the right thing’, we have added an 'edm:rights' field (called 'europeana:rights' in the old ESE system). This field contains a URL pointing to a page that describes what can be done with the object. The rights statement can be public domain, any of the Creative Commons licences, or one of three specific Europeana rights pages: free access, restricted access and paid access. This field was introduced in 2010 and has been mandatory ever since. Even so, about 35% of objects in Europeana still have no value for this field, which leaves users in the frustrating position of having to contact the data provider to find out if and how they can re-use the digital object.
The other issue is that, for some collections, we now think an error may have been made in assigning rights values to the objects. Examples are collections of very old material that are not marked as public domain when we think they should be, and collections where the content (as opposed to the metadata) is listed as being available under CC0.
So, to summarise, we want rights statements on every item and we want the correct rights statements.
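As a rough illustration of these two checks (is there a rights statement, and does it look plausible), here is a minimal Python sketch. Everything specific in it is an assumption made for the example rather than a description of Europeana's own tooling: the record layout (a dict with hypothetical 'rights' and 'year' keys), the URL patterns used to recognise each kind of rights statement, and the arbitrary cut-off year for 'very old material'.

```python
from urllib.parse import urlparse

# Hypothetical record layout: a dict with an optional "rights" URL and an
# optional "year" of creation. Neither key name comes from Europeana.


def classify_rights(value):
    """Map a rights URL to a coarse category, using assumed URL patterns."""
    if not value or not value.strip():
        return "missing"
    parsed = urlparse(value.strip())
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parsed.path
    if host == "creativecommons.org":
        if path.startswith("/publicdomain/mark/"):
            return "public domain"
        if path.startswith("/publicdomain/zero/"):
            return "cc0"
        if path.startswith("/licenses/"):
            return "creative commons licence"
    if host == "europeana.eu" and path.startswith("/rights/"):
        return "europeana rights page"
    return "unrecognised"


def review_reasons(record, old_before=1800):
    """List the reasons a record's rights value deserves a second look.

    old_before is an arbitrary illustrative threshold, not a legal rule.
    """
    category = classify_rights(record.get("rights"))
    reasons = []
    if category == "missing":
        reasons.append("no rights statement")
    elif category == "unrecognised":
        reasons.append("unrecognised rights URL")
    elif category == "cc0":
        # CC0 is expected on metadata, not on the digital object itself.
        reasons.append("CC0 applied to content")
    year = record.get("year")
    if (year is not None and year < old_before
            and category not in ("missing", "public domain")):
        reasons.append("old material not marked as public domain")
    return reasons
```

A record that comes back with an empty list passes both checks. Anything else would only be a candidate for a conversation with the data provider, not an automatic correction.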
Thumbnails
The thumbnail images, or 'previews', for Europeana objects are shown both on the search results pages and on the object pages. To guarantee a consistent quality of thumbnails, we ask the data providers to supply us with a link to a reasonably high resolution version of the object. We then harvest that image, and create a thumbnail out of it. The image itself is not published by Europeana, just the thumbnail. We know from both user studies and usage analytics data that objects with a thumbnail are 8 times(!) more likely to be clicked on than objects that have no thumbnail. Clearly the existence of thumbnails makes for a more attractive user experience. So we will be going back to providers who have not yet supplied thumbnails and encouraging them to do so. We will also be looking into improving our image harvesting tools, enabling them to handle more of the cases in which the thumbnail generation process currently fails. And we will again explain that the thumbnail itself is not released under CC0 but is covered by the same rights statement as the digital object itself.
Link quality and persistent identifiers
On any given day, about 3% of the links in Europeana are unreachable. A server at the data provider may be temporarily down for maintenance, or it may have been moved before we have received an update from the data provider. We have started to monitor this more closely, checking a number of objects from each collection every day and contacting owners of collections that are offline for more than a few days. We encourage data providers to put temporary redirections in place when they move their servers so that the old links continue to work, while together we update the links in Europeana.
A lot of this would not be necessary if all data providers implemented 'Persistent Identifiers'. When they do, they commit to keeping their links resolvable by updating a central registry with each change they make. However, we understand that introducing Persistent Identifiers means a considerable commitment, both technically and organisationally. We encourage their use, but cannot (yet) make it mandatory.
Depth of description
Metadata is the only thing we have to make an object discoverable. But we still have quite a few objects in Europeana that have no free-format description at all (a huge 8.1 million objects), or have no title (nearly 1 million objects). As the primary means of searching in Europeana is a free text search, it is very unlikely that these objects will show up in any searches. There may be very good reasons not to supply Europeana with a description: some descriptions are considered to be under copyright and all metadata delivered to Europeana is effectively licensed under CC0. Some objects may simply not have a description or a title in the collection management system of the data provider. Together with the data providers, we will be looking into what can be done in these cases. Maybe sending an excerpt of the description would be a good first step forward?
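Counting these gaps for a single dataset is straightforward. The Python sketch below assumes records are plain dicts with hypothetical 'title' and 'description' keys, which is not the actual Europeana data format, and treats whitespace-only values as missing.

```python
def description_gaps(records):
    """Count records that lack a usable title or free-text description."""

    def is_blank(value):
        # None, empty strings and whitespace-only strings all count as missing.
        return value is None or not str(value).strip()

    return {
        "total": len(records),
        "no_title": sum(1 for r in records if is_blank(r.get("title"))),
        "no_description": sum(
            1 for r in records if is_blank(r.get("description"))
        ),
    }
```

A data provider could run a count like this before delivery to see how many objects are unlikely to surface in a free text search.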
Next steps
Over the coming months we will be addressing these issues and more, starting with the rights information. We will be contacting the providers whose collections lack rights information, or where we think the accuracy of the rights information could be improved. If this results in updated datasets, we hope to convince these providers to also address some of the other data quality issues. And of course any data provider that feels inspired to improve their presence in Europeana is welcome to do so. We will also be blogging more here on Europeana Professional during this period, giving you more detailed information on each of these quality issues and possible ways to address them. Watch this space!