Our learnings
From our experiments we conclude that the model is able to correctly identify multiple relevant labels for a given image. The multilabel approach is more useful than single-label classification, since it can assign several labels to each image with high confidence, as the sketch below illustrates.
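To make the distinction concrete, here is a minimal sketch of the two prediction strategies. The label vocabulary, scores and threshold are illustrative assumptions, not actual output of our model.

```python
# Minimal sketch: single-label vs multilabel prediction.
# The labels, scores and threshold below are illustrative only.
import numpy as np

LABELS = ["portrait", "building", "landscape", "boat", "costume"]

# Hypothetical per-label sigmoid scores for one image.
scores = np.array([0.91, 0.08, 0.12, 0.03, 0.77])

# Single-label classification keeps only the top prediction...
single_label = LABELS[int(np.argmax(scores))]
print("single label:", single_label)  # portrait

# ...while multilabel classification keeps every label whose
# independent score clears a confidence threshold.
THRESHOLD = 0.5
multi_labels = [l for l, s in zip(LABELS, scores) if s >= THRESHOLD]
print("multilabel:", multi_labels)  # ['portrait', 'costume']
```

A single-label model would have to discard "costume" in this example, even though it describes the image just as well as "portrait".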
Despite the interesting results, the performance of the resulting model is far from perfect, which we attribute to several factors. The most important is the relatively low quality of the gathered dataset: we found that many of the retrieved images do not have correct metadata.
Additionally, most of the data used for training was provided by the Norwegian DigitalMuseum. This means that the training data does not reflect the entire data distribution at Europeana, causing the model to be biased towards the data it was trained on. The biases of the training data translate into a lack of generalisation to the rest of the images from Europeana. In simple terms, the model will perform well on images similar to those in the training dataset, but it will fail if the images are too different.
In general, our training data is good enough for the model to learn some basic patterns, and the model did well despite the challenging setting of incorrectly labelled data. However, the quality of previous enrichments is not sufficient for them to serve as training data for a model that enriches our collections. A solution is to create a higher-quality training dataset, ensuring that our model is presented with the right labels.
Future work: crowdsourcing
After training and evaluating the multilabel classification model, we have concluded that assigning multiple labels to the images from our collection is more suitable than enriching them with a single label.
We are considering expanding the vocabulary with other terms relevant to cultural heritage. More importantly, we are planning to review and expand the training dataset, with the goal of identifying and correcting possible biases and errors. We would like to ensure that our model is presented with the right labels: a model trained on accurate labels is expected to perform significantly better than one trained on 'noisy' labels. To build such a high-quality annotated dataset, we have launched a crowdsourcing campaign on Zooniverse, and we welcome contributions from our community.
You can follow our work in this GitHub repository. We also invite you to experiment with this Colab notebook, where you can make your own queries to the Europeana Search API and apply the multilabel classification model. Feel free to contact us at rd@europeana.eu if you have any questions or ideas!
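If you would like a head start before opening the notebook, the sketch below shows one way to query the Search API from Python. The endpoint is the public Europeana Search API; the API key placeholder and the response field names (`items`, `title`, `edmPreview`) are assumptions to verify against the notebook and the API documentation.

```python
# Sketch of querying the Europeana Search API; response field
# names and the API key value are assumptions, check the docs.
import requests

API_URL = "https://api.europeana.eu/record/v2/search.json"
params = {
    "wskey": "YOUR_API_KEY",  # request a free key from Europeana
    "query": "boat",          # free-text search query
    "media": "true",          # only return items with a media preview
    "rows": 5,                # number of results to fetch
}

response = requests.get(API_URL, params=params)
response.raise_for_status()

# Each item typically carries a title and a preview image URL,
# which could then be fed to the multilabel classification model.
for item in response.json().get("items", []):
    print(item.get("title"), item.get("edmPreview"))
```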