Ingesting content into Europeana Cloud

Europeana is beginning to aggregate content via the Europeana Cloud project.
This article, by Ingeborg Versprille, Project Officer at CERL, focuses on the new processes for doing this work.

The content ingestion plan for the Europeana Cloud [1] project shows how content and its accompanying metadata will be prepared and ingested into the Europeana Cloud, using three different ingestion workflows.

A hot-air balloon is flying high above the clouds, The Wellcome Library, CC BY.

Until now the Europeana Foundation has focused primarily on aggregating large quantities of metadata about digital heritage of European origin. The Europeana Cloud project offers the Europeana Foundation the opportunity to engage with actual content. Content in this context means the actual digital objects, whether full text, images or other formats. Content is generally accompanied by metadata: the textual information and hyperlinks that serve to identify, discover, interpret and/or manage content.

Partners in the Europeana Cloud project were invited to contribute actual content, and several partners, including OAPEN, University College London, the National Library of Scotland and the National Library of Technology in Prague, were keen to explore this new approach with Europeana.

In addition, the project builds upon the Europeana Newspapers project, which funded the aggregation of metadata and content from 12 full partners and of metadata from 11 associate partners. Associate partners in the Newspapers project, such as the National Libraries of Wales, Spain and Belgium, will deliver full-text content for the Cloud project. Partly due to rights issues, the focus will be on public domain newspapers from the nineteenth century.

For the ingestion of the newspaper content into Europeana Cloud, the existing workflow established for the Europeana Newspapers project can be reused. For the ingestion and automatic upload of content other than newspapers, the current metadata ingestion process of The European Library is being modified. An additional UIM plugin was created to download the content that the metadata links to directly into Europeana Cloud. Only data with direct links to images or files in PDF format can be treated in this manner.

For data with indirect links to content, or data sets with their own data model, a targeted workflow had to be established that allows direct uploading into the Europeana Cloud. For this purpose a Europeana Cloud API is being finalised. The actual ingestion of the content is scheduled to commence in the second quarter of 2015 and will continue throughout the year. The project aims to test a number of ingestion paths with different institutions in Europeana’s network. The goal is a scalable and efficient workflow for adding further content (plus metadata) to the Europeana Cloud infrastructure after the project has ended.
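
As an illustration only, the choice between the three ingestion workflows described above could be sketched as follows. This is not the project's actual code: the record fields (is_newspaper, custom_data_model, content_links), the workflow names and the list of file extensions treated as "direct" links are all assumptions made for the example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumption: a link counts as "direct" if its path ends in an image or PDF extension.
DIRECT_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff"}


def is_direct_link(url: str) -> bool:
    """Return True if the URL path ends in a known image or PDF extension."""
    path = urlparse(url).path
    return PurePosixPath(path).suffix.lower() in DIRECT_EXTENSIONS


def choose_workflow(record: dict) -> str:
    """Pick one of three hypothetical workflow names for a metadata record."""
    # Newspaper content reuses the existing Europeana Newspapers workflow.
    if record.get("is_newspaper"):
        return "newspapers_workflow"
    links = record.get("content_links", [])
    # Standard-model records whose links are all direct can be fetched automatically.
    if links and not record.get("custom_data_model") and all(
        is_direct_link(link) for link in links
    ):
        return "uim_plugin_download"
    # Indirect links or a custom data model need the targeted upload workflow.
    return "cloud_api_upload"
```

For example, a record whose only link is http://example.org/scans/page1.pdf would be routed to the automatic download path, while a record linking to a viewer page such as http://example.org/view?id=1 would fall through to the targeted upload workflow.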

[1] Marian Lefferts, Adina Ciocoiu, Markus Muhr, Anastasia Gasia, Alastair Dunning (ed.), D4.2 Content Ingestion Plan
http://pro.europeana.eu/files/Europeana_Professional/Projects/Project_list/Europeana_Cloud/Deliverables/D4.2%20Content%20Ingestion%20Plan.pdf (8 April 2015)