New media search methods in the Europeana API
The REST API not only allows you to search on and retrieve metadata, but also gives you powerful features based on technical metadata. Technical metadata is metadata extracted from the media files that records refer to, such as the width and height of an image. These features let you search for and filter Europeana records by media information, for instance to find only records that have extra large images, high-quality audio files, or images that match a particular colour. These features were developed as part of the Content Re-use Framework within the Europeana Creative project.
The media search features as described on this page are part of the existing search API, search facets and the record API response.
Background information
At regular intervals, Europeana checks all media URLs in all Europeana records (those present in the edm:isShownBy and edm:hasView fields) to verify that the links still resolve and to extract technical metadata from the media files. This information is then made available for search and included in the record API response. This information is updated on a continuous basis.
Cardinality
A Europeana metadata record can contain a reference to zero, one or more media files. When a search is made on a technical metadata property or facet (such as image size), a record is returned if at least one of the media files present in the record matches the search query.
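The matching rule above can be illustrated with a small sketch. It is not how Europeana implements the search; the record structure, the `record_matches` helper and the `is_extra_large` predicate are hypothetical and only show the "at least one media file matches" semantics.

```python
def record_matches(media_files, predicate):
    """Return True if at least one media file in the record satisfies predicate."""
    return any(predicate(media) for media in media_files)


def is_extra_large(media):
    # Hypothetical check: more than 4 megapixels counts as extra large.
    return media["width"] * media["height"] > 4_000_000


# Hypothetical record with two images, described only by pixel dimensions.
record_media = [
    {"width": 640, "height": 480},    # about 0.3 MP
    {"width": 3000, "height": 2000},  # 6 MP
]

print(record_matches(record_media, is_extra_large))  # True: one image is > 4 MP
print(record_matches([], is_extra_large))            # False: no media files at all
```

A record without any media files can therefore never be returned by a media-specific filter.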
Search
The search API allows searching on the following media parameters:
Parameter | Datatype | Description |
---|---|---|
media | Boolean | Filter by records where a URL to the full media file is present in the edm:isShownBy or edm:hasView metadata and is resolvable. |
colourpalette | String | Filter by images where one of the colours of an image matches the provided colour code. You can provide this parameter multiple times; the search then performs an 'AND' search on all the provided colours. See colour palette. |
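As a minimal sketch of how a repeated colourpalette parameter could be assembled, the snippet below builds a query string with Python's standard library. The `BASE_URL` value and the `build_colour_search` helper are assumptions made for illustration; substitute the endpoint you actually use. No request is sent.

```python
from urllib.parse import urlencode

# Assumed endpoint, not taken from this page; replace with your own.
BASE_URL = "https://api.europeana.eu/record/v2/search.json"


def build_colour_search(api_key, query, colours):
    """Build a search URL that repeats colourpalette once per colour ('AND' search)."""
    params = [("wskey", api_key), ("query", query)]
    params += [("colourpalette", colour) for colour in colours]
    # Keep ':' and '*' readable; '#' is percent-encoded as %23.
    return BASE_URL + "?" + urlencode(params, safe=":*")


url = build_colour_search("xxxx", "*:*", ["#FF0000", "#000000"])
print(url)
```

Passing the parameters as a list of pairs, rather than a dictionary, is what allows the same parameter name to appear more than once.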
Facets
The Search API returns a list of media-related facets that describe how media information is distributed across the search results. The facets can also be included in search queries to allow for very specific media searches, such as querying on image size or audio duration.
The following facets are available in the facets profile in search and can be searched on as well:
Facet name | Datatype | Media type | Description |
---|---|---|---|
MEDIA | boolean | | To indicate whether a URL to the full media file is present in the edm:isShownBy or edm:hasView metadata and is resolvable. |
MIME_TYPE | string | | Mime-type of the file, e.g. image/jpeg |
IMAGE_SIZE | string | Image | Size in megapixels of an image, values: small (< 0.5MP), medium (0.5-1MP), large (1-4MP) and extra_large (> 4MP) |
IMAGE_COLOUR | boolean | Image | Lists 'true' for colour images. An alias for this facet is IMAGE_COLOR. Note that you cannot provide the 'false' value to find non-colour images; use the IMAGE_GREYSCALE facet instead. |
IMAGE_GREYSCALE | boolean | Image | Lists 'true' for greyscale images. An alias for this facet is IMAGE_GRAYSCALE. Note that you cannot provide the 'false' value to find colour images; use the IMAGE_COLOUR facet instead. |
COLOURPALETTE | string | Image | The most dominant colours present in images, expressed in HEX-colour codes. See colour palette. |
IMAGE_ASPECTRATIO | string | Image | Portrait or landscape. |
VIDEO_HD | boolean | Video | Lists 'true' for videos that have a resolution higher than 576p. |
VIDEO_DURATION | string | Video | Duration of the video, values: short (< 4 minutes), medium (4-20 minutes) and long (> 20 minutes). |
SOUND_HQ | boolean | Sound | Lists 'true' for sound files where the bit depth is 16 or higher or if the file format is a lossless file type (ALAC, FLAC, APE, SHN, WAV, WMA, AIFF & DSD). Note that 'false' does not work for this facet. |
SOUND_DURATION | string | Sound | Duration of the sound file, values: very_short (< 30 seconds), short (30 seconds - 3 minutes), medium (3-6 minutes) and long (> 6 minutes). |
TEXT_FULLTEXT | boolean | Text | Lists 'true' for text media types which are searchable, e.g. a PDF with text. |
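To make the IMAGE_SIZE buckets in the table concrete, here is a small client-side sketch that maps pixel dimensions to a bucket name. It is an illustration only, not Europeana's own classification code. The `image_size_bucket` function is hypothetical, and the handling of the exact boundary values (0.5, 1 and 4 MP) is an assumption, since the table does not say which bucket they fall into.

```python
def image_size_bucket(width_px, height_px):
    """Map pixel dimensions to an IMAGE_SIZE value using the thresholds in the table."""
    megapixels = width_px * height_px / 1_000_000
    if megapixels < 0.5:
        return "small"
    if megapixels < 1:       # assumption: exactly 1 MP counts as large
        return "medium"
    if megapixels <= 4:      # extra_large is strictly greater than 4 MP
        return "large"
    return "extra_large"


print(image_size_bucket(640, 480))    # small (about 0.3 MP)
print(image_size_bucket(1000, 800))   # medium (0.8 MP)
print(image_size_bucket(2000, 1500))  # large (3 MP)
print(image_size_bucket(4000, 3000))  # extra_large (12 MP)
```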
Sample use-case: large openly licensed images of paintings
The following section will help you build a simple application based on the media search and retrieval capabilities of the REST API. For this use-case we will construct API queries to retrieve openly licensed large and extra large images of paintings, display their thumbnails on a page and then display part of their technical metadata on a separate page for the image. This section will provide guidance on how to use the API in order to fulfil this use-case.
Retrieving large and extra large images
We will start with the search query to retrieve the records. For this, we use the following:
search.json?wskey=xxxx&query=what:painting&media=true&qf=IMAGE_SIZE:large&qf=IMAGE_SIZE:extra_large&reusability=open
To break down the search query:
- wskey=xxxx - API authentication, replace xxxx with your API key.
- query=what:painting - Search for records where the subject is a painting.
- media=true - Records where a link to a media file is present in the metadata and where this link resolves to a working media file. Note that this parameter is not actually needed when you query on any of the media facets, which already imply the value of this parameter.
- qf=IMAGE_SIZE:large - Records where an image is present of a large size (1-4MP).
- qf=IMAGE_SIZE:extra_large - Records where an image of an extra large size (> 4MP) is present. Note that the qf parameter can be included more than once; in this case the two values are combined as an 'OR' query.
- reusability=open - Ensure that only openly licensed media is present in the search results.
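The parameters listed above can be assembled programmatically. The following is a minimal sketch using only Python's standard library; the `build_painting_query` helper is a hypothetical name, and the snippet only builds the relative query string without sending a request.

```python
from urllib.parse import urlencode


def build_painting_query(api_key, sizes):
    """Build the relative search query for openly licensed paintings of the given sizes."""
    params = [
        ("wskey", api_key),
        ("query", "what:painting"),
        ("media", "true"),
    ]
    # One qf parameter per size; repeated IMAGE_SIZE values act as an 'OR' query.
    params += [("qf", f"IMAGE_SIZE:{size}") for size in sizes]
    params.append(("reusability", "open"))
    # Keep ':' unescaped so the output matches the query shown in this section.
    return "search.json?" + urlencode(params, safe=":")


query = build_painting_query("xxxx", ["large", "extra_large"])
print(query)
```

Prefix the result with the base URL of the API you are calling, and replace xxxx with your own API key.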
Show search results as thumbnails
Now that we have the search query, we need to use its output to render thumbnails of images on a page. Note that this example does not cover pagination; you need to implement this yourself, for which you can use the 'rows' and 'start' parameters of the search API. To render thumbnails of the images in the search results, you need the following information from the search response:
- id - The identifier of the record.
- title - The title of the record.
- edmPreview - The URL to the thumbnail image of the main media file.
With this information, you can build a page which shows the thumbnail (edmPreview), along with a title (title) and with a link to a separate page which at minimum should contain the identifier of the record (id). Next, we will help you create that separate page.
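The sketch below shows one way to turn a search response into the data needed for a thumbnail page. Several things here are assumptions rather than facts from this page: that the results sit in a top-level `items` list, that `title` and `edmPreview` may be returned as lists, and the `/detail?id=` link format, which is a placeholder for your own application's route. The sample response is invented for illustration.

```python
from urllib.parse import quote


def first(value):
    """Return the first element if value is a list, otherwise the value itself."""
    if isinstance(value, list):
        return value[0] if value else None
    return value


def thumbnail_cards(search_response):
    """Extract id, title and thumbnail URL for each result that has a preview."""
    cards = []
    for item in search_response.get("items", []):
        preview = first(item.get("edmPreview"))
        if preview is None:
            continue  # nothing to render without a thumbnail
        cards.append({
            "id": item["id"],
            "title": first(item.get("title")) or "(untitled)",
            "thumbnail": preview,
            # Hypothetical route of your own application, carrying the record id.
            "detail_link": "/detail?id=" + quote(item["id"], safe=""),
        })
    return cards


# Invented sample response, only for demonstrating the parsing.
sample_response = {
    "items": [
        {"id": "/90402/BK_1978_399", "title": ["Example painting"],
         "edmPreview": ["https://example.org/thumb/1.jpg"]},
        {"id": "/0000/no_preview", "title": ["No thumbnail here"]},
    ]
}

for card in thumbnail_cards(sample_response):
    print(card["title"], card["thumbnail"], card["detail_link"])
```

Results without an edmPreview value are skipped here; you may prefer to show a placeholder image instead.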
Show the large image with its technical metadata
If a user clicks on a thumbnail in the search results, the next step is to display the large (or extra large) image along with its technical metadata. For this, you need to retrieve the record information from the record API. An example query to the record API would be:
/record/90402/BK_1978_399.json?wskey=xxxx
As you can see, the only parameter, aside from your API key, is the record identifier. To display the (extra) large image and information from the technical metadata, you need to parse the record API response as follows:
- Use the URL from the "edmIsShownBy" field in the "aggregations" class as the URL of the image file. This field only appears once.
- Iterate through the "webResources" in the same "aggregations" class until you find the WebResource element whose URL ("about") corresponds to the "edmIsShownBy" value. This element contains the technical metadata.
- Then, render the technical metadata you want to display, for instance the "ebucoreWidth" and "ebucoreHeight" (width x height in pixels).
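The parsing steps above can be sketched as follows. The field names "aggregations", "edmIsShownBy", "webResources", "about", "ebucoreWidth" and "ebucoreHeight" come from this page; the assumption that the record is wrapped in a top-level `object` key, the `technical_metadata` helper and the sample response are illustrative only.

```python
def technical_metadata(record_response):
    """Return the image URL with its width and height, or None if no match is found."""
    # Assumption: the record may be wrapped in a top-level "object" key.
    record = record_response.get("object", record_response)
    for aggregation in record.get("aggregations", []):
        image_url = aggregation.get("edmIsShownBy")
        if not image_url:
            continue
        # Find the web resource whose "about" URL equals edmIsShownBy.
        for resource in aggregation.get("webResources", []):
            if resource.get("about") == image_url:
                return {
                    "url": image_url,
                    "width": resource.get("ebucoreWidth"),
                    "height": resource.get("ebucoreHeight"),
                }
    return None


# Invented sample response, only for demonstrating the lookup.
sample_record = {
    "object": {
        "aggregations": [{
            "edmIsShownBy": "https://example.org/media/full.jpg",
            "webResources": [
                {"about": "https://example.org/media/other.jpg"},
                {"about": "https://example.org/media/full.jpg",
                 "ebucoreWidth": 3000, "ebucoreHeight": 2000},
            ],
        }]
    }
}

info = technical_metadata(sample_record)
print(f"{info['url']}: {info['width']} x {info['height']} pixels")
```

Returning None when no web resource matches lets the calling page fall back to showing the image without its dimensions.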
Other examples
Find all records that match the query ‘Paris’ which are openly licensed and have large images:
search.json?wskey=xxxx&query=Paris&reusability=open&qf=IMAGE_SIZE:large
Find all records that match the query Paris which have a thumbnail image, are of mime type image/jpeg and have an aspect ratio of 'landscape':
search.json?wskey=xxxx&query=Paris&thumbnail=true&qf=MIME_TYPE:image%2Fjpeg&qf=IMAGE_ASPECTRATIO:landscape
Find all records where the subject is opera and where the results are sound files with a long duration:
search.json?wskey=xxxx&query=what:opera&qf=SOUND_DURATION:long
Find all records where one of the images has a (dominant) red colour:
search.json?wskey=xxxx&query=*:*&colourpalette=%23FF0000