
2 minutes to read | Posted on Thursday June 20, 2024 | Updated on Monday July 1, 2024

Henk Vanstappen
Datable

Marco Rendina
Managing Director, European Fashion Heritage Association

Close Encounters with AI: a deep dive into image content analysis

Have you ever marvelled at how technology can 'understand' the visual world around us? Explore this fascinating topic - and a new tool - with Henk Vanstappen (digital strategist at Datable) and EFHA’s Marco Rendina as part of the AI4Culture interview series.

Image: Better Images of AI / Banana / Plant / Flask, by Max Gruber. A banana, a plant and a flask on a monochrome surface, each surrounded by a thin white frame with letters that spell the name of the object.

Marco Rendina: To kick-start the conversation, can you tell us exactly what image content analysis is?

Henk Vanstappen: Image content analysis, also known as visual analysis, is the process of extracting information from digital images. It employs sophisticated techniques and algorithms to analyse various aspects of an image, such as objects, patterns, colours, textures and shapes. This technology is being utilised across numerous domains, from medical diagnosis to video surveillance.

MR: How is this relevant to the cultural heritage sector?

HV: In cultural heritage, we often encounter vast collections of digital images with minimal metadata about their actual content. Imagine an extensive photo archive where only the date and photographer are recorded. For the average user, navigating and searching through such a collection without textual information would be an arduous task. Image analysis can automate the detection of objects, classify images into meaningful groups (for example, images containing people) and more, making these collections more accessible. You can find some good examples of what is achievable in another series of news posts on Europeana Pro.

MR: I understand an object detection tool has been developed for the AI4Culture project - what can you tell us about it?

HV: It is an object and subject detection tool. Object detection identifies physical objects within an image, such as a railway station or a dress. Subject detection determines the broader subject matter, like 'architecture,' 'traffic,' or 'fashion.' This tool is available in different 'flavours' to cater to various use cases.

MR: I like this idea of a digital tool having ‘flavours’ - it makes it sound very approachable. What are these multiple 'flavours'?

HV: We wanted to provide the most suitable tool for different scenarios. The basic ‘flavour’ packages a high-speed, simple object detection tool that uses the MobileNet-SSD v3 model. It is capable of recognising common objects like cars, planes, or people – you could, for example, use it to screen image collections to detect privacy-sensitive content.
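As an illustration of what this first flavour does, here is a minimal sketch of running a MobileNet-SSD model through OpenCV in Python. The model file names, input size and confidence threshold are illustrative assumptions, not the project's actual configuration.

```python
import cv2

# Illustrative file names; the MobileNet-SSD v3 weights and config used by the
# actual tool may differ. Both files are TensorFlow exports loadable by OpenCV.
model = cv2.dnn_DetectionModel("frozen_inference_graph.pb",
                               "ssd_mobilenet_v3_large_coco.pbtxt")
model.setInputSize(320, 320)
model.setInputScale(1.0 / 127.5)
model.setInputMean((127.5, 127.5, 127.5))
model.setInputSwapRB(True)

image = cv2.imread("photo.jpg")
class_ids, confidences, boxes = model.detect(image, confThreshold=0.5)
for class_id, confidence, box in zip(class_ids, confidences, boxes):
    # class_id indexes a COCO label list (person, car, aeroplane, ...);
    # box is (x, y, width, height) in pixels.
    print(int(class_id), float(confidence), box)
```

Because the model also returns the bounding box of every detection, a workflow like privacy screening can simply flag images where the 'person' class appears above a chosen confidence.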

The second tool packaged in the service employs a sophisticated generative AI model (Salesforce/blip-vqa-base) that can comprehend and answer questions about an image's content, similar to how ChatGPT operates with text. While more advanced than the basic version, it cannot pinpoint an object's location within the image.
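For a sense of how this visual question-answering flavour behaves, here is a small sketch using the Hugging Face transformers library with the Salesforce/blip-vqa-base model mentioned above; the image URL and the question are made-up examples, not part of the tool itself.

```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

# Illustrative image URL; any publicly reachable photograph will do.
url = "https://example.org/collection/photo-1900-dress.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

question = "What is the person in the photograph wearing?"
inputs = processor(image, question, return_tensors="pt")
answer_ids = model.generate(**inputs)
print(processor.decode(answer_ids[0], skip_special_tokens=True))
```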

The third option in the package leverages Google's Vision service, offering even greater detection capabilities. However, as a commercial service it requires a user account on Google Cloud, which makes it better suited to advanced use cases.
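For comparison, the same task through Google's commercial service looks roughly like this with the google-cloud-vision Python client; it assumes you have already created a Google Cloud project with the Vision API enabled and configured credentials.

```python
from google.cloud import vision

# Requires Google Cloud application credentials in the environment.
client = vision.ImageAnnotatorClient()

with open("photo.jpg", "rb") as f:
    image = vision.Image(content=f.read())

# Broad subject labels ('architecture', 'fashion', ...)
for label in client.label_detection(image=image).label_annotations:
    print("label:", label.description, round(label.score, 2))

# Localised objects with bounding polygons
for obj in client.object_localization(image=image).localized_object_annotations:
    print("object:", obj.name, round(obj.score, 2))
```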

MR: There's also a colour detection tool available. What makes colour analysis significant?

HV: Colour is a crucial aspect of certain collections, such as those related to design and fashion. However, defining colours is a highly subjective process. While the human eye can discern that a piece of jewellery is gold or copper, a computer may simply perceive it as yellow. Similarly, to a computer, an image of a sheep in a meadow contains only 'white' and 'green'. So we built algorithms that isolate objects from the background and identify their colours more accurately.

MR: Does this tool incorporate object detection as well?

HV: Yes. While the tool can automatically isolate objects, users can also assist by specifying the region where an object is located. This way, you can leverage the output from the object detection tool to obtain the colours of multiple objects within a single image, if present.
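To make that concrete, here is a small sketch of the idea: take a bounding box from the object detection step and cluster only the pixels inside it, so that the background does not pollute the colours. The box coordinates and cluster count are illustrative, and the real tool may isolate objects differently.

```python
import cv2
import numpy as np

def dominant_colours(image_bgr, box, k=4):
    """Cluster the pixels inside `box` (x, y, width, height) into k colours."""
    x, y, w, h = box
    pixels = image_bgr[y:y + h, x:x + w].reshape(-1, 3).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centres = cv2.kmeans(pixels, k, None, criteria, 5,
                                    cv2.KMEANS_PP_CENTERS)
    counts = np.bincount(labels.flatten(), minlength=k)
    order = np.argsort(counts)[::-1]                     # biggest cluster first
    return [tuple(int(c) for c in centres[i][::-1]) for i in order]  # BGR -> RGB

image = cv2.imread("photo.jpg")
# A box such as the object detector returns, e.g. around a dress in the photo.
print(dominant_colours(image, box=(120, 40, 200, 350)))
```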

MR: And does the colour detection tool come in different flavours too?

HV: Indeed. The first version counts the pixels of the detected object, groups them into colours and returns the proportion of each colour as a percentage. The second version uses the same generative AI model as the object detection tool, providing a more human-like interpretation of colours. However, it does not offer precise colour proportions, instead returning a limited set of three or four dominant colours per object.
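A simplified sketch of that first, pixel-counting version: assign every pixel to the nearest colour in a reference palette and report the share of each. The tiny palette below is an assumption for illustration; the actual tool uses a richer colour vocabulary.

```python
import numpy as np
from PIL import Image

# Illustrative reference palette; the real tool's colour names are more extensive.
PALETTE = {
    "black": (0, 0, 0), "white": (255, 255, 255), "red": (220, 20, 60),
    "green": (34, 139, 34), "blue": (30, 100, 200), "yellow": (240, 220, 50),
}

def colour_proportions(path):
    """Map each pixel to its nearest palette colour and return percentages."""
    pixels = np.asarray(Image.open(path).convert("RGB"), dtype=float).reshape(-1, 3)
    names = list(PALETTE)
    refs = np.array([PALETTE[n] for n in names], dtype=float)
    # Nearest reference colour per pixel, by Euclidean distance in RGB space.
    nearest = np.linalg.norm(pixels[:, None, :] - refs[None, :, :], axis=2).argmin(axis=1)
    counts = np.bincount(nearest, minlength=len(names))
    return {n: round(100 * c / len(pixels), 1) for n, c in zip(names, counts) if c}

print(colour_proportions("photo.jpg"))
```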

MR: That's quite comprehensive. Do these tools generate outputs only in English?

HV: Not at all. The tools also provide links to Wikidata, an extensive knowledge base that powers Wikipedia (see, for example, the identifier for the concept 'dress'). This allows users to access colour and object names in virtually any language supported by Wikidata, enhancing the tools' accessibility across diverse linguistic communities.
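As an illustration of how such a Wikidata link can be turned into labels in other languages, here is a generic sketch against the public Wikidata API (not necessarily how the tool resolves labels internally); the search term and language list are example inputs.

```python
import requests

API = "https://www.wikidata.org/w/api.php"

def multilingual_labels(term, languages=("en", "fr", "nl", "it")):
    """Find the Wikidata item for an English term and return its labels."""
    search = requests.get(API, params={
        "action": "wbsearchentities", "search": term,
        "language": "en", "format": "json",
    }).json()
    qid = search["search"][0]["id"]        # top match, e.g. the item for 'dress'
    entity = requests.get(API, params={
        "action": "wbgetentities", "ids": qid, "props": "labels",
        "languages": "|".join(languages), "format": "json",
    }).json()["entities"][qid]
    return qid, {lang: label["value"] for lang, label in entity["labels"].items()}

print(multilingual_labels("dress"))
```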

MR: With such advanced technology, are there ethical concerns regarding the future? Could image analysis eventually replace human experts?

HV: While the technology continues to evolve and become more sophisticated, it's unlikely to entirely replace human expertise anytime soon. Algorithms, though powerful, are not infallible, just as human analysis can sometimes be subjective. However, these AI-driven tools offer significant advantages: they are remarkably fast, consistent and unwavering in their focus on repetitive tasks. Ultimately, they serve as valuable complements to human experts, enabling them to dedicate their time to more nuanced, creative endeavours while leveraging AI for large-scale data processing.

MR: How difficult is it for users to work with these tools?

HV: For those interested in exploring the tools' capabilities, we've developed a basic graphical interface for the colour detection and the object detection tool, where users can input the URL of an online image and test the various flavours and settings. This web-based tool requires no installation on the user's computer, though the option to download and run it locally is also available. However, to integrate these tools into existing databases and process large quantities of images, some programming expertise will be necessary. For such advanced use cases, we've provided comprehensive documentation on our GitHub page to guide developers through the integration process seamlessly.

Find out more

In September 2024, the AI4Culture project will launch a platform where open tools, like the detection tools presented above, will be made available online, together with related documentation and training materials. Keep an eye on the project page on Europeana Pro for more details and stay tuned on the project's LinkedIn and X accounts!

The object and subject detection tool is also integrated into the MINT aggregation platform and offered as a ready-to-use value-added service to its users. The graphical user interface enables MINT users to enrich their metadata with the annotations extracted by the image analysis tool with just a few clicks. If you are interested in taking advantage of this newly added MINT feature, you can follow this video tutorial.
