Author: Björn Barz
Publisher: Cuvillier
Year: 2020
Pages: 323
Language: English
Format: PDF (true)
Size: 22.4 MB
Content-based image retrieval (CBIR) aims at finding images in large databases such as the internet based on their content. Given an example query image provided by the user, the retrieval system returns a ranked list of similar images. Most contemporary CBIR systems compare images solely by means of their visual similarity, i.e., the occurrence of similar textures and the composition of colors. However, visual similarity does not necessarily coincide with semantic similarity. For example, images of butterflies and caterpillars can be considered similar, because the caterpillar eventually turns into a butterfly; visually, however, they do not have much in common. In this work, we propose to integrate such human prior knowledge about the semantics of the world into Deep Learning techniques. Class hierarchies, which are readily available for a plethora of domains and encode is-a relationships (e.g., a poodle is a dog, which is an animal), serve as a source for this knowledge. Our hierarchy-based semantic embeddings improve the semantic consistency of CBIR results substantially compared with conventional image representations and features.
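To make the hierarchy-based notion of similarity concrete, the sketch below derives pairwise class similarities from a toy is-a hierarchy using the height of the lowest common ancestor (LCA), in the spirit of the measure used in this work; the toy taxonomy, function names, and exact normalization are illustrative assumptions, not the thesis's implementation.

```python
# Minimal sketch: pairwise class similarity from an is-a hierarchy,
# based on the height of the lowest common ancestor (LCA).
# The taxonomy below is a toy example for illustration only.

parents = {
    "poodle": "dog", "beagle": "dog", "dog": "animal",
    "butterfly": "animal", "caterpillar": "animal", "animal": None,
}

def ancestors(node):
    """Path from a node up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = parents[node]
    return path

def height(node):
    """Height of a node = length of the longest path down to a leaf."""
    children = [c for c, p in parents.items() if p == node]
    return 0 if not children else 1 + max(height(c) for c in children)

def semantic_similarity(a, b):
    """1 - normalized LCA height: the deeper the shared ancestor,
    the more semantically similar the two classes."""
    anc_a = ancestors(a)
    lca = next(n for n in ancestors(b) if n in anc_a)
    root = anc_a[-1]
    return 1.0 - height(lca) / height(root)

print(semantic_similarity("poodle", "beagle"))       # 0.5: share 'dog'
print(semantic_similarity("poodle", "caterpillar"))  # 0.0: share only the root
```

Classes sharing a deep ancestor (poodle and beagle) come out more similar than classes related only near the root, matching the intuition behind the butterfly/caterpillar example.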
We furthermore present three different mechanisms for interactive image retrieval that incorporate user feedback to resolve the semantic ambiguity inherent in the query image. The first method uses clustering to reduce the required user feedback to a single click; the second keeps the human in the loop by actively asking for feedback on those images expected to improve the relevance model the most; the third allows the user to select particularly interesting regions within images. These techniques yield more relevant results after a few rounds of feedback, which reduces the total number of retrieved images the user needs to inspect to find relevant ones.
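As a rough illustration of how such a feedback loop can operate in feature space, the sketch below uses a classic Rocchio-style query update rather than any of the three specific methods above; all parameter values and array shapes are assumptions chosen for the example.

```python
import numpy as np

# Generic relevance-feedback sketch (Rocchio-style query update), shown
# only to illustrate the feedback loop; not the thesis's specific methods.

def refine_query(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Pull the query toward marked-relevant features and push it away
    from marked-irrelevant ones, then re-normalize."""
    q = alpha * query
    if len(relevant) > 0:
        q += beta * np.mean(relevant, axis=0)
    if len(irrelevant) > 0:
        q -= gamma * np.mean(irrelevant, axis=0)
    return q / np.linalg.norm(q)

def rank(query, database):
    """Rank database features by cosine similarity to the query
    (features are assumed L2-normalized)."""
    return np.argsort(-database @ query)

# One feedback round on random stand-in features:
rng = np.random.default_rng(0)
db = rng.normal(size=(1000, 128))
db /= np.linalg.norm(db, axis=1, keepdims=True)
q = db[0]
top = rank(q, db)[:10]                                   # user inspects top 10
q = refine_query(q, relevant=db[top[:3]], irrelevant=db[top[3:]])
new_top = rank(q, db)[:10]                               # re-ranked results
```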
Given an example image provided by the user, CBIR systems search for similar images in large databases and are relevant to a variety of applications. Besides generic similarity-based image search, possible application areas include visual product search (“shop the look”), biodiversity research, medical applications with massive amounts of image data, visual classification with an open set of classes or limited amounts of training data, and more.
The concept of “similarity”, however, is by no means unambiguous; it varies depending on the objective pursued by the user. The majority of contemporary CBIR techniques employ image features extracted from deep neural networks pre-trained on proxy tasks such as classification. The similarity between these feature vectors has been found to reflect the visual similarity between images. Humans, however, tend to see the semantics in images first; visual similarity is secondary and rarely a reliable indicator of semantic similarity. A picture of a pear can resemble that of a bottle in shape and color, but a human would not consider them similar due to the lack of semantic concordance. Conversely, a caterpillar and a butterfly can be considered similar from a semantic point of view, even though their visual appearance is quite different.
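A minimal sketch of this visual-similarity baseline, assuming a torchvision ResNet-50 pre-trained on ImageNet classification as the feature extractor and cosine similarity for ranking (typical defaults, not necessarily the exact setup used in this work):

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Features from a network pre-trained on a classification proxy task,
# compared by cosine similarity. Model choice and preprocessing are
# common defaults assumed for illustration.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = torch.nn.Identity()   # drop the classifier, keep the feature vector
model.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(path):
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    f = model(x).squeeze(0)
    return f / f.norm()          # L2-normalize: dot product = cosine similarity

# Retrieval over hypothetical file names:
# query = embed("query.jpg")
# sims = torch.stack([embed(p) for p in database_paths]) @ query
# ranking = sims.argsort(descending=True)
```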
The predominant technique for end-to-end learning is the artificial neural network (ANN), which consists of a stack of several layers, each applying a non-linear transformation to the output of the previous layer. When dealing with images, this transformation typically consists of a set of learned convolutions followed by a non-linear element-wise function. While the earlier layers typically encode low-level image features such as edges, the degree of abstraction increases from layer to layer. For this reason, end-to-end learning using ANNs is widely known as Deep Learning.
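A minimal PyTorch sketch of such a layer stack, with purely illustrative depth and channel counts:

```python
import torch.nn as nn

# Each block applies learned convolutions followed by a non-linear
# element-wise function (ReLU), as described above.
net = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),   # early layers: low-level features (edges)
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # deeper layers: more abstract features
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(64, 128, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),                      # global pooling -> one feature vector
    nn.Flatten(),
    nn.Linear(128, 10),                           # task head, e.g. classification
)
```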
Download Semantic and Interactive Content-based Image Retrieval