
VISIR: Visual and Semantic Image Label Refinement

author:Sreyasi Nag Chowdhury, Niket Tandon, Hakan Ferhatosmanoglu, Gerhard Weikum

abstract:The social media explosion has populated the Internet with a wealth of images. There are two existing paradigms for image retrieval: 1) content-based image retrieval (CBIR), which has traditionally used visual features for similarity search (e.g., SIFT features), and 2) tag-based image retrieval (TBIR), which has relied on user tagging (e.g., Flickr tags). CBIR now gains semantic expressiveness by advances in deep-learning-based detection of visual labels. TBIR benefits from query-and-click logs to automatically infer more informative labels. However, learning-based tagging still yields noisy labels and is restricted to concrete objects, missing out on generalizations and abstractions. Click-based tagging is limited to terms that appear in the textual context of an image or in queries that lead to a click. This paper addresses the above limitations by semantically refining and expanding the labels suggested by learning-based object detection. We consider the semantic coherence between the labels for different objects, leverage lexical and commonsense knowledge, and cast the label assignment into a constrained optimization problem solved by an integer linear program. Experiments show that our method, called VISIR, improves the quality of the state-of-the-art visual labeling tools like LSDA and YOLO.
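The sketch below illustrates the kind of integer linear program the abstract describes: binary variables select one label per detected object, and the objective rewards both detector confidence and pairwise semantic coherence between the chosen labels (linearized with auxiliary AND-variables). All scores, names, and the PuLP-based formulation are illustrative assumptions, not the authors' actual model or constraints.

```python
# Hedged sketch of label assignment as an ILP (not the VISIR implementation).
import pulp

# Hypothetical inputs.
# Detector confidences: (object index, candidate label) -> score.
conf = {
    (0, "dog"): 0.9, (0, "wolf"): 0.4,
    (1, "leash"): 0.7, (1, "rope"): 0.5,
}
# Pairwise semantic coherence between labels, e.g. derived from lexical or
# commonsense knowledge; higher means the labels fit together in one image.
coher = {("dog", "leash"): 0.8, ("dog", "rope"): 0.3,
         ("wolf", "leash"): 0.1, ("wolf", "rope"): 0.2}

objects = sorted({i for (i, _) in conf})
cands = {i: [l for (j, l) in conf if j == i] for i in objects}

prob = pulp.LpProblem("label_refinement_sketch", pulp.LpMaximize)

# x[i, l] = 1 iff object i is assigned label l.
x = {(i, l): pulp.LpVariable(f"x_{i}_{l}", cat="Binary") for (i, l) in conf}

# y[i, l, j, m] = x[i, l] AND x[j, m]; linearizes the pairwise coherence term.
y, pairs = {}, []
for i in objects:
    for j in objects:
        if j <= i:
            continue
        for l in cands[i]:
            for m in cands[j]:
                y[(i, l, j, m)] = pulp.LpVariable(
                    f"y_{i}_{l}_{j}_{m}", cat="Binary")
                pairs.append((i, l, j, m))

# Objective: detector confidence plus coherence of co-selected labels.
prob += (pulp.lpSum(conf[i, l] * x[i, l] for (i, l) in conf)
         + pulp.lpSum(coher.get((l, m), 0.0) * y[i, l, j, m]
                      for (i, l, j, m) in pairs))

# Each object receives exactly one label.
for i in objects:
    prob += pulp.lpSum(x[i, l] for l in cands[i]) == 1

# Standard linearization constraints for y = x_il AND x_jm.
for (i, l, j, m) in pairs:
    prob += y[i, l, j, m] <= x[i, l]
    prob += y[i, l, j, m] <= x[j, m]
    prob += y[i, l, j, m] >= x[i, l] + x[j, m] - 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([(i, l) for (i, l) in conf if x[i, l].value() == 1])
# With the toy scores above, coherence pushes the solution toward
# [(0, 'dog'), (1, 'leash')] rather than the lower-coherence alternatives.
```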

keywords:

interpretation:

pdf:paper

code:

dataset:

ppt/video:

curator:Ranran Chu