1 Indian Institute of Science, Bangalore, India; 2 Indian Institute of Technology, Jodhpur, India
ECCV 2020 (Spotlight)
This work introduces the novel problem of localizing all instances of an object (seen or unseen during training) in a natural image via a sketch query. We refer to this problem as sketch-guided object localization. It is distinctly different from traditional sketch-based image retrieval, where the gallery set often contains images with only one object. Sketch-guided object localization is more challenging for the following reasons: (i) the sketches used as queries are crude line drawings that carry little information about the shape and salient attributes of the object; (ii) the sketches vary significantly, as they are drawn by a diverse set of untrained human subjects; and (iii) there is a domain gap between sketch queries and target images, as these come from very different data distributions. To address the problem of sketch-guided object localization, we propose a novel cross-modal attention scheme that guides a region proposal network to generate object proposals relevant to the sketch query. These proposals are later scored against the query to generate the final localizations. Our method is effective with as little as one sketch query. Moreover, it generalizes well to object categories unseen during training and is effective in localizing multiple object instances present in the image. Furthermore, we extend our framework to a multi-query setting using novel feature and attention fusion strategies introduced in this paper. Localization performance is evaluated on widely used public object detection benchmarks, namely MS-COCO and PASCAL VOC, with sketch queries from Quick, Draw!. The proposed method significantly outperforms related baselines on both single-query and multi-query localization tasks.
Given a query sketch and an input image, our end-to-end trainable sketch-guided localization framework works in two stages. (i) Query-guided proposal generation: in this stage, cross-modal attention is learnt and query-relevant proposals are generated. Feature vectors corresponding to different regions of the image feature map are scored against the global sketch representation to measure their compatibility. These compatibility scores (the attention matrix) are then multiplied with the image feature maps to obtain attention feature maps (Blocks (1) and (2)). The attention feature maps are concatenated with the original feature maps and projected to a low-dimensional space, and the result is passed through a region proposal network (RPN) to generate relevant object proposals (Block (3)). (ii) Proposal scoring: the object proposals are scored against the sketch query to generate localizations for the object of interest (Block (4)).
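The two stages described above can be illustrated with a small NumPy sketch. It is not the authors' implementation. The dot-product compatibility, the sigmoid squashing, the cosine-similarity scoring, the random projection matrix, and the names `sketch_guided_attention` and `score_proposals` are all assumptions made for illustration. The backbone features, the RPN, and region pooling are replaced by random arrays.

```python
import numpy as np

def sketch_guided_attention(image_feats, sketch_vec, proj):
    """image_feats: (H, W, C); sketch_vec: (C,); proj: (2C, D)."""
    # Compatibility of every spatial location with the global sketch vector
    # (plain dot product: an assumption of this sketch).
    scores = image_feats @ sketch_vec                # (H, W)
    # Squash scores into (0, 1) to form the attention matrix
    # (the sigmoid is also an assumption).
    attention = 1.0 / (1.0 + np.exp(-scores))        # (H, W)
    # Multiply the image feature map by the attention matrix.
    attended = image_feats * attention[..., None]    # (H, W, C)
    # Concatenate with the original features and project to D dimensions;
    # in the described pipeline this is what the RPN would consume.
    fused = np.concatenate([image_feats, attended], axis=-1)  # (H, W, 2C)
    return fused @ proj, attention                   # (H, W, D), (H, W)

def score_proposals(proposal_feats, sketch_vec):
    """Cosine similarity between pooled proposal features (N, C) and the sketch."""
    p = proposal_feats / (np.linalg.norm(proposal_feats, axis=1, keepdims=True) + 1e-8)
    s = sketch_vec / (np.linalg.norm(sketch_vec) + 1e-8)
    return p @ s                                     # (N,)

# Toy inputs standing in for backbone features and pooled proposals.
rng = np.random.default_rng(0)
H, W, C, D = 4, 5, 8, 3
image_feats = rng.standard_normal((H, W, C))
sketch_vec = rng.standard_normal(C)
proj = rng.standard_normal((2 * C, D))

rpn_input, attention = sketch_guided_attention(image_feats, sketch_vec, proj)
proposal_scores = score_proposals(rng.standard_normal((6, C)), sketch_vec)
```

In a trained model the projection and any query or image embeddings would be learnt end to end. Here they are fixed random values, so only the shapes and the data flow are meaningful.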
author = "Tripathi, Aditay and R. Dani, Rajath and Mishra, Anand and Chakraborty, Anirban",
title = "Sketch-Guided Object Localization in Natural Images",
booktitle = "ECCV",
year = "2020",
The authors would like to thank the Advanced Data Management Research Group, Corporate Technologies, Siemens Technology and Services Pvt. Ltd., and Pratiksha Trust, Bengaluru, India for partly supporting this research.