Sketch Guided Object Localization in Natural Images

Aditay Tripathi1                   Rajath R Dani1                   Anand Mishra2                    Anirban Chakraborty1

1Indian Institute of Science, Bangalore, India       2Indian Institute of Technology, Jodhpur, India

ECCV 2020 (Spotlight)

[Paper] [Code] [Supp. Mat.] [Slides]


This work introduces the novel problem of localizing all instances of an object (seen or unseen during training) in a natural image via a sketch query. We refer to this problem as sketch-guided object localization. It is distinctly different from traditional sketch-based image retrieval, where the gallery set often contains images with only one object. Sketch-guided object localization is particularly challenging because: (i) the sketches used as queries are crude line drawings carrying little information about the shape and salient attributes of the object; (ii) the sketches show significant variability, as they are drawn by a diverse set of untrained human subjects; and (iii) there exists a domain gap between sketch queries and target images, as the two come from very different data distributions. To address this problem, we propose a novel cross-modal attention scheme that guides a region proposal network to generate object proposals relevant to the sketch query; these proposals are later scored against the query to produce the final localizations. Our method is effective with as little as one sketch query. Moreover, it generalizes well to object categories unseen during training and can localize multiple object instances present in an image. Furthermore, we extend our framework to a multi-query setting using the novel feature and attention fusion strategies introduced in this paper. Localization performance is evaluated on ubiquitous public object detection benchmarks, namely MS-COCO and PASCAL VOC, with sketch queries from Quick, Draw!. The proposed method significantly outperforms related baselines on both single-query and multi-query localization tasks.

Given a query sketch and an input image, our end-to-end trainable sketch-guided localization framework operates in two stages: (i) query-guided proposal generation: cross-modal attention is learnt and query-relevant proposals are generated. Feature vectors corresponding to different regions of the image feature map are scored against the global sketch representation to measure their compatibility. These compatibility scores (the attention matrix) are then multiplied with the image feature maps to obtain attended features (Blocks (1) and (2)). The attended feature maps are concatenated with the original features, projected to a low-dimensional space, and passed through a region proposal network (RPN) to generate relevant object proposals (Block (3)). (ii) Proposal scoring: the object proposals are scored against the sketch query to localize the object of interest (Block (4)).
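The query-guided attention step above can be sketched as follows. This is a minimal NumPy illustration under our own assumptions, not the authors' implementation: we assume a softmax over spatial locations for the compatibility scores and stand in a random matrix for the learnt projection; the function name and shapes are hypothetical.

```python
import numpy as np

def cross_modal_attention(image_feat, sketch_vec):
    """Minimal sketch of query-guided attention (illustrative, not the paper's code).

    image_feat: (C, H, W) image feature map from a backbone network
    sketch_vec: (C,) global sketch query representation
    Returns a (C, H, W) feature map to feed into the RPN.
    """
    C, H, W = image_feat.shape
    flat = image_feat.reshape(C, H * W)            # one C-dim vector per spatial location

    # Score every image location against the sketch query (compatibility / attention map).
    scores = sketch_vec @ flat                     # (H*W,)
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()                             # softmax over locations (an assumption)

    # Multiply the attention map with the image features to get attended features.
    attended = flat * attn                         # broadcast over channels

    # Concatenate attended and original features, then project back to C channels
    # (stand-in random matrix for the learnt 1x1 projection).
    fused = np.concatenate([flat, attended], axis=0)       # (2C, H*W)
    W_proj = np.random.randn(C, 2 * C) / np.sqrt(2 * C)
    return (W_proj @ fused).reshape(C, H, W)
```

Locations that score highly against the sketch are amplified before proposal generation, which is what biases the RPN toward query-relevant regions.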


  • Novel Task: Sketch-Guided Object Localization
  • Query-Guided Region Proposal Network
  • Cross-Modal Attention
  • Sketch-Guided Localization results on Seen and Unseen classes



@inproceedings{tripathi2020sketch,
  author    = "Tripathi, Aditay and R. Dani, Rajath and Mishra, Anand and Chakraborty, Anirban",
  title     = "Sketch-Guided Object Localization in Natural Images",
  booktitle = "ECCV",
  year      = "2020",
}


The authors would like to thank the Advanced Data Management Research Group, Corporate Technologies, Siemens Technology and Services Pvt. Ltd., and Pratiksha Trust, Bengaluru, India for partly supporting this research.

