Dynamic Multimodal Instance Segmentation Guided by Natural Language Queries

  • Edgar Margffoy-Tuay*
  • Juan C. Pérez
  • Emilio Botero
  • Pablo Arbeláez
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

39 Citations (Scopus)

Abstract

We address the problem of segmenting an object given a natural language expression that describes it. Current techniques tackle this task by either (i) directly or recursively merging linguistic and visual information in the channel dimension and then performing convolutions; or by (ii) mapping the expression to a space in which it can be thought of as a filter, whose response is directly related to the presence of the object at a given spatial coordinate in the image, so that a convolution can be applied to look for the object. We propose a novel method that integrates these two insights in order to fully exploit the recursive nature of language. Additionally, during the upsampling process, we take advantage of the intermediate information generated when downsampling the image, so that detailed segmentations can be obtained. We compare our method against state-of-the-art approaches on four standard datasets, where it surpasses all previous methods in six of the eight splits for this task.
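
The two families of techniques named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the tensor shapes, the linear `projection` matrix, and the `tanh` squashing are all assumptions chosen for illustration. `concat_fusion` mimics strategy (i) by tiling a sentence embedding over every spatial position and concatenating it along the channel dimension; `dynamic_filter_response` mimics strategy (ii) by turning the same embedding into a 1x1 filter and convolving it with the visual features.

```python
import numpy as np


def concat_fusion(features, text_embedding):
    """Strategy (i), sketched: merge language and vision along channels.

    features:       (C, H, W) visual feature map (hypothetical shape)
    text_embedding: (D,) sentence embedding (hypothetical)
    returns:        (C + D, H, W) fused map, ready for further convolutions
    """
    _, h, w = features.shape
    d = text_embedding.shape[0]
    # Repeat the embedding at every spatial coordinate.
    tiled = np.broadcast_to(text_embedding[:, None, None], (d, h, w))
    return np.concatenate([features, tiled], axis=0)


def dynamic_filter_response(features, text_embedding, projection):
    """Strategy (ii), sketched: treat the expression as a convolution filter.

    projection: (C, D) matrix mapping the embedding to one weight per
                visual channel (an assumed, illustrative parameterization)
    returns:    (H, W) response map; larger values mean a stronger match
    """
    # Map the expression to a C-channel 1x1 filter.
    dyn_filter = np.tanh(projection @ text_embedding)  # shape (C,)
    # A 1x1 convolution is a per-pixel dot product over channels.
    return np.einsum("c,chw->hw", dyn_filter, features)


# Toy data standing in for real network activations.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 4, 5))
text_embedding = rng.standard_normal(16)
projection = rng.standard_normal((8, 16))

fused = concat_fusion(features, text_embedding)
response = dynamic_filter_response(features, text_embedding, projection)
print(fused.shape, response.shape)
```

In a trained model the projection would be learned and the embedding would come from a language encoder; here both are random, so only the shapes and the data flow are meaningful.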

Original language: English
Title of host publication: Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings
Editors: Vittorio Ferrari, Cristian Sminchisescu, Yair Weiss, Martial Hebert
Publisher: Springer Verlag
Pages: 656-672
Number of pages: 17
ISBN (Print): 9783030012519
DOIs
Publication status: Published - 2018
Externally published: Yes
Event: 15th European Conference on Computer Vision, ECCV 2018 - Munich, Germany
Duration: 8 Sept 2018 to 14 Sept 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11215 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 15th European Conference on Computer Vision, ECCV 2018
Country/Territory: Germany
City: Munich
Period: 8/09/18 to 14/09/18

Keywords

  • Dynamic convolutional filters
  • Instance segmentation
  • Multimodal interaction
  • Natural language processing
  • Referring expressions
