Using Text-Spotting to Query the World


Abstract: The world we live in is labeled extensively for the benefit of humans. Yet, to date, robots have made little use of human-readable text as a resource. In this paper we aim to draw attention to text as a readily available source of semantic information in robotics by implementing a system which allows robots to read visible text in natural scene images and to use this knowledge to interpret the content of a given scene. The reliable detection and parsing of text in natural scene images is an active area of research and remains a non-trivial problem. We adopt a common approach, boosting for text detection and optical character recognition (OCR) for text parsing, and extend it with a probabilistic error correction scheme that incorporates a sensor model of our pipeline. To interpret the scene content, we introduce a generative model which explains spotted text in terms of arbitrary search terms. This allows the robot to estimate the relevance of a given scene with respect to arbitrary queries, for example whether it is looking at a bank or a restaurant. We present results from images recorded by a robot in a busy cityscape.
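
As a rough illustration of what probabilistic error correction with a sensor model can look like, the sketch below scores candidate dictionary words against a noisy OCR reading using Bayes' rule. It is not the scheme from the paper: the per-character substitution model, the `p_correct` value, the toy vocabulary and its prior probabilities, and the function names `likelihood` and `posterior` are all assumptions made for this example.

```python
from math import prod

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def likelihood(observed, word, p_correct=0.9):
    """P(observed | word) under an assumed per-character substitution model.

    Each character is read correctly with probability p_correct; otherwise it
    is replaced by one of the other letters, chosen uniformly at random.
    """
    if len(observed) != len(word):
        return 0.0  # simplification: insertions and deletions are not modeled
    p_wrong = (1.0 - p_correct) / (len(ALPHABET) - 1)
    return prod(p_correct if o == w else p_wrong for o, w in zip(observed, word))


def posterior(observed, prior, p_correct=0.9):
    """Normalized P(word | observed) for every word in the prior."""
    scores = {w: likelihood(observed, w, p_correct) * p for w, p in prior.items()}
    total = sum(scores.values())
    if total == 0.0:
        return {}  # no candidate can explain the observation
    return {w: s / total for w, s in scores.items()}


# Hypothetical vocabulary with made-up prior probabilities.
prior = {"bank": 0.4, "band": 0.3, "bark": 0.2, "restaurant": 0.1}

post = posterior("bamk", prior)  # "bamk" stands in for a misread of some word
best = max(post, key=post.get)
print(best, round(post[best], 3))
```

Here the misread string is one substitution away from both "bank" and "bark", so the prior decides between them. A realistic sensor model would be estimated from the recognition errors of the actual detection and OCR pipeline rather than assumed uniform.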

  • I. Posner, P. Corke, and P. Newman, “Using Text-Spotting to Query the World,” in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2010.
    Author = {Ingmar Posner and Peter Corke and Paul Newman},
    Booktitle = {Proc. of the {IEEE/RSJ} Int. Conf. on Intelligent Robots and Systems (IROS)},
    Keywords = {Literate Robots, conference_posner},
    Month = {October},
    Note = {10},
    Pdf = {},
    Title = {Using Text-Spotting to Query the World},
    Year = {2010}}