Abstract—The world we live in is labeled extensively for the benefit of humans. Yet, to date, robots have made little use of human-readable text as a resource. In this paper we aim to draw attention to text as a readily available source of semantic information in robotics by implementing a system that allows robots to read visible text in natural scene images and to use this knowledge to interpret the content of a given scene. The reliable detection and parsing of text in natural scene images is an active area of research and remains a non-trivial problem. We extend a commonly adopted pipeline, which uses boosting to detect text and optical character recognition (OCR) to parse it, with a probabilistic error-correction scheme that incorporates a sensor model of the pipeline. To interpret the scene content, we introduce a generative model that explains spotted text in terms of arbitrary search terms. This allows the robot to estimate the relevance of a given scene with respect to arbitrary queries, such as whether it is looking at a bank or a restaurant. We present results from images recorded by a robot in a busy cityscape.
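The abstract does not specify the form of the sensor model or of the generative query model. The Python sketch below illustrates one plausible reading under explicit assumptions and is not the authors' implementation. Assumed: OCR errors are modelled by an edit-distance likelihood with a per-character error rate `eps`; correction is a Bayesian posterior over a lexicon with prior word probabilities; each query (e.g. "bank") is a unigram distribution over words it tends to generate, smoothed by `alpha`; and queries have a uniform prior. All function names, word lists and parameter values are hypothetical.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def correct(ocr: str, lexicon: dict[str, float], eps: float = 0.1) -> dict[str, float]:
    """Posterior P(word | ocr) over a lexicon of prior word probabilities.

    ASSUMED sensor model: P(ocr | word) is proportional to
    eps**d * (1 - eps)**(n - d), where d is the edit distance and
    n = max(len(ocr), len(word)), so d <= n always holds.
    """
    obs = ocr.lower()
    scores = {}
    for word, prior in lexicon.items():
        d = levenshtein(obs, word)
        n = max(len(obs), len(word))
        scores[word] = prior * eps ** d * (1 - eps) ** (n - d)
    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()}


def query_log_likelihood(posteriors, term_model: dict[str, float], alpha: float = 1e-3) -> float:
    """log P(spotted words | query) under an ASSUMED per-query unigram model.

    Each spotted word is explained by marginalising over its corrected readings;
    words absent from term_model get a small smoothing probability alpha.
    """
    total = 0.0
    for post in posteriors:
        total += math.log(sum(p * term_model.get(w, alpha) for w, p in post.items()))
    return total


def rank_queries(ocr_words, lexicon, query_models) -> dict[str, float]:
    """Normalised relevance P(query | text), assuming a uniform prior over queries."""
    posteriors = [correct(w, lexicon) for w in ocr_words]
    logs = {q: query_log_likelihood(posteriors, m) for q, m in query_models.items()}
    top = max(logs.values())  # subtract the max for numerical stability
    z = sum(math.exp(v - top) for v in logs.values())
    return {q: math.exp(v - top) / z for q, v in logs.items()}


if __name__ == "__main__":
    # Hypothetical lexicon and query models, for illustration only.
    lexicon = {w: 0.2 for w in ("bank", "atm", "cash", "pizza", "menu")}
    query_models = {
        "bank": {"bank": 0.4, "atm": 0.3, "cash": 0.3},
        "restaurant": {"pizza": 0.5, "menu": 0.5},
    }
    # "8ANK" mimics a typical OCR confusion of "B" with "8".
    print(rank_queries(["8ANK", "ATM"], lexicon, query_models))
```

With these made-up inputs, the misread "8ANK" is corrected to "bank" with high posterior probability, and the "bank" query should receive almost all of the relevance mass. Marginalising over corrections, rather than committing to the single best reading, lets an uncertain OCR result still contribute evidence to every query it could support.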

  • I. Posner, P. Corke, and P. Newman, “Using Text-Spotting to Query the World,” in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2010.
    author = {Ingmar Posner and Peter Corke and Paul Newman},
    title = {Using Text-Spotting to Query the World},
    booktitle = {Proc. of the {IEEE/RSJ} Int. Conf. on Intelligent Robots and Systems (IROS)},
    year = {2010},
    month = {October},
    note = {10},
    pdf = {http://www.robots.ox.ac.uk/~mobile/Papers/PosnerCorkeNewman_IROS2010.pdf},
    keywords = {Literate Robots, conference_posner},