Abstract— This paper describes a probabilistic framework for appearance based navigation and mapping using spatial and visual appearance data. Like much recent work on appearance based navigation we adopt a bag-of-words approach in which positive or negative observations of visual words in a scene are used to discriminate between already visited and new places. In this paper we add an important extra dimension to the approach. We explicitly model the spatial distribution of visual words as a random graph in which nodes are visual words and edges are distributions over distances. Care is taken to ensure that the spatial model is able to capture the multi-modal distributions of inter-word spacing and account for sensor errors both in word detection and distances. Crucially, these inter-word distances are viewpoint invariant and collectively constitute strong place signatures and hence the impact of using both spatial and visual appearance is marked. We provide results illustrating a tremendous increase in precision-recall area compared to a state-of-the-art visual appearance only systems.