Given an image stream, we demonstrate an online algorithm that selects the semantically important images summarizing the visual experience of a mobile robot. Our approach consists of data pre-clustering using coresets, followed by a graph-based incremental clustering procedure using a topic-based image representation. A coreset for an image stream is a set of representative images that semantically compresses the data corpus, in the sense that every frame has a similar representative image in the coreset. We prove that our algorithm efficiently computes the smallest possible coreset under a natural, well-defined similarity metric, up to a provably small approximation factor. The output visual summary is computed via a hierarchical tree of coresets for different parts of the image stream. This enables multi-resolution summarization (or a video summary of a specified duration) in the batch setting, and a memory-efficient incremental summary in the streaming case.
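To make the coreset definition concrete, the following is a minimal sketch of a greedy streaming selection rule: a frame is kept as a new representative only if no existing representative is already within a distance threshold of it, which guarantees that every processed frame has a similar representative in the coreset. The Euclidean distance on feature vectors, the function name, and the threshold parameter are illustrative assumptions, not the paper's actual construction or similarity metric.

```python
import numpy as np


def streaming_coreset(frames, threshold):
    """Greedy streaming coreset sketch (illustrative assumption, not the
    paper's algorithm): keep a frame as a representative unless an
    existing representative already lies within `threshold` of it.

    frames    -- iterable of feature vectors (e.g. topic-based descriptors)
    threshold -- maximum allowed distance from any frame to its representative
    """
    coreset = []
    for f in frames:
        # Add f only if it is not yet covered by an existing representative.
        if all(np.linalg.norm(f - c) > threshold for c in coreset):
            coreset.append(f)
    return coreset


# Example: 50 random 8-dimensional descriptors compressed to a small summary.
rng = np.random.default_rng(0)
frames = rng.random((50, 8))
summary = streaming_coreset(frames, threshold=0.5)
```

Because representatives are only ever added, the invariant holds for every frame seen so far, which is what makes the rule suitable for a memory-efficient streaming summary.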