
Multimotion Visual Odometry

Just as we use our eyes to perceive and navigate the world, many autonomous vehicles and systems rely on cameras to observe their environment. As we move, the world appears to move past us, and we have long understood how to accurately estimate this egomotion (i.e., the motion of the camera) relative to the static world from a sequence of images. This process, known as visual odometry (VO), is fundamental to robotic navigation.
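To make this concrete, the sketch below shows a minimal frame-to-frame egomotion estimate for a calibrated monocular camera, assuming a largely static scene. It uses standard OpenCV building blocks (sparse feature matching, essential-matrix estimation with RANSAC, and pose recovery); the function name and parameters are illustrative and not part of our pipeline.

```python
# Minimal sketch of frame-to-frame visual odometry (egomotion only), assuming a
# calibrated monocular camera with intrinsic matrix K and a mostly static scene.
# Names here are illustrative, not taken from the MVO pipeline.
import cv2
import numpy as np

def estimate_egomotion(prev_gray, curr_gray, K):
    """Estimate the relative rotation R and (unit-scale) translation t of the
    camera between two consecutive grayscale frames."""
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)

    # Match sparse features between the two frames.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # RANSAC rejects outliers (e.g., features on moving objects) while fitting
    # the essential matrix that encodes the camera's motion.
    E, inliers = cv2.findEssentialMat(pts1, pts2, K,
                                      method=cv2.RANSAC, prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t
```

Note that outlier rejection is what lets classical VO treat dynamic objects as noise to be discarded; MVO instead keeps and explains those measurements.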


One of the most complex aspects of our world is its motion. Not only do our environments change over time, but most of the things we do (and want to automate) involve moving around and interacting with other dynamic objects and agents. Traditionally, VO systems ignore the dynamic parts of a scene, focusing only on the motion of the camera, but the ability to isolate and estimate each of the motions within a scene is essential for an autonomous agent to successfully navigate its environment. This presents a challenging chicken-and-egg problem, where segmenting a scene into independent motions requires knowledge of those motions, but estimating the constituent motions in a scene requires knowledge of its segmentation.


To address this challenge, we developed a multimotion visual odometry (MVO) pipeline that extends the traditional VO pipeline with multimodel fitting algorithms and batch estimation techniques to estimate a trajectory for every motion in a scene. Sparse, 3D visual features are decomposed into independent rigid motions, and the trajectories of all of these motions, including the egomotion of the camera, are estimated simultaneously.
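The toy sketch below illustrates the motion-segmentation idea in a simplified form: tracked 3D points are greedily decomposed into rigid-body motions using sequential RANSAC with a least-squares (Kabsch) rigid-transform fit. This is only a stand-in for the multimodel fitting and batch estimation used in MVO; all function names, thresholds, and parameters are illustrative assumptions.

```python
# Toy sketch of motion segmentation: sparse 3D features tracked between two
# frames (P in the first frame, Q in the second) are greedily decomposed into
# rigid-body motions via sequential RANSAC. This is a simplified stand-in for
# the multimodel fitting and batch estimation in MVO; names are illustrative.
import numpy as np

def fit_rigid_transform(P, Q):
    """Least-squares rigid transform (R, t) mapping points P to Q (Kabsch)."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    return R, cQ - R @ cP

def segment_motions(P, Q, thresh=0.05, min_inliers=10, iters=200,
                    rng=np.random.default_rng(0)):
    """Assign each tracked 3D point a motion label by repeatedly fitting rigid
    transforms to the points not yet explained by any previous motion."""
    labels = -np.ones(len(P), dtype=int)
    motions = []
    remaining = np.arange(len(P))
    while len(remaining) >= min_inliers:
        best_inliers = None
        for _ in range(iters):
            # Three (non-degenerate) point pairs determine a rigid transform.
            sample = rng.choice(remaining, size=3, replace=False)
            R, t = fit_rigid_transform(P[sample], Q[sample])
            residuals = np.linalg.norm((P[remaining] @ R.T + t) - Q[remaining], axis=1)
            inliers = remaining[residuals < thresh]
            if best_inliers is None or len(inliers) > len(best_inliers):
                best_inliers = inliers
        if len(best_inliers) < min_inliers:
            break
        motions.append(fit_rigid_transform(P[best_inliers], Q[best_inliers]))  # refit
        labels[best_inliers] = len(motions) - 1
        remaining = np.setdiff1d(remaining, best_inliers)
    return labels, motions
```

In this sketch each label indexes a single rigid motion (R, t) between two frames; in the full pipeline, the segmentation and the continuous trajectories of all motions are instead estimated jointly over a batch of frames.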


[Figures: motion segmentation (first image) and estimated trajectories (second image) for a scene with four independently moving blocks observed by a moving camera]


The segmentation (first image) and trajectory estimates (second image) produced by our pipeline for a scene with four independent motions observed by a dynamic camera are shown above, as well as in the video below. The camera motion (blue) is estimated from the static portions of the scene. The blocks swing and rotate independently, and their trajectories are estimated simultaneously with the camera motion.


We evaluated the system against ground-truth trajectory data collected from a Vicon motion capture system, showing that it performs comparably to similarly defined VO systems that estimate only egomotion, while also estimating the motions of the other objects in the scene. The work will be presented at the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2018). For more information, check out the manuscript or find us at IROS!