How do we solve it?
There are three prominent approaches to solving the localization problem.
The first approach relies on the Iterative Closest Point (ICP) algorithm. If you are unsure what this is, here is an example:
In the above image, think of the black collection of points (a point cloud) as the original, static object. In our case, this black point cloud is a portion of the map. The red point cloud is the currently observed cloud, which is transformed until it precisely overlaps with that portion of the map.
The name of the method describes what it does. For each point in the red point cloud, it looks for the closest point in the black cloud and treats that pair as a correspondence. From these correspondences it computes the rotation and translation that best overlay the red cloud on the black one, applies them, and repeats until the alignment stops improving. The resulting offset between the red point cloud (current observation) and the black point cloud (map), expressed as a rotation and a translation (distance), tells us where we are in the map. In practice, however, the environment often looks quite different now from when the map was captured, so it is not always possible to associate matching objects.
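The loop described above (pair each point with its closest map point, solve for the best rigid motion, repeat) can be sketched in a few lines. This is a minimal 2D point-to-point ICP under simplifying assumptions: brute-force nearest-neighbour search instead of a k-d tree, no outlier rejection, and an initial guess close enough to the true pose. The function names (`nearest`, `best_rigid_transform`, `transform`, `icp`) are hypothetical and not taken from any particular library.

```python
import math


def nearest(p, cloud):
    """Closest point to p in cloud (brute force; real systems use a k-d tree)."""
    return min(cloud, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def best_rigid_transform(src, dst):
    """Least-squares rotation angle and translation mapping paired src -> dst (2D)."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay, bx, by = px - sx, py - sy, qx - dx, qy - dy
        dot += ax * bx + ay * by    # cosine part of the alignment score
        cross += ax * by - ay * bx  # sine part of the alignment score
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # The translation moves the rotated source centroid onto the target centroid.
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def transform(points, theta, tx, ty):
    """Rotate points by theta about the origin, then shift by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def icp(source, target, max_iterations=50, tol=1e-12):
    """Align source (current scan) to target (map portion).

    Returns ((theta, tx, ty), aligned_points); the pose maps the original
    source points onto the target.
    """
    theta_t, tx_t, ty_t = 0.0, 0.0, 0.0
    current = list(source)
    prev_err = math.inf
    for _ in range(max_iterations):
        # 1. Closest point: pair every scan point with its nearest map point.
        matches = [nearest(p, target) for p in current]
        # 2. Solve for the rigid motion that best overlays those pairs.
        theta, tx, ty = best_rigid_transform(current, matches)
        current = transform(current, theta, tx, ty)
        # 3. Compose this step with the pose accumulated so far.
        c, s = math.cos(theta), math.sin(theta)
        tx_t, ty_t = c * tx_t - s * ty_t + tx, s * tx_t + c * ty_t + ty
        theta_t += theta
        # 4. Iterate until the mean squared residual stops improving.
        err = sum((x - qx) ** 2 + (y - qy) ** 2
                  for (x, y), (qx, qy) in zip(current, matches)) / len(current)
        if prev_err - err < tol:
            break
        prev_err = err
    return (theta_t, tx_t, ty_t), current
```

For example, if the scan is the map portion rotated by 0.05 rad and shifted by (0.1, -0.05), `icp(scan, map_portion)` should recover a rotation of about -0.05 rad. With a poor initial guess, plain ICP can converge to a wrong alignment, which is one reason the approaches below exist.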
Instead of finding a correspondence for every point, we can be smarter and only look for correspondences between points that are “interesting”. These are called points of interest, or keypoints. For this to work, we first need to define what counts as interesting. Keypoints need to satisfy two properties: distinctiveness and repeatability. This means that 1. they stand out from their surroundings, so they are easy to find in our data, and 2. we would still find them if we shifted our point of view. In an image, these would be the corners of windows, doors, etc. An example is shown below.
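To make “distinctive” concrete, here is a minimal 2D sketch of one common keypoint criterion, the local surface variation: the ratio of the smallest eigenvalue of a neighbourhood's covariance matrix to the sum of its eigenvalues. It is close to 0 along a straight wall and larger near a corner. This criterion, the radius, and the threshold are illustrative assumptions, not necessarily what any particular localization system uses; the function names are hypothetical.

```python
import math


def surface_variation(points, index, radius):
    """Smallest-eigenvalue ratio of the covariance of the neighbours of
    points[index]: about 0 along a straight edge, larger near a corner."""
    px, py = points[index]
    nbrs = [(x, y) for x, y in points if (x - px) ** 2 + (y - py) ** 2 <= radius ** 2]
    n = len(nbrs)
    if n < 3:
        return 0.0  # too few neighbours to judge the local shape
    mx = sum(x for x, _ in nbrs) / n
    my = sum(y for _, y in nbrs) / n
    sxx = sum((x - mx) ** 2 for x, _ in nbrs) / n
    syy = sum((y - my) ** 2 for _, y in nbrs) / n
    sxy = sum((x - mx) * (y - my) for x, y in nbrs) / n
    total = sxx + syy  # sum of the two eigenvalues (trace of the covariance)
    if total == 0:
        return 0.0
    # Smaller eigenvalue of the symmetric 2x2 covariance matrix.
    lam_min = total / 2 - math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return lam_min / total


def detect_keypoints(points, radius, threshold):
    """Indices of points whose neighbourhood is distinctive enough."""
    return [i for i in range(len(points))
            if surface_variation(points, i, radius) > threshold]
```

On an L-shaped set of wall points, only the corner point passes a threshold of 0.1; points along either straight wall score near 0. Because the measure depends only on the shape of the neighbourhood, the same corner is found again from a different viewpoint, which is the repeatability property.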
Once we have these points of interest, we can compare them. To do so, we “describe” the area around each keypoint, which simply means that we store mathematical properties of that area, for instance the angles between the surfaces of three adjacent walls that form a corner. These are called local descriptors: they condense the region around a keypoint into a list of numbers (a vector) whose entries capture different properties (features).
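As a hedged illustration of a local descriptor, the sketch below summarizes a keypoint's surroundings as a normalized histogram of distances to its neighbours. This particular descriptor is a hypothetical, deliberately simple choice (real systems use richer ones based on surface normals and angles); it is shown because distances do not change when the sensor moves, so the vector stays the same under rotation and translation.

```python
import math


def describe(points, keypoint, radius, bins=4):
    """Hypothetical local descriptor: normalized histogram of the distances
    from the keypoint to its neighbours within `radius`.

    Distances are unchanged by rotation and translation, so the same
    region seen from another pose yields (nearly) the same vector."""
    hist = [0] * bins
    for p in points:
        d = math.dist(p, keypoint)
        if 0 < d <= radius:  # skip the keypoint itself and far-away points
            hist[min(int(d / radius * bins), bins - 1)] += 1
    total = sum(hist)
    return [h / total if total else 0.0 for h in hist]
```

Describing the same keypoint in a rotated and translated copy of the cloud returns the same vector, which is exactly what makes descriptors comparable across observations.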
We can then match keypoints by comparing their feature vectors: two keypoints are considered a match when the difference between their vectors is small.
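The matching rule above can be sketched as a nearest-neighbour search in descriptor space with a distance cutoff. This is a minimal version under the assumption that Euclidean distance is an adequate measure of “difference”; `match_keypoints` and `max_distance` are hypothetical names, and practical systems often add further checks to reject ambiguous matches.

```python
import math


def match_keypoints(desc_a, desc_b, max_distance):
    """Pair each descriptor in desc_a with its nearest descriptor in desc_b
    (Euclidean distance) and keep the pair only if they are close enough."""
    matches = []
    if not desc_b:
        return matches
    for i, da in enumerate(desc_a):
        j, d = min(((k, math.dist(da, db)) for k, db in enumerate(desc_b)),
                   key=lambda pair: pair[1])
        if d <= max_distance:
            matches.append((i, j))
    return matches
```

For instance, `match_keypoints([[0, 1, 0], [1, 0, 0], [0.5, 0.5, 0]], [[1, 0.05, 0], [0, 0.95, 0.05], [0, 0, 1]], 0.2)` pairs the first two descriptors and rejects the third, whose closest candidate is still too far away.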
Another option is to describe the point cloud as a whole, using a global descriptor. This means that we extract mathematical properties of the entire cloud, such as its volume or the average height of all its points, and compare these across different point clouds to find the one that matches.
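Using the two example properties from the text, a global descriptor and the lookup against stored places might look like the sketch below. It assumes the volume is approximated by the axis-aligned bounding box and that the stored places are a simple dictionary from a place name to a precomputed descriptor; both are illustrative assumptions, and the names are hypothetical.

```python
import math


def global_descriptor(cloud):
    """Hypothetical global descriptor of (x, y, z) points:
    (axis-aligned bounding-box volume, mean height)."""
    xs, ys, zs = zip(*cloud)
    volume = (max(xs) - min(xs)) * (max(ys) - min(ys)) * (max(zs) - min(zs))
    return (volume, sum(zs) / len(zs))


def localize(scan, database):
    """Return the name of the stored place whose descriptor is closest to the scan's."""
    d = global_descriptor(scan)
    return min(database, key=lambda name: math.dist(d, database[name]))
```

Because the two features have very different scales, a practical version would normalize them before comparing. Note also that a global descriptor only says which stored cloud the scan resembles, not the exact pose within it.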
The third approach combines local and global descriptors. First, both the map and the currently observed cloud are segmented into objects, which play the role of keypoints (objects of interest, in this case). Second, each of these objects is “described” to capture its properties. The properties of the map objects, together with their positions in the map, are stored in a database. The objects in the currently observed scene (the source cloud) are then matched against those of the previously built map (the target map) stored in the database. An illustration of this approach is available in the following video:
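The segment-based pipeline described above can be sketched end to end: segment with simple Euclidean clustering, describe each segment, store map segments with their positions, then match scan segments against that database. To stay short, this sketch makes strong assumptions that a real system would not: a hypothetical two-number segment descriptor (point count and vertical extent), brute-force searches, and no rotation between scan and map, so the pose reduces to the average offset between matched segment centroids. All function names are hypothetical.

```python
import math


def euclidean_clusters(points, max_gap):
    """Split a cloud into segments: points joined by a chain of hops
    shorter than max_gap end up in the same segment."""
    unvisited = set(range(len(points)))
    segments = []
    while unvisited:
        stack = [unvisited.pop()]
        members = list(stack)
        while stack:
            i = stack.pop()
            near = [j for j in unvisited if math.dist(points[i], points[j]) < max_gap]
            for j in near:
                unvisited.remove(j)
                stack.append(j)
                members.append(j)
        segments.append([points[k] for k in members])
    return segments


def describe_segment(segment):
    """Hypothetical segment descriptor: point count and vertical extent."""
    zs = [p[2] for p in segment]
    return (len(segment), max(zs) - min(zs))


def centroid(segment):
    """Mean (x, y, z) position of a segment."""
    return tuple(sum(p[k] for p in segment) / len(segment) for k in range(3))


def build_database(map_cloud, max_gap):
    """Store each map segment's descriptor together with its position in the map."""
    return [(describe_segment(s), centroid(s))
            for s in euclidean_clusters(map_cloud, max_gap)]


def localize_by_segments(scan, database, max_gap, max_desc_dist):
    """Match scan segments to map segments by descriptor and average the
    centroid offsets. ASSUMES no rotation between scan and map."""
    if not database:
        return None
    offsets = []
    for seg in euclidean_clusters(scan, max_gap):
        d = describe_segment(seg)
        desc, map_centroid = min(database, key=lambda entry: math.dist(d, entry[0]))
        if math.dist(d, desc) <= max_desc_dist:
            offsets.append([m - s for m, s in zip(map_centroid, centroid(seg))])
    if not offsets:
        return None
    return tuple(sum(o[k] for o in offsets) / len(offsets) for k in range(3))
```

With a map containing a pole and a small slab, and a scan of the same objects shifted by (-2, -1, 0), `localize_by_segments` returns an offset of about (2, 1, 0). A full implementation would estimate rotation as well and verify that the matched segments are geometrically consistent with each other.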