On Machine Learning and Prior Structure for Mobile Robots – The Summary

Similar to the long-running discussion on how much innate structure is required for artificial general intelligence [1], an important short-term challenge lies in combining traditional programming and machine learning for narrower applications, e.g. towards more efficient, robust, and safe robots. The question about the limitations or benefits of increased prior structure arguably becomes easier to answer in this context of near-term planning, if we acknowledge our limited capability for long-term prediction and focus on present benchmarks and current developments.

Autonomous systems are generally modularised for the same reasons as any large software system: reusability, ease of testing, separation of responsibilities, interpretability, etc. Existing solution modules for many tasks in mobile robotics, such as localisation, mapping, or planning, build on knowledge about the structure of tasks and environments. This knowledge may include geometry or kinematic and dynamic models, which have therefore been built into the routines of traditional programs. However, recent successes and the flexibility of fairly unconstrained, learned models are shifting the focus of new academic and industrial projects. Successes in image recognition (ImageNet [2]) as well as triumphs in reinforcement learning (Atari [3], Chess, Go [4]) inspire like-minded research.

Machine learning, in particular of the deep kind, has made its mark on the perception pipeline of autonomous systems, including pedestrian / car / cyclist / traffic sign detection, semantic segmentation, and other tasks [5]. While perception systems rely heavily on learning, the localisation, mapping, reasoning, and planning modules often continue to be the domain of carefully crafted rules and programs exploiting geometric priors and intuitions. Designing these modules requires expert knowledge and repeated iteration between testing (in simulation as well as on the real platform) and refinement of hundreds if not thousands of heuristics.

Machine learning can automate parts of this process by extracting rules from massive amounts of data, with the additional benefit of high flexibility: arbitrary objectives can be merged, and models can be reused and combined in a Lego-like fashion. On the other hand, we have accurate mathematical formulations to solve specific sub-problems, e.g. for localisation and planning. Combinations of both approaches can provide significant advantages via redundancy and often complementary properties.

Successful directions aim at optimising the input data or correcting the output of traditional programs. Examples of input improvement include learning image enhancement networks for traditional visual odometry methods [6]; output refinement includes pose correction updates for visual localisation [7], refinement of dense reconstructions [8], and refinement of hand-crafted cost maps for motion planning [9]. Similarly, purely learning-based approaches benefit from incorporating prior knowledge and assumptions about the underlying structure: implicit and explicit translation invariance [10, 11], objectness [12], temporal structure [13], planning procedures [14], and inductive biases for SLAM-like computations [15]. Notably, the incorporation of structure can, under specific circumstances, even help when the incorporated models are inaccurate [16].
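As a rough illustration of output refinement, the sketch below keeps a hand-written kinematic prior and learns only a small correction on top of it. It is a toy example under stated assumptions, not the method of any of the cited works: the 1D constant-velocity model, the 10% wheel-slip factor, the noise level, and all function names (`prior_model`, `fit_residual_gain`, `corrected_model`, `simulate_data`) are hypothetical and chosen purely for illustration.

```python
import random


def prior_model(x, v, dt):
    """Hand-written kinematic prior: constant-velocity motion in 1D."""
    return x + v * dt


def fit_residual_gain(samples, dt):
    """Least-squares fit of a scalar gain k such that k * (v * dt)
    approximates the error left over by the prior model."""
    num = 0.0
    den = 0.0
    for x, v, x_next in samples:
        feature = v * dt
        residual = x_next - prior_model(x, v, dt)
        num += feature * residual
        den += feature * feature
    return num / den if den > 0.0 else 0.0


def corrected_model(x, v, dt, gain):
    """Prior prediction plus the learned residual correction."""
    return prior_model(x, v, dt) + gain * v * dt


def simulate_data(n, dt, slip=0.1, noise=0.01, seed=0):
    """Synthetic 'real' robot (assumption): wheels slip, so only a
    fraction (1 - slip) of the commanded displacement is achieved."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = rng.uniform(-5.0, 5.0)
        v = rng.uniform(-1.0, 1.0)
        x_next = x + (1.0 - slip) * v * dt + rng.gauss(0.0, noise)
        samples.append((x, v, x_next))
    return samples


def mean_abs_error(model, samples):
    """Average absolute one-step prediction error of model(x, v)."""
    return sum(abs(model(x, v) - x_next) for x, v, x_next in samples) / len(samples)


DT = 0.5
train = simulate_data(200, DT, seed=0)
test = simulate_data(200, DT, seed=1)

gain = fit_residual_gain(train, DT)
prior_err = mean_abs_error(lambda x, v: prior_model(x, v, DT), test)
corrected_err = mean_abs_error(lambda x, v: corrected_model(x, v, DT, gain), test)

print(f"learned gain: {gain:.3f}")
print(f"prior error: {prior_err:.4f}, corrected error: {corrected_err:.4f}")
```

The learned part only has to explain what the prior gets wrong, so a single parameter suffices here. In a real system the residual model would typically be a richer function approximator, but the division of labour stays the same.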

When moving from manually defining features or computations to designing the most efficient structure to learn into, one question arises naturally: why not learn everything (e.g. the architecture [16, 17] or the optimisation process [18, 19])? However, the investments in data hygiene and annotation required for many applications with potential for real-world impact often render it more efficient, in terms of human effort, to port our prior knowledge into algorithmic structure.

‘What structure can we build in that does not obstruct learning?’ was a well-made point by Leslie Kaelbling during a panel session at CoRL 2017. Whether structure is a necessary good or a necessary evil may be up for discussion [1, 20, 21, 22], but for now, practically, it is necessary for accuracy and robustness, as are learning components.


For the complete post, including more detailed references and entertaining videos, please visit my personal site.



[1] “Debate: ‘Does AI Need More Innate Machinery?’ (Yann LeCun, Gary Marcus).” https://www.youtube.com/watch?v=vdWPQ6iAkT4 .
[2] O. Russakovsky et al., “ImageNet Large Scale Visual Recognition Challenge,” International Journal of Computer Vision (IJCV), vol. 115, no. 3, pp. 211–252, 2015.
[3] V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, p. 529, 2015.
[4] D. Silver et al., “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm,” arXiv preprint arXiv:1712.01815, 2017.
[5] “KITTI benchmark.” http://www.cvlibs.net/datasets/kitti/ .
[6] R. Gomez-Ojeda, Z. Zhang, J. Gonzalez-Jimenez, and D. Scaramuzza, “Learning-based Image Enhancement for Visual Odometry in Challenging HDR Environments,” ArXiv e-prints, 2017.
[7] V. Peretroukhin and J. Kelly, “DPC-Net: Deep Pose Correction for Visual Localization,” ArXiv e-prints, 2017.
[8] M. Tanner, S. Saftescu, A. Bewley, and P. Newman, “Meshed Up: Learnt Error Correction in 3D Reconstructions,” ArXiv e-prints, 2018.
[9] M. Wulfmeier, D. Rao, D. Z. Wang, P. Ondruska, and I. Posner, “Large-scale cost function learning for path planning using deep inverse reinforcement learning,” The International Journal of Robotics Research, vol. 36, no. 10, pp. 1073–1087, 2017.
[10] Y. LeCun and others, “Generalization and network design strategies,” Connectionism in perspective, pp. 143–155, 1989.
[11] I. H. Witten, E. Frank, M. A. Hall, and C. J. Pal, Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann, 2016.
[12] A. Byravan and D. Fox, “SE3-Nets: Learning Rigid Body Motion using Deep Neural Networks,” ArXiv e-prints, 2016.
[13] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[14] A. Tamar, Y. Wu, G. Thomas, S. Levine, and P. Abbeel, “Value Iteration Networks,” ArXiv e-prints, 2016.
[15] J. Zhang, L. Tai, J. Boedecker, W. Burgard, and M. Liu, “Neural SLAM,” ArXiv e-prints, 2017.
[16] T. Weber et al., “Imagination-Augmented Agents for Deep Reinforcement Learning,” ArXiv e-prints, 2017.
[16] E. Real, A. Aggarwal, Y. Huang, and Q. V. Le, “Regularized Evolution for Image Classifier Architecture Search,” ArXiv e-prints, 2018.
[17] A. Brock, T. Lim, J. M. Ritchie, and N. Weston, “SMASH: One-Shot Model Architecture Search through HyperNetworks,” ArXiv e-prints, 2017.
[18] J. X. Wang et al., “Learning to reinforcement learn,” ArXiv e-prints, 2016.
[19] M. Andrychowicz et al., “Learning to learn by gradient descent by gradient descent,” ArXiv e-prints, 2016.
[20] D. George et al., “A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs,” Science, 2017.
[21] S. Sabour, N. Frosst, and G. E Hinton, “Dynamic Routing Between Capsules,” ArXiv e-prints, 2017.
[22] “Deep Learning, Structure and Innate Priors – A Discussion between Yann LeCun and Christopher Manning.” http://www.abigailsee.com/2018/02/21/deep-learning-structure-and-innate-priors.html .