The stability of a stack can be tested by considering sub-stacks sequentially, from top to bottom. For stability, the projection of the centre of mass of each sub-stack must lie within the contact surface of the object supporting it. As shown on the right, a cylindrical or spherical object offers an infinitesimally small contact surface which does not afford stability.
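The top-to-bottom test described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each object is reduced to a mass, the horizontal position of its centre of mass, and an axis-aligned rectangle for the contact surface it shares with the object above. The names `Block` and `first_unstable_support` are hypothetical and not part of the ShapeStacks code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Block:
    mass: float
    com_xy: Tuple[float, float]  # horizontal position of the block's centre of mass
    # Contact rectangle shared with the block above: (xmin, xmax, ymin, ymax).
    # A sphere or a cylinder lying on its side would get a degenerate rectangle.
    top_contact: Tuple[float, float, float, float]


def first_unstable_support(stack: List[Block]) -> Optional[int]:
    """Return the index of the first supporting block (searching top-down)
    whose contact surface does not contain the projected centre of mass of
    the sub-stack above it, or None if every sub-stack is supported.

    `stack` is ordered from bottom (index 0) to top.
    """
    total_mass = 0.0
    moment_x = 0.0
    moment_y = 0.0
    # Grow the sub-stack by one block per step, starting at the top.
    for i in range(len(stack) - 1, 0, -1):
        block = stack[i]
        total_mass += block.mass
        moment_x += block.mass * block.com_xy[0]
        moment_y += block.mass * block.com_xy[1]
        com_x = moment_x / total_mass
        com_y = moment_y / total_mass
        xmin, xmax, ymin, ymax = stack[i - 1].top_contact
        # Strict inequalities: a zero-area contact can never contain the CoM.
        if not (xmin < com_x < xmax and ymin < com_y < ymax):
            return i - 1
    return None


# Three unit cubes, each shifted by 0.4 in x relative to the one below.
# The top cube alone is supported, but the upper two together are not.
staircase = [
    Block(1.0, (0.0, 0.0), (-0.1, 0.5, -0.5, 0.5)),
    Block(1.0, (0.4, 0.0), (0.3, 0.9, -0.5, 0.5)),
    Block(1.0, (0.8, 0.0), (0.0, 0.0, 0.0, 0.0)),  # nothing rests on the top cube
]
unstable_at = first_unstable_support(staircase)
```

In the staircase example every pair of neighbouring cubes overlaps enough on its own, yet the combined centre of mass of the upper two cubes falls outside the contact surface of the bottom cube, which is why the sub-stacks have to be checked cumulatively.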
An overview of scenarios from our ShapeStacks dataset. We created about 20,000 virtual object stacks with randomised colours, textures and lighting conditions and let a neural network observe them to acquire an intuition about the stability of stacks of rigid bodies.
In follow-up experiments we investigated whether the network bases its predictions on a plausible physical intuition. We started by visualising the importance our network assigns to regions of an input image when making a prediction. We found empirically that the network looks especially for non-planar support surfaces and overhanging objects when predicting that a stack is unstable. This is in line with human intuition and roughly corresponds to the visual implications of the centre of mass (CoM) principle, although the network never computes an exact CoM.
In the heatmap on the right we visualise the image regions our network pays most attention to when predicting that a stack is unstable. We find that in about 80% of all cases, our network correctly attends to the region of the stack where the collapse starts.
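One common way to obtain such an importance map is occlusion sensitivity: mask one image patch at a time and record how much the "unstable" score drops. The sketch below illustrates that general idea only. The text above does not state which attribution method was used, so the method itself is an assumption, and `occlusion_heatmap` and the toy predictor are hypothetical stand-ins for a trained network.

```python
from typing import Callable, List

Image = List[List[float]]  # greyscale image as rows of pixel intensities


def occlusion_heatmap(
    image: Image,
    predict_unstable: Callable[[Image], float],
    patch: int = 2,
    fill: float = 0.0,
) -> Image:
    """Return a map holding, for every pixel, the drop in the 'unstable'
    score observed when the patch containing that pixel is masked out."""
    height, width = len(image), len(image[0])
    base_score = predict_unstable(image)
    heat = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            occluded = [row[:] for row in image]  # copy, then mask one patch
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            drop = base_score - predict_unstable(occluded)
            for r in rows:
                for c in cols:
                    heat[r][c] = drop
    return heat


def toy_predictor(image: Image) -> float:
    # Stand-in for a trained network: only looks at the top-right 2x2 corner.
    return sum(image[r][c] for r in (0, 1) for c in (2, 3)) / 4.0


example_image = [[1.0] * 4 for _ in range(4)]
example_heat = occlusion_heatmap(example_image, toy_predictor)
```

With the toy predictor, only the top-right patch receives a non-zero value, since masking any other patch leaves the score unchanged.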
Next, we investigated whether the network can compute a good stacking pose for an object based on the stability prediction. To this end, we constructed the following proxy task: we rotated the object under investigation underneath a larger box and predicted the stability of the observed scene. We used the predicted stability as an indicator of an object’s ‘stackability’, i.e. how much support it can provide in a certain orientation.
We visualise the support provided by the lower object in different poses. Red indicates that the current pose is unsuitable for building a stable stack; green indicates that the pose provides stable support for the object on top.
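The proxy task can be summarised as a scan over candidate poses followed by a ranking of objects by their best score. The following is a minimal sketch, assuming a predictor that maps a tilt angle to a stability score in [0, 1]. The toy predictors below are invented stand-ins for "render the object under a larger box and query the trained network", and all function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical predictor type: tilt angle in degrees -> predicted stability in [0, 1].
StabilityFn = Callable[[float], float]


def scan_poses(predict: StabilityFn, angles_deg: Sequence[float]) -> List[Tuple[float, float]]:
    """Score every candidate pose of the supporting object."""
    return [(angle, predict(angle)) for angle in angles_deg]


def best_pose(predict: StabilityFn, angles_deg: Sequence[float]) -> Tuple[float, float]:
    """Return (angle, score) of the pose with the highest predicted stability."""
    return max(scan_poses(predict, angles_deg), key=lambda pose: pose[1])


def rank_by_stackability(objects: Dict[str, StabilityFn], angles_deg: Sequence[float]) -> List[str]:
    """Order object names from most to least stackable, using each object's
    best pose score as its stackability."""
    best_scores = {name: best_pose(fn, angles_deg)[1] for name, fn in objects.items()}
    return sorted(best_scores, key=lambda name: best_scores[name], reverse=True)


# Toy predictors (assumptions for illustration, not learned models).
toy_objects: Dict[str, StabilityFn] = {
    "sphere": lambda a: 0.05,                                      # never offers a flat face
    "cylinder": lambda a: 0.8 * abs(math.cos(math.radians(a))),    # flat only when upright
    "cube": lambda a: 0.95 * abs(math.cos(math.radians(2 * a))),   # flat at 0 and 90 degrees
}
angles = list(range(0, 91, 15))
ranking = rank_by_stackability(toy_objects, angles)
cylinder_best = best_pose(toy_objects["cylinder"], angles)
```

The per-pose scores returned by `scan_poses` correspond to the red-to-green colouring in the visualisation, and the ranking is what a stacking procedure would use to decide which object to place first.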
Lastly, we gauged how useful the stability predictions are in an artificial manipulation task. We created new scenarios with randomised objects and tried to assemble them into a stable stack using only the trained stability predictor. We started by computing suitable stacking poses and ranking all objects by the stackability scores obtained in the support proxy experiment. More stackable objects should be placed first, less stackable ones last. Then we sampled positions for the next object to be stacked and evaluated the stability of the resulting tower with our prediction network.
The given objects are first oriented and ranked according to their stackability. Then they are stacked from most stackable to least stackable. Stacking positions are sampled in a simulated annealing fashion: for each sampled position, the stability of the resulting tower is predicted, and the object is repeatedly moved to the most stable position in its vicinity until the process converges to the final position.
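The position search in the caption above can be approximated by a greedy local search with a shrinking step size. This is a simplified sketch rather than the exact sampling scheme used in the experiments: the neighbourhood, the step schedule and the stopping rule are assumptions, and `toy_stability` is an invented stand-in for the trained stability predictor.

```python
import math
from typing import Callable, Tuple

Position = Tuple[float, float]


def refine_position(
    predict_stability: Callable[[Position], float],
    start: Position,
    step: float = 0.2,
    min_step: float = 0.01,
    max_iters: int = 1000,
) -> Tuple[Position, float]:
    """Move to the most stable neighbouring position until no neighbour
    improves the score, then halve the step and repeat until the step
    drops below `min_step`."""
    position = start
    score = predict_stability(position)
    for _ in range(max_iters):
        if step < min_step:
            break
        neighbours = [
            (position[0] + dx * step, position[1] + dy * step)
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)
        ]
        candidate = max(neighbours, key=predict_stability)
        candidate_score = predict_stability(candidate)
        if candidate_score > score:
            position, score = candidate, candidate_score
        else:
            step /= 2.0  # no better neighbour: search a smaller vicinity
    return position, score


def toy_stability(position: Position) -> float:
    # Stand-in predictor: stability peaks when the object sits at (0.1, -0.2).
    dx = position[0] - 0.1
    dy = position[1] + 0.2
    return math.exp(-(dx * dx + dy * dy) / 0.05)


final_position, final_score = refine_position(toy_stability, start=(0.5, 0.5))
```

In a full stacking loop this refinement would be run once per object, in the order given by the stackability ranking, with the predictor evaluated on a rendering of the tower including the newly placed object.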
In the simulated stacking experiments, our model builds towers with a median height of eight pieces. This exceeds the maximum height observed during training (the training dataset only features stacks with up to six objects) and serves as a proof-of-concept that the learned intuition about structural stability can be employed successfully in a manipulation task.