Hierarchical Attentive Recurrent Tracking

Abstract – Class-agnostic object tracking is particularly difficult in cluttered environments, as target-specific discriminative models cannot be learned a priori. Inspired by how the human visual cortex employs spatial attention and separate “where” and “what” processing pathways to actively suppress irrelevant visual features, this work develops a hierarchical attentive recurrent model for single object tracking in videos. The first layer of attention discards the majority of the background by selecting a region containing the object of interest, while the subsequent layers tune in on visual features particular to the tracked object. This framework is fully differentiable and can be trained in a purely data-driven fashion by gradient methods. To improve training convergence, we augment the loss function with terms for a number of auxiliary tasks relevant for tracking. Evaluation of the proposed model is performed on two datasets of increasing difficulty: pedestrian tracking on the KTH activity recognition dataset and the KITTI object tracking dataset.
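The two attention stages and the augmented loss described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`spatial_attention`, `appearance_attention`, `total_loss`), the box format, the mask and the loss weights are all hypothetical. A hard crop is used only for brevity; the abstract states that the framework is fully differentiable, so the actual model would need a differentiable attention mechanism in its place.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]  # toy single-channel image as nested lists


def spatial_attention(frame: Image, box: Tuple[int, int, int, int]) -> Image:
    """First stage ("where"): keep only the region box = (top, left, height, width).

    Assumption: a hard crop stands in for the paper's attention mechanism.
    """
    top, left, height, width = box
    return [row[left:left + width] for row in frame[top:top + height]]


def appearance_attention(glimpse: Image, mask: Image) -> Image:
    """Second stage ("what"): weight each glimpse value by a mask in [0, 1].

    Assumption: the mask is given here; in a learned model it would be predicted.
    """
    return [[g * m for g, m in zip(g_row, m_row)]
            for g_row, m_row in zip(glimpse, mask)]


def total_loss(tracking_loss: float,
               aux_losses: Sequence[float],
               aux_weights: Sequence[float]) -> float:
    """Main tracking loss plus weighted auxiliary terms (weights are hypothetical)."""
    return tracking_loss + sum(w * l for w, l in zip(aux_weights, aux_losses))


# Toy 4x4 frame whose pixel value encodes its position.
frame = [[float(4 * r + c) for c in range(4)] for r in range(4)]

# Stage 1 discards most of the background by selecting a 2x2 region.
glimpse = spatial_attention(frame, (1, 1, 2, 2))

# Stage 2 suppresses the features judged irrelevant inside that region.
mask = [[1.0, 0.0], [0.0, 1.0]]
attended = appearance_attention(glimpse, mask)

loss = total_loss(1.0, aux_losses=[0.5, 2.0], aux_weights=[0.1, 0.2])
```

Splitting the computation this way mirrors the "where"/"what" separation in the abstract: the first function decides which region to look at, and the second decides which features within that region to keep.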



  • A. R. Kosiorek, A. Bewley, and I. Posner, “Hierarchical Attentive Recurrent Tracking,” in Neural Information Processing Systems, 2017.
    author = {Kosiorek, Adam R and Bewley, Alex and Posner, Ingmar},
    title = {Hierarchical Attentive Recurrent Tracking},
    booktitle = {Neural Information Processing Systems},
    year = {2017},
    month = {December},
    pdf = {http://www.robots.ox.ac.uk/~mobile/Papers/2017NIPS_AdamKosiorek.pdf},
    url = {http://www.robots.ox.ac.uk/~mobile/Papers/2017NIPS_AdamKosiorek.pdf},