Wang, Xiaomeng
(2018)
Part-based tracking with cascaded regression of neighbours.
PhD thesis, University of Nottingham.
Abstract
Visual tracking aims to locate a possibly moving target by extracting local appearance features and matching them between consecutive images. Tracking of generic objects is one of the most active topics in computer vision. Despite the large body of work on the problem, robust visual tracking of generic objects remains challenging, because the performance of a tracking algorithm is affected by many factors, such as non-rigid object deformation, partial or full occlusion of the target, illumination variation and scale variation. In particular, many real-world objects have a complex appearance and an articulated structure. The combination of rigid motion and non-rigid deformation results in complex appearance changes, making generic object tracking a particularly hard problem.
Recently, part-based trackers have been preferred for tracking under occlusion and non-rigid deformation. Part-based models represent the target as a connected set of components, each describing a section of the object, and can therefore provide more flexible and robust appearance models. However, current part-based trackers have four main problems: 1) they rely on a response map that estimates the likelihood that any given image location represents the target (or part); 2) the spatial information they use is limited and inflexible; 3) they offer no way of jointly learning shape and appearance; 4) they require a more complex motion model, in which the motion of the parts has separate factors.
To address these four problems, this thesis proposes a novel approach to part-based tracking that replaces local matching of an appearance model with direct prediction of the displacement between local image patches and part locations. It uses cascaded regression (the Supervised Descent Method, SDM) with incremental learning on deeply learned features to track generic objects without any prior knowledge of an object's structure or appearance. It exploits the spatial constraints between individual parts, and between the parts and the object as a whole, by implicitly learning the shape and deformation parameters of the object online. A multiple temporal scale motion model is integrated to initialise the cascaded regression search close to the target and to allow it to cope with occlusions. Experimental results clearly demonstrate the value of the method's components, and a comparison with state-of-the-art techniques on the CVPR 2013 Visual Tracker Benchmark shows that the proposed tracker, TRIC-track, ranks first on the full dataset.
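The idea of predicting displacements with a cascade of regressors, rather than matching an appearance model, can be illustrated with a minimal toy sketch. It is not the TRIC-track implementation: the `features` function is a hypothetical stand-in for image features (a nonlinear function of the offset to the target), and the function names, ridge-regularised linear regressors and parameter values are all assumptions made for illustration.

```python
import numpy as np

def features(locations, target):
    """Hypothetical stand-in for image features sampled at `locations`.

    In a real tracker these would be extracted from the image; here the toy
    'image' is fully defined by the target position.
    """
    offset = locations - target
    ones = np.ones((len(offset), 1))  # bias term
    return np.hstack([np.tanh(offset), np.tanh(0.5 * offset), ones])

def train_cascade(target, n_samples=500, n_levels=4, spread=2.0, ridge=1e-3, seed=0):
    """Learn one linear regressor per cascade level from perturbed samples."""
    rng = np.random.default_rng(seed)
    estimates = target + rng.uniform(-spread, spread, size=(n_samples, 2))
    regressors = []
    for _ in range(n_levels):
        phi = features(estimates, target)
        residual = target - estimates  # displacement still needed
        # Ridge regression from features to the remaining displacement.
        gram = phi.T @ phi + ridge * np.eye(phi.shape[1])
        R = np.linalg.solve(gram, phi.T @ residual)
        regressors.append(R)
        # Move the training samples so the next level sees a tighter spread.
        estimates = estimates + phi @ R
    return regressors

def apply_cascade(regressors, start, target):
    """Refine a starting location by applying each regressor in turn."""
    estimate = np.atleast_2d(start).astype(float)
    for R in regressors:
        estimate = estimate + features(estimate, target) @ R
    return estimate[0]

if __name__ == "__main__":
    target = np.array([3.0, -1.0])
    regressors = train_cascade(target)
    start = np.array([4.5, 0.2])
    refined = apply_cascade(regressors, start, target)
    print("initial error:", np.linalg.norm(start - target))
    print("refined error:", np.linalg.norm(refined - target))
```

Each level is trained on the sample distribution left by the previous one, which is what lets later levels make progressively finer corrections. The sketch also relies on drawing many perturbed samples per level, which is the sampling cost the abstract goes on to address.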
To address the low efficiency and limited number of samples of the SDM in TRIC-track, this thesis introduces Continuous Regression to model-free visual tracking. It is found that the Taylor expansion cannot accurately approximate image features over a sample space with high variance, as encountered in visual tracking. This problem is alleviated by the Locally Continuous Regression strategy proposed in this thesis. The strategy unifies sampling-based regression with Continuous Regression efficiently by running Continuous Regression on a few sample locations spread around the target and relating those sampled locations to each other. Locally Continuous Regression is integrated into the main framework of TRIC-track and yields a sixfold improvement in computational cost over its sampling-based counterpart without sacrificing performance.
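The limitation of the Taylor expansion can be made concrete with a generic first-order formulation of Continuous Regression. The notation is assumed for illustration and is not taken from the thesis: \phi is the feature extractor, x^* the target location, \delta a perturbation with density p(\delta), J the Jacobian of the features at x^*, and R the regressor (the sign convention for the predicted displacement may differ).

```latex
% First-order Taylor approximation of the features around the target location
\phi(x^* + \delta) \approx \phi(x^*) + J\,\delta,
\qquad J = \left.\frac{\partial \phi}{\partial x}\right|_{x = x^*}

% Sampled perturbations are replaced by an integral over all perturbations
\min_{R} \int p(\delta)\,
  \bigl\| \delta - R\,\bigl(\phi(x^*) + J\,\delta\bigr) \bigr\|_2^2 \, d\delta
```

For smooth features the neglected terms grow on the order of \|\delta\|^2, so the approximation is reasonable only for small perturbations. This is consistent with applying it locally around a few sampled locations rather than over the whole high-variance sample space.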