Abstract
We propose a novel gaze-control model for detecting objects in images. The model, named act-detect, uses information from local image samples to shift its gaze towards object locations. The model makes two main contributions. First, its setup makes it computationally highly efficient compared with existing window-sliding methods for object detection, while retaining acceptable detection performance. act-detect is evaluated on a face-detection task using a publicly available image set. In terms of detection performance, act-detect slightly outperforms the window-sliding methods that have been applied to the face-detection task. In terms of computational efficiency, act-detect clearly outperforms the window-sliding methods: it requires on the order of hundreds fewer samples for detection. Second, the model makes more extensive use of local samples than previous models: instead of using them merely to verify object presence at the gaze location, it uses them to determine the direction and distance to the object of interest. The simultaneous adaptation of the model’s visual features and its gaze-control strategy leads to the discovery of features and strategies that exploit the local context of objects. For example, the model uses the spatial relations between the bodies of the persons in the images and their faces. The resulting gaze control is a temporal process in which the object’s context is exploited at different scales and at different image locations relative to the object.