Visual Attention

A brief description of visual attention, the saliency map hypothesis, and a classification of existing visual attention models can be found here.

For a corresponding scientific treatment, see the review article by Borji et al.

A dynamic model of visual attention was developed and proposed to the community in 2012. Since then, several proposals have followed the same idea of computing visual saliency maps dynamically over time.

A Time-Dependent Saliency Model Combining Center and Depth Biases for 2D and 3D Viewing Conditions

Abstract
The role of the binocular disparity in the deployment of visual attention is examined in this paper. To address this point, we compared eye tracking data recorded while observers viewed natural images in 2D and 3D conditions. The influence of disparity on saliency, center and depth biases is first studied. Results show that visual exploration is affected by the introduction of the binocular disparity. In particular, participants tend to look first at closer areas in 3D condition and then direct their gaze to more widespread locations. Beside this behavioral analysis, we assess the extent to which state-of-the-art models of bottom-up visual attention predict where observers looked at in both viewing conditions. To improve their ability to predict salient regions, low-level features as well as higher-level foreground/background cues are examined. Results indicate that, consecutively to initial centering response, the foreground feature plays an active role in the early but also middle instants of attention deployments. Importantly, this influence is more pronounced in stereoscopic conditions. It supports the notion of a quasi-instantaneous bottom-up saliency modulated by higher figure/ground processing. Beyond depth information itself, the foreground cue might constitute an early process of “selection for action”. Finally, we propose a time-dependent computational model to predict saliency on still pictures. The proposed approach combines low-level visual features, center and depth biases. Its performance outperforms state-of-the-art models of bottom-up attention.
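
The idea of a time-dependent combination of low-level saliency, center bias and depth bias can be illustrated with a minimal Python sketch. This is not the model from the paper: the weighted linear combination, the Gaussian form of the center bias, the convention that larger depth values mean closer regions, and all names (`center_bias`, `normalize`, `time_dependent_saliency`) and weight values are assumptions made for illustration only.

```python
import numpy as np

def center_bias(shape, sigma_frac=0.25):
    """Isotropic Gaussian centered on the image (assumed form of the center bias)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    sigma = sigma_frac * min(h, w)
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

def normalize(m):
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    m = np.asarray(m, dtype=float)
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)

def time_dependent_saliency(low_level, depth, weights_per_step):
    """Return one saliency map per time step.

    low_level        : 2D array, output of any bottom-up saliency model
    depth            : 2D array, larger values = closer to the observer (assumption)
    weights_per_step : sequence of (w_low, w_center, w_depth), one triple per step
    """
    maps = [
        normalize(low_level),
        normalize(center_bias(np.shape(low_level))),
        normalize(depth),
    ]
    out = []
    for w in weights_per_step:
        w = np.asarray(w, dtype=float)
        w = w / w.sum()  # weights sum to 1 so each output stays in [0, 1]
        out.append(sum(wi * mi for wi, mi in zip(w, maps)))
    return np.stack(out)

# Illustrative weights only: center dominates early, then low-level features take over.
rng = np.random.default_rng(0)
low = rng.random((33, 49))
dep = np.tile(np.linspace(0.0, 1.0, 49), (33, 1))
sal = time_dependent_saliency(low, dep, [(0.1, 0.8, 0.1), (0.6, 0.2, 0.2)])
```

In practice the per-step weights would be estimated from eye tracking data rather than set by hand; the values above only mimic the qualitative pattern described in the abstract, with an initial centering response followed by a growing contribution from other cues.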