Ngo, Anh Cat Le
(2015)
Digital system for bio-inspired visual attention processing fast and efficient information theoretic modelling of saliency.
PhD thesis, University of Nottingham.
Abstract
Visual attention is a biological mechanism by which human vision systems cope with rich and fast-changing visual information in the surrounding environment. Visual saliency is a strategy that recommends attentive spots to be visited in descending order of interest or information content. This thesis aims to utilize information theory in computational saliency models, on the assumption that more attention is drawn toward more informative locations.
As visual media, i.e. images and videos, are high-dimensional data, information estimation is often computationally infeasible because of the enormous amount of computation and data samples required. This thesis proposes and analyses three practical and innovative information-based saliency models.
The first model, the entropy-based saliency method (ENT), measures salient information with a centre-surround operation using conditional entropy (ENT-CON) or Kullback-Leibler divergence (ENT-KLD). However, ENT only estimates information from local features of fixed-size windows; it does not utilize multi-scale and global information of visual media, which have been shown to be important in biological visual attention.
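The centre-surround idea behind ENT-KLD can be illustrated with a minimal sketch. It is not the implementation from the thesis: the grey-level histogram features, the window radii, the bin count and every function name below are assumptions chosen for illustration, and the surround window here simply contains the centre window.

```python
import math


def histogram(values, bins=8, lo=0.0, hi=1.0, eps=1e-6):
    """Normalised histogram with a small floor so that no bin is exactly zero."""
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / (hi - lo) * bins)
        counts[min(max(idx, 0), bins - 1)] += 1
    probs = [c / len(values) + eps for c in counts]
    norm = sum(probs)
    return [p / norm for p in probs]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def window(image, row, col, radius):
    """Pixels within `radius` of (row, col), clipped at the image borders."""
    h, w = len(image), len(image[0])
    return [image[r][c]
            for r in range(max(0, row - radius), min(h, row + radius + 1))
            for c in range(max(0, col - radius), min(w, col + radius + 1))]


def centre_surround_kld(image, centre_radius=1, surround_radius=3):
    """Saliency map: KLD between centre and surround grey-level histograms.

    `image` is a list of rows with intensities in [0, 1]. The radii are
    illustrative assumptions, not values taken from the thesis.
    """
    h, w = len(image), len(image[0])
    saliency = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            centre = histogram(window(image, r, c, centre_radius))
            surround = histogram(window(image, r, c, surround_radius))
            saliency[r][c] = kl_divergence(centre, surround)
    return saliency
```

Because both windows have a fixed size, a sketch like this only sees local structure at one scale, which is the limitation noted for ENT.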
To utilise multi-scale information, Wavelet-based Scale-Saliency (WSS), the second model, estimates information from the power distribution of data across wavelet sub-band basis descriptors at multiple dyadic scales. Though WSS benefits from local features at multiple scales, it does not integrate information about global context or the statistical characteristics of natural images.
Multiscale Discriminant Saliency (MDIS), the third model, adopts the Wavelet Hidden Markov Tree (WHMT) to unify multi-scale and global information in a comprehensive saliency method. All three models, ENT, WSS and MDIS, are evaluated and compared against well-known saliency methods such as PSS, AIM, DIS, etc. They are assessed quantitatively with standard numerical tools (Normalized Scanpath Saliency (NSS), Linear Correlation Coefficient (LCC), Area Under Curve (AUC)) on N. Bruce’s, Kootstra’s and Judd’s databases with human eye-tracking ground truth, and qualitatively by visual examination of individual cases. The performance and comprehensiveness of the three models are reflected in the numerical results of an experiment on Bruce’s database. As each later model is designed to be more comprehensive and computationally more complex than its predecessor, the three quantitative scores (LCC, NSS, AUC) generally increase, as does computational time, in that order.
Metric                 ENT       WSS      MDIS
LCC                0.02263  -0.01731   0.02382
NSS               -0.17533   0.31782   0.48019
AUC                0.78167   0.70292   0.88335
TIME (s/frame)     0.87040   1.26889   2.32734
Table 1: Quantitative results of ENT, WSS and MDIS on N. Bruce’s database
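For readers unfamiliar with the metrics in Table 1, NSS and LCC can be sketched from their commonly used definitions. This is an illustrative sketch, not the evaluation code used in the thesis: the function names, the list-of-rows map representation and the handling of constant maps are assumptions.

```python
import math


def _flatten(matrix):
    return [v for row in matrix for v in row]


def _mean_std(values):
    """Population mean and standard deviation."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)


def nss(saliency, fixations):
    """Mean z-scored saliency at fixated pixels.

    `fixations` is a list of (row, col) eye-tracking positions. A constant
    map carries no information, so it is scored 0.0 here (an assumption).
    """
    mean, std = _mean_std(_flatten(saliency))
    if std == 0:
        return 0.0
    return sum((saliency[r][c] - mean) / std for r, c in fixations) / len(fixations)


def lcc(saliency, fixation_density):
    """Pearson correlation between a saliency map and a fixation density map."""
    a, b = _flatten(saliency), _flatten(fixation_density)
    mean_a, std_a = _mean_std(a)
    mean_b, std_b = _mean_std(b)
    if std_a == 0 or std_b == 0:
        return 0.0
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)
    return cov / (std_a * std_b)
```

Under these definitions, an NSS above zero means fixated pixels are more salient than the map average, and an LCC near zero means the saliency map and the fixation density are almost uncorrelated.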