Learning to rank salient objects using transformers and graph reasoning

Bowen, Deng (2025) Learning to rank salient objects using transformers and graph reasoning. PhD thesis, University of Nottingham.

PDF (thesis after correction) (Thesis - as examined)
Available under Licence Creative Commons Attribution.

Abstract

This thesis explores salient object detection (SOD), which aims to find the most visually important objects in a given image. Many current approaches focus on datasets in which most images contain only a single salient object located towards the center. We focus here on the more complex case of images containing multiple objects, where the relative saliency between objects must also be evaluated. A novel multiple salient object detection framework is proposed, utilizing both spatial and channel-wise non-local blocks within a convolutional network. The experiments compare the approach against 14 state-of-the-art methods on five widely used SOD benchmarks and a newly curated multi-object dataset. The proposed method exceeds all previous state-of-the-art approaches on three evaluation metrics and provides a further performance boost over competing techniques on the proposed dataset.

We then build upon this work to investigate the multiple salient object detection task in greater depth, exploring the problem of instance-level relative saliency ranking. This is an emerging field, and given the lack of appropriate datasets in this domain, we produce a large-scale instance-level relative saliency ranking dataset using real human fixations. To the best of our knowledge, this is the first and largest relative saliency ranking dataset created from real human fixations. A novel framework is then introduced that models multi-scale ranking-aware information cues in a nested-style graph, drawing features from a query-based transformer. Experimental findings demonstrate the effectiveness of the proposed method: it exceeds all previous state-of-the-art approaches by a large margin on three evaluation metrics. The model and full dataset will be released to the community.

Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: Michael, Pound; Andrew, French
Keywords: saliency, salient object detection, saliency ranking, multiple salient object detection, transformers, graph neural networks
Subjects: Q Science > QA Mathematics > QA 75 Electronic computers. Computer science
Faculties/Schools: UK Campuses > Faculty of Science > School of Computer Science
Item ID: 77925
Depositing User: Deng, Bowen
Date Deposited: 23 Jul 2025 04:40
Last Modified: 24 Jul 2025 04:30
URI: https://eprints.nottingham.ac.uk/id/eprint/77925
