New approaches for the integration of the discrete cosine transform in neural networks for fine-grained image classification
Tan, Kelvin Sim Zhen (2025) New approaches for the integration of the discrete cosine transform in neural networks for fine-grained image classification. PhD thesis, University of Nottingham.
Abstract
A convolutional neural network (CNN) is a popular neural network architecture that excels at capturing patterns in tasks with grid-structured inputs (e.g. visual recognition). Fine-grained visual classification (FGVC) uses CNNs to categorise images with high intra-class and low inter-class variance. The 2D Discrete Cosine Transform (DCT) is a well-known transformation used in compression for its robustness and high data compaction. In compressed domain image classification, many works have focused on extracting features from the low-frequency DCT coefficients (L-DCTCs) through a fully pointwise vanilla CNN, and the abundant medium- to high-frequency DCTCs have typically been discarded. Although pointwise convolution is capable of complex transformations, its spatial context and representation are limited. Compressed domain FGVC remains a relatively inactive field. It is therefore essential to explore compressed domain FGVC under DCT conditions to investigate the relationship between fine-grained features and the full spectrum of DCTCs. More specifically, this thesis adopts and extends DCT techniques in compressed domain FGVC to address three topics: (1) the usability and inclusive learning of mid-band DCTCs; (2) the adaptive learning of DCT basis functions for composing pointwise convolutional kernels; (3) the interaction between DCT channel groups in feature representations. The first contribution introduces the ‘Skipped Medium DCT CNN’. The medium-frequency DCTCs (M-DCTCs) are processed via a skipping branch with a shallow convolutional block, alongside the L-DCTCs, which pass through the main branch of the CNN. This architecture achieved a classification error drop of up to 7% over the standard model without the skipping branch, highlighting the importance of combining higher-frequency DCTCs with lower ones for improved robustness.
The second contribution enhances the prior network by adaptively weighting the DCT basis functions to form a pointwise convolutional kernel, so that spatial details, in addition to frequency content, are considered when constructing the kernel. The adaptive weights are referred to as the ‘Adaptive DCT (Adapt-DCT)’ kernel. This network achieved a classification error drop of up to 8% on small-scale FGVC datasets and a top-5 testing accuracy of 73.93% on mini-ImageNet. The third contribution investigates the significance of DCT feature groups in compressed domain FGVC. The modified attention mechanism, which prioritises channel interactions within each DCT group, is referred to as the ‘Hybrid Modified Efficient Channel Attention’ (HyMod-ECA). It reduces the classification error by up to 3.5% over the original ECA. The optimised Adapt-DCT CNN with HyMod-ECA achieves a substantial parameter reduction of up to 73%. The results show that interactions among the DCT feature groups are a promising mechanism for easing compressed domain FGVC. To conclude, this thesis presents novel contributions in combining higher-frequency DCTCs via a DCT-oriented convolutional kernel with an attention mechanism to address compressed domain FGVC.
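For readers unfamiliar with the low, medium, and high DCTC groups mentioned in the abstract, the following minimal sketch computes an orthonormal 2D DCT-II of a single block and groups its coefficients into three frequency bands. It is an illustration only and is not taken from the thesis: the 8x8 block size, the grouping by anti-diagonal index `u + v`, the thresholds `low_max` and `mid_max`, and all function names are assumptions chosen for the example.

```python
import math

N = 8  # block size; 8x8 follows the JPEG convention and is an assumption here


def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix C, so that coeffs = C * block * C^T."""
    rows = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct2(block):
    """2D DCT-II of a square block: transform columns, then rows."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def split_bands(coeffs, low_max=2, mid_max=7):
    """Group coefficients by anti-diagonal index u + v.

    The thresholds are hypothetical; the thesis may partition the bands differently.
    """
    bands = {"low": [], "mid": [], "high": []}
    for u, row in enumerate(coeffs):
        for v, value in enumerate(row):
            if u + v <= low_max:
                bands["low"].append(value)
            elif u + v <= mid_max:
                bands["mid"].append(value)
            else:
                bands["high"].append(value)
    return bands


# Synthetic 8x8 block standing in for one image tile.
block = [[float(8 * r + c) for c in range(N)] for r in range(N)]
coeffs = dct2(block)
bands = split_bands(coeffs)
```

With these illustrative thresholds, the low band holds only a handful of coefficients while the medium and high bands hold the large majority, which is the sense in which discarding them leaves most of the spectrum unused.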