Multi-source data fusion for land use classification using deep learning

Cao, Rui (2021) Multi-source data fusion for land use classification using deep learning. PhD thesis, University of Nottingham.



Land use classification is the process of characterising land by the purpose for which it is used. It matters because timely and accurate land use information is essential for land monitoring and management. Traditional classification relies heavily on labour-intensive land surveys, which are time-consuming and expensive. The development of remote sensing (RS) technologies has greatly advanced the automation of land monitoring, since RS images can efficiently capture the physical attributes of the earth's surface. However, recognising land use from RS images alone is difficult, especially in high-density cities with complex and mixed land-use patterns. There are three major challenges. Firstly, it is hard to extract representative features from RS images that are closely related to land use semantics and invariant to complex landscapes. Secondly, RS images capture physical information from a nadir view and thus miss many useful ground-level details. Thirdly, RS images cannot directly capture information about human activities on the land, which is critical for determining land use. To address these issues, this thesis leverages deep learning techniques and exploits recently emerged geospatial big data to complement RS images for land use classification. In summary, the thesis makes the following contributions:

Firstly, a triplet deep metric learning network has been developed to enhance remotely sensed land-use scene retrieval and classification. Trained with a triplet loss, the network embeds RS images into a semantic space in which images from the same class lie near each other and images from different classes lie far apart. The resulting features are discriminative and can be exploited for content-based RS image retrieval. Feature reduction methods have been investigated to reduce the redundancy of the learned semantic features. Triplet selection strategies have also been examined to ensure effective and efficient training of the network. Furthermore, the extracted features have been leveraged for image scene classification. Comprehensive experiments have been conducted on popular land-use scene benchmarks, and the results demonstrated the effectiveness of the proposed methods.
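
The triplet objective described above can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the function name `triplet_loss`, the squared Euclidean distance, and the margin of 0.2 are assumptions chosen for illustration only.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean hinge triplet loss over a batch of embedding rows.

    The margin value and the squared Euclidean distance are illustrative
    assumptions, not settings taken from the thesis.
    """
    # Squared distance from each anchor to its positive and its negative
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    # Penalise triplets whose negative is not at least `margin` farther away
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

# Toy 2-D embeddings: one well-separated triplet and the same triplet reversed
anchor = np.array([[0.0, 0.0]])
positive = np.array([[0.1, 0.0]])
negative = np.array([[1.0, 0.0]])
loss_easy = triplet_loss(anchor, positive, negative)
loss_hard = triplet_loss(anchor, negative, positive)
```

The well-separated triplet incurs no loss, while the reversed one is penalised, which is the behaviour that pulls same-class images together and pushes different-class images apart during training.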

Secondly, a spatial-aware Siamese-like network has been proposed to learn distinguishable embeddings for ground-to-aerial geolocalisation, which can serve as a potential solution for geotagging ground-taken images that lack geotags. A spatial transformer layer was used to address the large variation in viewpoint. A loss that combines the triplet loss with a simple and effective location identity loss has also been designed to train the proposed network and further enhance geolocalisation performance. Extensive experiments were conducted on a publicly available dataset of cross-view image pairs, and the results demonstrated the effectiveness of the method. The proposed geolocalisation method can be used to infer the geolocation of ground-taken images, which makes it possible to fuse them with RS images for land use classification.

Thirdly, a deep learning-based approach has been developed to integrate RS images with geotagged ground-taken images for urban land use classification. Specifically, a deep neural network was used to extract semantic features from sparsely distributed street view images. These features were interpolated in the spatial domain to match the aerial images, and the two were then fused through a deep fusion neural network for pixel-level land use classification. The proposed methods were tested on a large public dataset, and the results showed that street view images contain useful information about land use and that fusing them with aerial images can improve classification accuracy. Moreover, experimental studies have been presented to show that street view images add more value when the resolution of the aerial images is lower, and case studies have been presented to illustrate how street view images provide useful auxiliary information to aerial images and boost performance.
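
The spatial interpolation step can be sketched as follows. The abstract does not say which interpolation scheme was used, so inverse-distance weighting is an assumption made here for illustration, and `idw_interpolate`, `power`, and `eps` are hypothetical names.

```python
import numpy as np

def idw_interpolate(points, feats, height, width, power=2.0, eps=1e-8):
    """Spread sparse point features onto a (height, width) grid.

    Inverse-distance weighting is an assumed choice; the abstract only states
    that features are interpolated in the spatial domain.
    points: (N, 2) row/col positions of the street view images
    feats:  (N, C) feature vector extracted from each image
    """
    rows, cols = np.mgrid[0:height, 0:width]
    grid = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    # Distance from every grid cell to every sample point: (H*W, N)
    dist = np.linalg.norm(grid[:, None, :] - points[None, :, :], axis=2)
    weights = 1.0 / (dist ** power + eps)
    weights /= weights.sum(axis=1, keepdims=True)
    # Weighted average of the point features at each grid cell
    dense = weights @ feats
    return dense.reshape(height, width, feats.shape[1])

# Two street view samples at opposite corners of a 4 x 4 aerial tile
points = np.array([[0.0, 0.0], [3.0, 3.0]])
feats = np.array([[1.0, 0.0], [0.0, 1.0]])
dense = idw_interpolate(points, feats, 4, 4)
```

The resulting dense feature map has the same spatial size as the aerial tile, so it could be stacked with aerial image features along the channel axis before being passed to a fusion network.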

Finally, a deep learning-based method has been proposed to fuse remote and social sensing data for urban land use classification. Two neural network-based methods have been developed to automatically extract discriminative time-dependent social sensing signature features, which were fused with remote sensing image features extracted via a residual neural network. In addition, a deep learning-based strategy has been developed to address the data asynchrony problem by enforcing cross-modal feature consistency and cross-modal triplet constraints while the model is trained end to end. Extensive experiments have been conducted on publicly available datasets to demonstrate the effectiveness of the proposed methods. The results showed that physically sensed RS images and socially sensed activity signatures can complement each other to enhance the accuracy of urban land use classification.
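
One plausible reading of the two cross-modal constraints is sketched below. The abstract does not give their exact form, so both definitions, the hardest-pair mining, the margin of 0.5, and the names `consistency_loss` and `cross_modal_triplet_loss` are assumptions for illustration.

```python
import numpy as np

def consistency_loss(rs_emb, social_emb):
    """Mean squared distance between paired RS and social embeddings (assumed form)."""
    return float(np.mean(np.sum((rs_emb - social_emb) ** 2, axis=1)))

def cross_modal_triplet_loss(rs_emb, social_emb, labels, margin=0.5):
    """Triplet loss with RS anchors and social-sensing positives/negatives.

    Hardest-pair mining and the margin value are illustrative assumptions.
    """
    # Squared distance between every RS embedding and every social embedding
    dist = np.sum((rs_emb[:, None, :] - social_emb[None, :, :]) ** 2, axis=2)
    same = labels[:, None] == labels[None, :]
    losses = []
    for i in range(len(labels)):
        pos, neg = dist[i][same[i]], dist[i][~same[i]]
        if len(neg) == 0:
            continue  # no other class in the batch to contrast against
        # Farthest same-class sample versus nearest different-class sample
        losses.append(max(pos.max() - neg.min() + margin, 0.0))
    return float(np.mean(losses)) if losses else 0.0

labels = np.array([0, 1])
rs_emb = np.array([[0.0, 0.0], [1.0, 1.0]])
aligned = rs_emb.copy()      # social embeddings that agree with the RS ones
swapped = rs_emb[::-1]       # social embeddings matched to the wrong class
cons_aligned = consistency_loss(rs_emb, aligned)
cons_swapped = consistency_loss(rs_emb, swapped)
trip_aligned = cross_modal_triplet_loss(rs_emb, aligned, labels)
trip_swapped = cross_modal_triplet_loss(rs_emb, swapped, labels)
```

Both terms vanish when the two modalities agree and grow when they disagree, which is the kind of signal that could be added to a classification loss during end-to-end training.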

Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: Qiu, Guoping; Zhang, Qian
Keywords: Land use classification; data fusion; deep learning; remote sensing; geospatial big data
Subjects: Q Science > Q Science (General)
Faculties/Schools: UNNC Ningbo, China Campus > Faculty of Science and Engineering > School of Computer Science
Item ID: 63100
Depositing User: Cao, Rui
Date Deposited: 01 Feb 2021 06:01
Last Modified: 01 Feb 2021 08:00
