An investigation into image-based indoor localization using deep learning

Li, Qing (2020) An investigation into image-based indoor localization using deep learning. PhD thesis, University of Nottingham.

Abstract

Localization is one of the fundamental technologies for many applications such as location-based services (LBS), robotics, virtual reality (VR), autonomous driving, and pedestrian navigation. Traditional methods based on wireless signals and inertial measurement units (IMUs) have inherent disadvantages that limit their applications. Although image-based localization methods appear to be promising supplements to these methods, applying them indoors raises several challenges. Compared to outdoor environments, indoor environments are more dynamic, which complicates map construction. Indoor scenes also tend to resemble one another, making it difficult to distinguish different places with a similar appearance. Furthermore, how to utilize widely available 3D indoor structures to enhance localization performance remains to be well explored.

Deep learning techniques have achieved significant progress in many computer vision tasks, such as image classification, object detection, and monocular depth prediction, amongst others. However, their application to indoor image-based localization has not yet been well studied. In this thesis, we investigate image-based indoor localization through deep learning techniques. We study the problem from two perspectives: topological localization, which seeks a coarse location, and metric localization, which aims to provide an accurate pose comprising both position and orientation. We also study indoor image localization with the assistance of 3D maps, taking advantage of the availability of many 3D maps of indoor scenes. We have made the following contributions:

Our first contribution is an indoor topological localization framework inspired by the human self-localization strategy. Within this framework, we propose a novel topological map representation that is robust to environmental changes. Unlike previous topological maps, which are constructed by dividing indoor scenes geometrically and representing each region by features aggregated over the whole region, our topological map is built around fixed indoor elements, and each node is represented by its semantic attributes. In addition, an effective landmark detector is devised to extract semantic information about objects of interest from smartphone video. We also present a new localization algorithm that matches the detected semantic landmark sequence against the proposed semantic topological map using both semantic and contextual information. Experiments on two test sites show that our landmark detector detects landmarks accurately and that the localization algorithm localizes reliably.
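As an illustration of the matching idea, the sketch below scores a detected landmark against map nodes using its semantic label together with the labels of neighbouring detections. The node structure, similarity weights, and scoring scheme are simplified assumptions for exposition, not the algorithm as implemented in the thesis.

```python
# Hypothetical sketch: match a detected landmark sequence against a
# semantic topological map. Weights and scoring are illustrative only.
from dataclasses import dataclass

@dataclass
class Node:
    node_id: int
    label: str         # semantic attribute, e.g. "door", "sign"
    neighbours: tuple  # labels of adjacent nodes (contextual information)

def similarity(detection: str, context: tuple, node: Node) -> float:
    """Score a detection against a map node via its semantic label
    and the overlap between its context and the node's neighbours."""
    label_score = 1.0 if detection == node.label else 0.0
    overlap = len(set(context) & set(node.neighbours))
    return label_score + 0.5 * overlap  # assumed weighting

def localize(sequence: list, topo_map: list) -> int:
    """Return the id of the node that best explains the latest
    detection, given the earlier detections as context."""
    current, context = sequence[-1], tuple(sequence[:-1])
    best = max(topo_map, key=lambda n: similarity(current, context, n))
    return best.node_id

# Usage: three map nodes; a video yields the landmark sequence below.
topo_map = [
    Node(0, "door", ("sign", "extinguisher")),
    Node(1, "door", ("lift", "stairs")),
    Node(2, "sign", ("door", "lift")),
]
print(localize(["sign", "extinguisher", "door"], topo_map))  # -> 0
```

Note how the contextual term disambiguates the two "door" nodes, which is exactly the role the map's contextual information plays for visually similar places.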

Our second contribution is a direct learning-based method that uses convolutional neural networks (CNNs) to exploit the relative geometry constraints between images for image-based metric localization. We develop a new convolutional neural network that predicts the global poses of two images and their relative pose simultaneously. This multi-task learning strategy allows the global pose regression and the relative pose regression to regularize each other. Furthermore, we design a new loss function that embeds the relative pose information to distinguish the poses of similar-looking images taken at different locations. Extensive experiments on two image localization benchmarks validate the effectiveness of the proposed method, which achieves state-of-the-art performance among learning-based methods.
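To make the multi-task idea concrete, here is a minimal PyTorch-style sketch of a loss that couples global and relative pose regression. The simple L2 terms, the 7-vector pose encoding, and the weights alpha and beta are illustrative assumptions, not the thesis's exact formulation.

```python
# Hypothetical sketch of a multi-task pose loss: two global pose
# branches regularized by a relative pose term.
import torch

def pose_loss(p1, p2, p_rel, gt1, gt2, gt_rel, alpha=1.0, beta=1.0):
    """p1, p2: predicted global poses of the two images; p_rel: their
    predicted relative pose; gt*: ground-truth counterparts. Each pose
    is assumed encoded as a 7-vector (3D translation + unit quaternion)."""
    global_term = (torch.nn.functional.mse_loss(p1, gt1) +
                   torch.nn.functional.mse_loss(p2, gt2))
    # The relative term penalizes image pairs that look alike but lie
    # at different locations: their relative pose must also be correct.
    relative_term = torch.nn.functional.mse_loss(p_rel, gt_rel)
    return alpha * global_term + beta * relative_term
```

The design point is that the relative term back-propagates through both global branches at once, so each branch is regularized by geometry the other branch observes.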

Our third contribution is a framework for localizing a single image in a 3D map; to the best of our knowledge, it is the first approach to do so. The framework comprises four main steps: pose initialization, depth inference, local map extraction, and pose correction. Pose initialization estimates a coarse pose with a learning-based pose regression approach. Depth inference predicts a dense depth map from the single image. Local map extraction cuts a local map out of the global 3D map to increase efficiency. Given the local map and the point cloud generated from the depth map, the Iterative Closest Point (ICP) algorithm aligns the point cloud to the local map and computes a correction to the coarse pose. Since the key to the method is accurate depth prediction from the image, we propose a novel 3D-map-guided single-image depth prediction approach. It utilizes both the 3D map and the RGB image: the RGB image is used to estimate a dense depth map, while the 3D map guides the estimation. We show that the new method significantly outperforms current RGB-image-based depth estimation methods on both indoor and outdoor datasets, and that using its predicted depth maps for single indoor image localization improves both position and orientation accuracy over state-of-the-art methods.
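As a rough illustration of the pose-correction step, the sketch below back-projects a predicted depth map into a point cloud and refines a coarse pose against a local map using Open3D's ICP. The pinhole back-projection, the correspondence threshold, and the point-to-point ICP variant are assumptions for illustration, not the thesis pipeline.

```python
# Hypothetical sketch of ICP-based pose correction with Open3D.
import numpy as np
import open3d as o3d

def correct_pose(depth, K, coarse_pose, local_map, threshold=0.05):
    """depth: HxW depth map in metres; K: 3x3 camera intrinsics;
    coarse_pose: 4x4 initial camera-to-map transform; local_map:
    o3d.geometry.PointCloud extracted from the global 3D map."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project pixels to 3D using the pinhole camera model.
    z = depth.ravel()
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    cloud = o3d.geometry.PointCloud(
        o3d.utility.Vector3dVector(np.stack([x, y, z], axis=1)))
    # ICP refines the alignment, starting from the coarse pose.
    result = o3d.pipelines.registration.registration_icp(
        cloud, local_map, threshold, coarse_pose,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation  # corrected camera-to-map pose
```

In this sketch the quality of the refined pose hinges entirely on the predicted depth, which is why the thesis pairs this step with the 3D-map-guided depth prediction described above.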

Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: Qiu, Guoping
Garibaldi, Jonathan
Keywords: indoor localization, indoor localisation, deep learning
Subjects: Q Science > QA Mathematics > QA 75 Electronic computers. Computer science
Faculties/Schools: UK Campuses > Faculty of Science > School of Computer Science
Item ID: 59976
Depositing User: Li, Qing
Date Deposited: 15 Jul 2020 14:58
Last Modified: 15 Jul 2020 15:00
URI: https://eprints.nottingham.ac.uk/id/eprint/59976
