Subspace-based dynamic selection for high-dimensional data

Maciel-Guerra, Alexandre (2022) Subspace-based dynamic selection for high-dimensional data. PhD thesis, University of Nottingham.

PDF (Thesis - as examined)
Available under Licence Creative Commons Attribution.

Abstract

The number of features collected has increased greatly in the past decade, particularly in medicine and the life sciences, which brings both challenges and opportunities. Making reliable predictions, exploring associations and extracting meaningful information from high-dimensional data are problems that are yet to be solved. Intrinsic properties of high-dimensional spaces, such as distance concentration and hubness, make these tasks difficult for traditional classification and clustering algorithms. In general, a Multiple Classifier System (MCS) provides better classification accuracy than individual classifiers. One of the most promising approaches to MCS is Dynamic Selection (DS), which selects classifiers on the fly for each unknown test sample. The rationale is that no classifier is an expert on every sample; rather, each classifier, or combination of classifiers, is an expert in a different region of the feature space, and the quality of these regions can significantly affect overall performance.
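The idea of selecting a classifier per test sample can be sketched in a few lines. The example below is an illustration in the spirit of the classic overall local accuracy (OLA) rule, not the method developed in the thesis: the query's region of competence is taken to be its k nearest validation samples, and the classifier with the highest accuracy on that region makes the prediction. The function names (`k_nearest`, `select_and_predict`), the two threshold classifiers and the XOR-style toy data are all assumptions made for this sketch.

```python
import math

def k_nearest(query, points, k):
    """Return the indices of the k points closest to query (Euclidean)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return order[:k]

def select_and_predict(query, pool, val_X, val_y, k=2):
    """Choose the classifier that is most accurate on the query's region
    of competence and return (index of chosen classifier, its prediction)."""
    region = k_nearest(query, val_X, k)

    def local_accuracy(clf):
        # Fraction of region samples that this classifier labels correctly.
        return sum(clf(val_X[i]) == val_y[i] for i in region) / len(region)

    best = max(range(len(pool)), key=lambda j: local_accuracy(pool[j]))
    return best, pool[best](query)

# Hypothetical pool: each classifier thresholds a single feature.
def clf_a(x):
    return 1 if x[0] > 0.5 else 0

def clf_b(x):
    return 1 if x[1] > 0.5 else 0

pool = [clf_a, clf_b]

# Toy XOR-style validation set: neither classifier is right everywhere.
val_X = [(0.1, 0.1), (0.2, 0.2), (0.1, 0.9), (0.2, 0.8),
         (0.9, 0.1), (0.8, 0.2), (0.9, 0.9), (0.8, 0.8)]
val_y = [0, 0, 1, 1, 1, 1, 0, 0]

top_left = select_and_predict((0.15, 0.85), pool, val_X, val_y)
bottom_right = select_and_predict((0.85, 0.15), pool, val_X, val_y)
print(top_left, bottom_right)
```

In this toy setting a different classifier is chosen for each of the two queries, because each one is locally accurate in only part of the feature space. Everything hinges on the neighbourhood returned by `k_nearest`, which is computed over all features at once.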

This thesis provides three major contributions. First, it shows that traditional DS methods fail to perform effectively on high-dimensional datasets because they use k-Nearest Neighbours (k-NN) to define the region of competence; moreover, they do not indicate which features are most important for classification. Second, two frameworks are proposed, Subspace-Based Dynamic Selection (SBDS) and Classifier SBDS (cSBDS), which integrate characteristics of DS methods and subspace clustering. Subspace clustering methods localise their search for clusters and are able to uncover clusters that exist in multiple, possibly overlapping subspaces of features and/or samples. The subspace clustering approach separates the high-dimensional feature space into small feature spaces, each with a reduced number of features and samples. The results indicate that the cSBDS framework performs statistically better than DS methods and majority voting on real-world and synthetic datasets. Third, the thesis compares the features selected by the cSBDS framework with those identified by feature importance methods. The results indicate that, for high-dimensional datasets, the cSBDS framework is able to capture the most important features when the number of clusters per class is increased, while traditional feature importance methods lose this capability.
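Why a k-NN region of competence degrades in high dimensions can be illustrated with distance concentration. The sketch below is a generic demonstration on uniformly random points, not an experiment from the thesis: it measures the relative contrast, (farthest distance - nearest distance) / nearest distance, from one query to a random sample. The function name `relative_contrast`, the sample size, the seed and the chosen dimensions are assumptions made for this illustration.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """Relative contrast of Euclidean distances from one random query to
    n_points random points, all drawn uniformly from the unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

# Compare a low-dimensional space with increasingly high-dimensional ones.
contrasts = {dim: relative_contrast(dim) for dim in (2, 20, 200, 2000)}
for dim, contrast in contrasts.items():
    print(f"dim={dim:5d}  relative contrast={contrast:.3f}")
```

As the dimension grows, the nearest and farthest points end up at nearly the same distance from the query, so the "k nearest" samples are barely closer than any others. Restricting the search to a subset of features, as subspace clustering does, is one way to work in spaces where distances remain informative.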

Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: Twycross, Jamie; Figueredo, Grazziela
Keywords: dynamic selection; multiple classifier system; subspace clustering; high-dimensional data; datasets
Subjects: Q Science > QA Mathematics > QA 75 Electronic computers. Computer science
Faculties/Schools: UK Campuses > Faculty of Science > School of Computer Science
Item ID: 71623
Depositing User: Maciel Guerra, Alexandre
Date Deposited: 22 Feb 2024 13:50
Last Modified: 22 Feb 2024 13:50
URI: https://eprints.nottingham.ac.uk/id/eprint/71623
