An investigation of fuzzy methods and meta learning for feature selection

Shen, Zixiao (2022) An investigation of fuzzy methods and meta learning for feature selection. PhD thesis, University of Nottingham.

PDF (Thesis - as examined): repository staff only until 2 August 2024; subsequently available to anyone.
Available under Licence Creative Commons Attribution.

Abstract

Recent developments in technology have led to accelerated growth of data and to the associated challenge of extracting information from it. As a result, the knowledge discovery process has become a central issue. Within this process, feature selection (FS) acts as a preprocessing step that plays an essential role: it aims to discover a minimal feature subset or a reliable feature ranking that represents the original data. However, practical datasets are inherently uncertain and imperfect because of the noise, incompleteness, and inconsistency that are always present. In this research, fuzzy theory is introduced as a unified framework to model these uncertainties in the FS process.



Unlike semantics-preserving FS algorithms that output a final feature subset, this research explores efficient ranking-based methods that order features by importance. It first investigates a fuzzy entropy-based FS and classification framework with several essential components. Different evaluation measures and functions are implemented to find the combination that achieves the best performance. The proposed method produces relatively high and stable classification accuracy as features are gradually removed, indicating better performance than other comparable methods. Further, because suitable measures for evaluating and comparing FS algorithms effectively are lacking, two new evaluation measures are proposed, covering accuracy and robustness. The proposed weighted accuracy and robustness measures prove to be more sensitive on real-world and synthetic datasets. A multi-criteria evaluation method based on radar charts is also introduced to measure overall performance comprehensively.
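As an illustration of what a fuzzy entropy-based feature ranking can look like, the following Python sketch scores each feature with the De Luca-Termini fuzzy entropy of its similarity to per-class mean ("ideal") vectors, then ranks features from lowest to highest entropy. The abstract does not specify which measures or functions the thesis uses, so the similarity measure, the entropy measure, and the name `fuzzy_entropy_ranking` are assumptions made for this example only.

```python
import numpy as np

def fuzzy_entropy_ranking(X, y, eps=1e-12):
    """Rank features by fuzzy entropy (hypothetical helper, not the thesis code).

    Returns (ranking, entropy): `ranking` lists feature indices from lowest
    entropy (assumed most informative) to highest entropy.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Min-max scale every feature to [0, 1] so similarities are comparable.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant features: avoid division by zero
    Xs = (X - lo) / span

    entropy = np.zeros(X.shape[1])
    for label in np.unique(y):
        # Assumed "ideal vector" of the class: its per-feature mean.
        ideal = Xs[y == label].mean(axis=0)
        # Assumed membership: similarity of every sample to the ideal vector.
        mu = np.clip(1.0 - np.abs(Xs - ideal), eps, 1.0 - eps)
        # De Luca-Termini fuzzy entropy, accumulated per feature.
        h = -(mu * np.log(mu) + (1.0 - mu) * np.log(1.0 - mu))
        entropy += h.sum(axis=0)

    return np.argsort(entropy), entropy
```

A feature-removal experiment of the kind described above would then drop features from the end of `ranking` one at a time and track classification accuracy after each removal.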



Next, this research proposes a novel ensemble learning framework to further improve the performance of FS algorithms. The proposed method consists of three main steps: generating distributions of feature importance using the bootstrap, combining those distributions using aggregation methods, and defuzzifying the result to obtain a feature ranking. Feature importance is represented in several ways, including score-based, rank-based, and fuzzy-based approaches. Both weighted combination and fuzzy aggregation methods are used to aggregate the different distributions. After tuning on a reference data repository, the best combination approach is chosen separately for each of the score-based, rank-based, and fuzzy-based approaches. Compared with the base FS methods after the bootstrap process and with other state-of-the-art FS methods, the proposed methods perform better on the testing data repository, especially the technique that uses a fuzzy-based approach with drastic sum S-norm aggregation. The results show that the proposed ensemble learning framework can improve the performance of FS algorithms in multiple respects.
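The three steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the two base scorers (`abs_correlation`, `mean_gap`), the histogram-based fuzzification in `to_membership`, and centroid defuzzification are all hypothetical choices made for the example. Only the drastic sum S-norm follows its standard definition, S(a, b) = b if a = 0, a if b = 0, and 1 otherwise.

```python
import numpy as np

def abs_correlation(X, y):
    """Hypothetical base scorer: |Pearson correlation| of each feature with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant columns or labels score 0
    return np.abs(Xc.T @ yc) / denom

def mean_gap(X, y):
    """Hypothetical base scorer for binary y: class-mean gap over feature range."""
    pos, neg = X[y == 1], X[y == 0]
    if len(pos) == 0 or len(neg) == 0:
        return np.zeros(X.shape[1])
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = np.inf
    return np.abs(pos.mean(axis=0) - neg.mean(axis=0)) / span

def bootstrap_scores(scorer, X, y, n_boot, rng):
    """Step 1: distribution of importance scores over bootstrap resamples."""
    n = len(y)
    out = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        out[b] = scorer(X[idx], y[idx])
    return out

def to_membership(samples, edges):
    """Assumed fuzzification: histogram scaled so that its peak equals 1."""
    counts, _ = np.histogram(samples, bins=edges)
    peak = counts.max()
    return counts / peak if peak > 0 else counts.astype(float)

def s_max(a, b):
    """Maximum S-norm."""
    return np.maximum(a, b)

def s_drastic(a, b):
    """Drastic sum S-norm."""
    return np.where(a == 0, b, np.where(b == 0, a, 1.0))

def ensemble_rank(X, y, scorers, s_norm, n_boot=50, n_bins=20, seed=0):
    """Steps 2 and 3: aggregate fuzzy sets with an S-norm, then defuzzify."""
    rng = np.random.default_rng(seed)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    dists = [np.clip(bootstrap_scores(s, X, y, n_boot, rng), 0.0, 1.0)
             for s in scorers]
    importance = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        agg = to_membership(dists[0][:, j], edges)
        for d in dists[1:]:
            agg = s_norm(agg, to_membership(d[:, j], edges))
        # Centroid defuzzification gives one crisp importance per feature.
        importance[j] = (centers * agg).sum() / agg.sum()
    return np.argsort(-importance), importance
```

Calling `ensemble_rank(X, y, [abs_correlation, mean_gap], s_drastic)` on a NumPy feature matrix `X` with binary labels `y` returns feature indices ordered from most to least important, together with the defuzzified importance values. Passing `s_max` instead swaps only the aggregation step, which is how alternative combination approaches could be compared.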



The literature contains a large number of FS methods, and no single method can be identified as the best for all kinds of data. Hence, this research finally develops a meta-learning framework to recommend a suitable FS method for a given dataset. Various synthetic datasets are generated as the training data repository, which is used to tune the parameters of the framework. The meta-learning framework is then constructed by extracting six meta-features from the training data repository, using the FS algorithms with the best multi-criteria performance as the meta-labels, and using a fuzzy similarity measure-based method as the classifier. In experiments, the proposed framework recommends the best of the candidate FS methods for six out of ten testing datasets, with negligible additional time.
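A recommender of this general shape can be sketched in Python as below. The abstract names neither the six meta-features nor the exact similarity measure, so this example takes meta-feature vectors as given and assumes a simple similarity classifier: each candidate FS method gets an "ideal" meta-feature vector (the mean over the training datasets where it performed best), and a new dataset is assigned the method whose ideal vector it most resembles. The class name `SimilarityRecommender` and the similarity formula are hypothetical.

```python
import numpy as np

class SimilarityRecommender:
    """Hypothetical fuzzy similarity-based recommender of FS methods."""

    def fit(self, meta_X, labels):
        # meta_X: one row of meta-features per training dataset.
        # labels: name of the best-performing FS method for each dataset.
        meta_X = np.asarray(meta_X, dtype=float)
        labels = np.asarray(labels)
        self.lo_ = meta_X.min(axis=0)
        span = meta_X.max(axis=0) - self.lo_
        self.span_ = np.where(span == 0, 1.0, span)
        scaled = (meta_X - self.lo_) / self.span_
        self.classes_ = np.unique(labels)
        # One ideal vector per candidate FS method.
        self.ideals_ = np.stack(
            [scaled[labels == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def recommend(self, meta_x):
        # Scale the new dataset's meta-features like the training data.
        x = (np.asarray(meta_x, dtype=float) - self.lo_) / self.span_
        x = np.clip(x, 0.0, 1.0)
        # Assumed similarity: mean of 1 - |difference| over meta-features.
        sims = (1.0 - np.abs(self.ideals_ - x)).mean(axis=1)
        best = self.classes_[int(np.argmax(sims))]
        return best, dict(zip(self.classes_, sims))
```

Because recommending a method only requires computing meta-features and a handful of similarities, the cost at prediction time is small compared with running every candidate FS method on the new dataset.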



The limitations of the proposed methods, possible improvements, and future research directions are discussed in the last chapter of the thesis.

Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: Garibaldi, Jonathan M.; Chen, Xin
Keywords: Feature selection, FS, Fuzzy methods, Meta learning
Subjects: Q Science > QA Mathematics > QA 75 Electronic computers. Computer science
Faculties/Schools: UK Campuses > Faculty of Science > School of Computer Science
Item ID: 69332
Depositing User: Shen, Zixiao
Date Deposited: 02 Aug 2022 04:40
Last Modified: 02 Aug 2022 04:40
URI: https://eprints.nottingham.ac.uk/id/eprint/69332
