Developing and improving methods for robust ensemble classification: an aggregation operator and clustering-classification approach

Agrawal, Utkarsh (2019) Developing and improving methods for robust ensemble classification: an aggregation operator and clustering-classification approach. PhD thesis, University of Nottingham.

PDF (Thesis - as examined), 4MB - Repository staff only

Abstract

Classification is an important technique in pattern recognition for assigning objects to their respective groups. In most real-world pattern recognition problems, it has become difficult to achieve the best performance using an individual classifier. Ensemble algorithms, which combine multiple individual classifiers, have earned widespread approval within the machine learning community because of their ability to perform well in a wide range of applications. However, some challenges remain in achieving robust classification, in particular the classification of data points that are difficult to assign to any one of the groups, and the leveraging of existing external knowledge to better combine individual classifier outputs (the fusion step of the ensemble). This thesis comprehensively explores these two key aspects of ensemble methods.

The first challenge in building a robust ensemble classification method is to classify data points that are difficult to label, in applications that use unlabelled datasets (and ensemble clustering frameworks). One specific problem caused by such unclassified data is an incomplete representation of the dataset. This limitation motivates a new framework that may improve the final classification by assigning more data to one of the groups. In this thesis, a robust two-step framework is presented, which incorporates an ensemble classification stage after an ensemble clustering stage. Together, these stages identify core groups, distribute data within these groups and improve the final classification by re-classifying unclustered data (which would otherwise remain unassigned to any of the groups). The practical impact of the presented framework is demonstrated through application to novel real-world datasets, including two breast cancer datasets (breast cancer biological group stratification from the Nottingham and Edinburgh datasets), one heavy goods vehicle dataset (driving stereotypes from the Microlise dataset) and multiple standard datasets from the UCI repository (to demonstrate the robustness of the framework). Results obtained from these datasets show that the framework offers an improved, reliable and robust classification technique. These findings were also verified with statistical tests, visualisation techniques, cluster quality assessment and interpretation by experts (or comparison with the ground truth in the case of the UCI repository).
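
The two-step idea described above can be illustrated with a minimal sketch. It is not the thesis's actual method: the consensus rule (a point is "core" only if every clustering agrees), the assumption that cluster labels are already aligned across clusterings, the toy classifiers and all function names are hypothetical choices made for illustration.

```python
from collections import Counter
import math

def consensus_labels(partitions):
    """Keep a label only where all clusterings agree; otherwise None (unclustered).
    Assumes cluster labels are already aligned across clusterings."""
    return [labels[0] if len(set(labels)) == 1 else None
            for labels in zip(*partitions)]

def nearest_centroid(train_x, train_y, x):
    """Predict the group whose centroid is closest to x."""
    groups = {}
    for p, y in zip(train_x, train_y):
        groups.setdefault(y, []).append(p)
    centroids = {y: [sum(c) / len(ps) for c in zip(*ps)]
                 for y, ps in groups.items()}
    return min(centroids, key=lambda y: math.dist(centroids[y], x))

def knn(k):
    """Build a k-nearest-neighbour predictor with majority vote."""
    def predict(train_x, train_y, x):
        nearest = sorted(zip(train_x, train_y),
                         key=lambda t: math.dist(t[0], x))[:k]
        return Counter(y for _, y in nearest).most_common(1)[0][0]
    return predict

def two_step(points, partitions, classifiers):
    """Step 1: find core groups by clustering consensus.
    Step 2: re-classify unclustered points by classifier majority vote."""
    core = consensus_labels(partitions)
    train_x = [p for p, y in zip(points, core) if y is not None]
    train_y = [y for y in core if y is not None]
    final = []
    for p, y in zip(points, core):
        if y is None:
            votes = Counter(clf(train_x, train_y, p) for clf in classifiers)
            y = votes.most_common(1)[0][0]
        final.append(y)
    return core, final

# Toy data: the last point sits between groups and the clusterings disagree on it.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (1, 1)]
partitions = [
    [0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 0],
]
core, final = two_step(points, partitions, [nearest_centroid, knn(1), knn(3)])
print(core)   # [0, 0, 0, 1, 1, 1, None]
print(final)  # [0, 0, 0, 1, 1, 1, 0]
```

In this toy run the ambiguous point is left out of the core groups by the clustering stage and is then assigned to a group by the classification stage, so no data remains unassigned.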

The second challenge addressed in this thesis is leveraging external information to improve the fusion step of the ensemble, and thereby the ensemble classification performance. Insight into the data offers the potentially very valuable prospect of leveraging external information, and the use of this additional knowledge can lead to better ‘ensemble’ classification methods. One approach to capturing this information is the use of aggregation operators, which combine information from multiple sources with respect to a Fuzzy Measure (FM) that captures the worth of each individual source and of every possible combination of sources. Several approaches to designing FMs exist in the literature; however, these methods do not leverage external information, which could allow us to better understand the method of data fusion (or the ensemble, in the case of ensemble classification). In this thesis, the concept of so-called ‘A Priori’ FMs is introduced; these are generated from external information and thus provide an alternative to the existing FM approaches (such as the algorithm-driven FMs). The thesis then proceeds to develop two specific instances of such an A Priori FM in order to support the decision-level fusion step in ensemble classification. This new ensemble classification method is empirically assessed through application to multiple independent datasets. Results indicate that, in cases where external information was available, the proposed ‘A Priori’ FM based ensemble classifier is a robust method that achieves improved performance.
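
To make the role of a fuzzy measure in fusion concrete, the sketch below uses the discrete Choquet integral, a widely used aggregation operator defined with respect to an FM. The abstract does not state which operator the thesis uses, so this choice is an assumption, and the classifier names and FM values are hypothetical hand-picked numbers, not the thesis's ‘A Priori’ FMs.

```python
from itertools import combinations

def is_fuzzy_measure(g, sources):
    """Check boundary conditions g(empty)=0, g(all)=1 and monotonicity."""
    full = frozenset(sources)
    if g[frozenset()] != 0.0 or g[full] != 1.0:
        return False
    subsets = [frozenset(c) for r in range(len(sources) + 1)
               for c in combinations(sources, r)]
    return all(g[a] <= g[b] for a in subsets for b in subsets if a <= b)

def choquet(scores, g):
    """Discrete Choquet integral of per-source scores with respect to g.
    Sources are visited in ascending score order; each score increment is
    weighted by the measure of the sources still at or above that level."""
    order = sorted(scores, key=scores.get)
    remaining = set(order)
    total, prev = 0.0, 0.0
    for s in order:
        total += (scores[s] - prev) * g[frozenset(remaining)]
        prev = scores[s]
        remaining.remove(s)
    return total

# Hypothetical classifiers and hand-picked FM values (illustration only).
sources = ["svm", "knn", "tree"]
g = {
    frozenset(): 0.0,
    frozenset({"svm"}): 0.5,
    frozenset({"knn"}): 0.3,
    frozenset({"tree"}): 0.2,
    frozenset({"svm", "knn"}): 0.8,
    frozenset({"svm", "tree"}): 0.6,
    frozenset({"knn", "tree"}): 0.4,
    frozenset(sources): 1.0,
}
# Support each classifier gives to one candidate class.
scores = {"svm": 0.9, "knn": 0.6, "tree": 0.3}

assert is_fuzzy_measure(g, sources)
fused = choquet(scores, g)
print(round(fused, 4))
```

For these numbers the integral works out to 0.3 × 1.0 + 0.3 × 0.8 + 0.3 × 0.5 = 0.69. In decision-level fusion, the same computation would be repeated for each candidate class and the class with the highest fused support selected.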

Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: Wagner, Christian; Garibaldi, Jonathan M.
Keywords: machine learning, classification, ensemble classification
Subjects: Q Science > QA Mathematics > QA 75 Electronic computers. Computer science
Faculties/Schools: UK Campuses > Faculty of Science > School of Computer Science
Item ID: 59059
Depositing User: Agrawal, Utkarsh
Date Deposited: 29 Sep 2023 07:43
Last Modified: 29 Sep 2023 07:43
URI: https://eprints.nottingham.ac.uk/id/eprint/59059
