Privileged information for data clustering

Feyereisl, Jan and Aickelin, Uwe (2012) Privileged information for data clustering. Information Sciences, 194, pp. 4-23. ISSN 0020-0255

Abstract

Many machine learning algorithms assume that all input samples are independently and identically distributed from some common distribution on either the input space X, in the case of unsupervised learning, or the input and output space X × Y, in the case of supervised and semi-supervised learning. In recent years the relaxation of this assumption has been explored, and the importance of incorporating additional information within machine learning algorithms has become more apparent. Traditionally such fusion of information was the domain of semi-supervised learning. More recently the inclusion of knowledge from separate hypothetical spaces has been proposed by Vapnik as part of the supervised setting. In this work we are interested in exploring Vapnik’s idea of ‘master-class’ learning and the associated learning using ‘privileged’ information, but within the unsupervised setting.

Adoption of the advanced supervised learning paradigm for the unsupervised setting instigates investigation into the difference between privileged and technical data. By means of our proposed aRi-MAX method, stability of the K-Means algorithm is improved and identification of the best clustering solution is achieved on an artificial dataset. Subsequently an information theoretic dot product based algorithm called P-Dot is proposed. This method has the ability to utilize a wide variety of clustering techniques, individually or in combination, while fusing privileged and technical data for improved clustering. Application of the P-Dot method to the task of digit recognition confirms our findings in a real-world scenario.

Item Type: Article
Additional Information: NOTICE: this is the author’s version of a work that was accepted for publication in Information Sciences. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Information Sciences, 194, (2012), doi: 10.1016/j.ins.2011.04.025
Schools/Departments: University of Nottingham UK Campus > Faculty of Science > School of Computer Science
Identification Number: https://doi.org/10.1016/j.ins.2011.04.025
Depositing User: Aickelin, Professor Uwe
Date Deposited: 17 Jun 2013 13:27
Last Modified: 15 Sep 2016 16:24
URI: http://eprints.nottingham.ac.uk/id/eprint/2026
