Towards uncertainty-aware and label-efficient machine learning of human expressive behaviour

Tellamekala, Mani Kumar (2022) Towards uncertainty-aware and label-efficient machine learning of human expressive behaviour. PhD thesis, University of Nottingham.

PDF (Final revised thesis with minor corrections incorporated) (Thesis - as examined)
Available under Licence Creative Commons Attribution.

Abstract

The ability to recognise emotional expressions from non-verbal behaviour plays a key role in human-human interaction. Endowing machines with the same ability is critical to enriching human-computer interaction. Despite widespread attention, human-level automatic recognition of affective expressions remains elusive for machines. To improve the current state of machine learning methods for affect recognition, this thesis identifies two challenges: label ambiguity and label scarcity.

Firstly, this thesis notes that it is difficult to establish a clear one-to-one mapping between inputs (face images or speech segments) and their target emotion labels, because emotion perception is inherently subjective. As a result, label ambiguity naturally arises in manual annotations of affect. Most existing affect recognition methods ignore this fundamental problem: they implicitly assume a one-to-one input-target mapping and learn deterministic functions. In contrast, this thesis proposes to learn non-deterministic functions based on uncertainty-aware probabilistic models, as they can naturally accommodate a one-to-many input-target mapping. Besides improving affect recognition performance, the proposed uncertainty-aware models support three important applications: adaptive multimodal affect fusion, human-in-the-loop learning of affect, and improved performance on downstream behavioural analysis tasks such as personality trait estimation.
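
As a rough illustration of the contrast drawn above, the following minimal Python sketch shows how a model that outputs a mean and a variance can accommodate disagreeing annotations, and how predicted variances could weight a fusion of two modalities. It is not the method proposed in the thesis: the Gaussian likelihood, the inverse-variance fusion rule, the function names `gaussian_nll` and `fuse_by_precision`, and all numeric values are assumptions chosen only for illustration.

```python
import math


def gaussian_nll(target, mean, variance):
    """Negative log-likelihood of `target` under a Gaussian N(mean, variance).

    Hypothetical helper: a deterministic model predicts only `mean`; a
    probabilistic one also predicts `variance`, which can grow when
    annotators disagree about the label.
    """
    return 0.5 * (math.log(2 * math.pi * variance) + (target - mean) ** 2 / variance)


def fuse_by_precision(predictions):
    """Combine (mean, variance) pairs by inverse-variance weighting.

    Assumed fusion rule for illustration: the modality with the lower
    predicted variance receives the larger weight.
    """
    precisions = [1.0 / variance for _, variance in predictions]
    total = sum(precisions)
    fused_mean = sum(p * mean for p, (mean, _) in zip(precisions, predictions)) / total
    return fused_mean, 1.0 / total


# Two annotators rate the same input very differently (illustrative values).
labels = [0.2, 0.8]
confident = sum(gaussian_nll(y, 0.5, 0.01) for y in labels)
uncertain = sum(gaussian_nll(y, 0.5, 0.09) for y in labels)
print(uncertain < confident)  # the wider predictive distribution fits both labels better

# Hypothetical face and speech predictions as (mean, variance) pairs.
fused_mean, fused_variance = fuse_by_precision([(0.8, 0.01), (0.2, 0.09)])
print(fused_mean, fused_variance)
```

In this sketch the fused estimate stays close to the more confident modality, which is the intuition behind using uncertainty estimates to adapt multimodal fusion.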

Secondly, this thesis aims to address the scarcity of affect-labelled datasets, caused by the cumbersome and time-consuming nature of the affect annotation process. To this end, this thesis notes that the audio and visual feature encoders used in existing models are label-inefficient, i.e. learning them requires large amounts of labelled training data. As a solution, this thesis proposes to pre-train the feature encoders using unlabelled data to make them more label-efficient, i.e. able to achieve good emotion recognition performance with as few labelled training examples as possible. A novel self-supervised pre-training method is proposed that poses hand-engineered emotion features as task-specific representation learning priors. By leveraging large amounts of unlabelled audiovisual data, the proposed self-supervised pre-training method demonstrates much better label efficiency than commonly employed pre-training methods.

Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: Valstar, Michel
French, Andrew
Keywords: Emotion Recognition, Machine Learning, Uncertainty Modelling, Self-supervised Learning, Probabilistic Machine Learning Models, Stochastic Process Regression, Uncertainty-Aware Learning, Label Ambiguity
Subjects: Q Science > Q Science (General)
Q Science > QA Mathematics > QA 75 Electronic computers. Computer science
Faculties/Schools: UK Campuses > Faculty of Science > School of Computer Science
Item ID: 71876
Depositing User: Tellamekala, Mani Kumar
Date Deposited: 15 Dec 2022 09:56
Last Modified: 15 Dec 2022 09:56
URI: https://eprints.nottingham.ac.uk/id/eprint/71876
