Exploring the representation of caricatures, facial motion, and view-invariance in face space.

Elson, Ryan (2024) Exploring the representation of caricatures, facial motion, and view-invariance in face space. PhD thesis, University of Nottingham.

Available under Licence Creative Commons Attribution.

Abstract

Faces present a vast array of information, from invariant features such as identity to variable features such as expression, speech and pose. Humans are remarkably good at recognising faces (familiar faces at least) and interpreting facial actions, even across changes in view. While there has been an explosion of research into developing artificial neural networks for many aspects of face processing, some of which seem to predict neural responses quite well, the current work focuses on face processing through simpler linear projection spaces. These linear projection spaces are formal instantiations of ‘face space’, built using principal component analysis (PCA). The concept of ‘face space’ (Valentine, 1991) has been a highly influential account of how faces might be represented in the brain. In particular, recent research supports the presence of a face space in the macaque brain in the form of a linear projection space, referred to as ‘axis coding’, in which individual faces can be coded as a linear sum of orthogonal features. Here, these linear projection spaces are used for two streams of investigation.
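The axis-coding idea above can be sketched in a few lines of numpy. This is a minimal illustration, not the thesis's actual pipeline: the data here are random stand-in vectors (a real face space would be built from aligned face images or landmark coordinates), and all names and dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in data: 100 "faces", each flattened to a
# 50-dimensional vector.
faces = rng.normal(size=(100, 50))

# Build the linear projection space: PCA via SVD on mean-centred data.
mean_face = faces.mean(axis=0)
U, S, Vt = np.linalg.svd(faces - mean_face, full_matrices=False)
components = Vt[:10]  # first 10 orthogonal axes of the face space

# 'Axis coding': a face is described by its projections onto the axes...
probe = faces[0]
coeffs = components @ (probe - mean_face)

# ...and is approximated as the average face plus a linear sum of the
# orthogonal components, weighted by those projections.
approx = mean_face + coeffs @ components
```

Because the components are orthonormal, each coefficient is read off independently, which is what makes the "linear sum of orthogonal features" coding so simple.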

Firstly, we assessed the neurovascular response to hyper-caricatured faces in an fMRI study. On the assumption that faces further from average should project more strongly onto the components of the linear space, we hypothesised that they should elicit a stronger response. Contrary to our expectations, we found little evidence for this in the fusiform face area (FFA) or in face-selective cortex more generally, although the response pattern did become more consistent for caricatured faces in the FFA. We then explored the response to these caricatured faces in cortex typically associated with object processing. Interestingly, both the average response magnitude and the consistency of the response pattern increased as caricaturing increased. It is currently unclear whether this response confers some functional benefit for processing caricatured faces, or whether it simply reflects similarities between the low- and mid-level properties of caricatured faces and those of certain objects. If the response is functional, then hyper-caricaturing could pave a route to improving face processing in individuals with prosopagnosia, provided technologies can be developed to caricature faces automatically in real time.
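In the face-space framework, caricaturing amounts to scaling a face's coefficients, pushing it further from the average along its own direction in the space. A small sketch of that manipulation, again using random stand-in data (the `caricature` helper and all dimensions are invented for illustration, not the study's stimuli):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed setup: a PCA face space built from stand-in face vectors.
faces = rng.normal(size=(100, 50))
mean_face = faces.mean(axis=0)
_, _, Vt = np.linalg.svd(faces - mean_face, full_matrices=False)
components = Vt[:10]

def caricature(face, level):
    """Exaggerate a face by scaling its face-space coefficients.

    level = 1.0 returns the face's projection into the space;
    level > 1.0 pushes it further from the average along the same
    direction (a hyper-caricature).
    """
    coeffs = components @ (face - mean_face)
    return mean_face + level * (coeffs @ components)

veridical = caricature(faces[0], 1.0)
hyper = caricature(faces[0], 2.0)

# Faces further from average project more strongly onto the components:
# the projection length scales linearly with the caricature level.
ratio = (np.linalg.norm(hyper - mean_face)
         / np.linalg.norm(veridical - mean_face))
```

This linear scaling is exactly why a stronger projection, and hence a stronger response, was hypothesised for hyper-caricatured faces.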

The second line of work addressed these linear projection spaces in the context of achieving view-invariance, specifically in the domain of facial motion and expression. How humans create view-invariant representations remains an open question despite much research; however, little work has focused on creating view-invariant representations outside of identity recognition. Likewise, while face space and view-invariance have each been studied extensively, there is little evidence for how different views might be represented within a face space framework, or how motion might be incorporated.

Automatic face analysis systems mostly deal with pose either by aligning faces to a canonical frontal view or by using separate view-specific models. Evidence that the brain possesses an internal 3D model for ‘frontalising’ faces is inconclusive, so here we investigate how changes in view might be processed in a unified multi-view face space based on a few prototypical 2D views. We investigate the functionality and biological plausibility of five identity-specific face spaces, created using PCA, that allow different views to be reconstructed from single-view video inputs of actors speaking. The most promising of these models first builds a separate orthogonal space for each viewpoint. The relationships between the components in neighbouring views are then learned, and reconstructions across views are made using a cascade of projection, transformation, and reconstruction. These reconstructions are collated and used to build a multi-view space, which can reconstruct motion well across all learned views.
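The projection–transformation–reconstruction cascade can be sketched as follows. This is a toy version under strong assumptions, not the thesis's model: the paired two-view frames are generated from purely linear synthetic data, and the matrices `W` and `V`, the latent dimension, and all other names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic paired training data: the same 200 frames of facial motion
# rendered in two neighbouring views, each flattened to a 60-dim
# vector. A shared 30-dim latent stands in for the face/motion state.
W = rng.normal(size=(60, 30))        # latent -> view A (invented)
V = rng.normal(size=(60, 30))        # latent -> view B (invented)
latent = rng.normal(size=(200, 30))
view_a = latent @ W.T
view_b = latent @ V.T

def fit_space(data, k=30):
    """PCA: the mean plus the first k orthogonal components."""
    mean = data.mean(axis=0)
    _, _, Vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, Vt[:k]

# Step 1: build a separate orthogonal (PCA) space for each viewpoint.
mean_a, comp_a = fit_space(view_a)
mean_b, comp_b = fit_space(view_b)

# Step 2: learn the relationship between the components of the two
# views as a least-squares linear map between their coefficients.
coeff_a = (view_a - mean_a) @ comp_a.T
coeff_b = (view_b - mean_b) @ comp_b.T
T, *_ = np.linalg.lstsq(coeff_a, coeff_b, rcond=None)

# Step 3: the cascade -- project an unseen view-A frame, transform its
# coefficients, and reconstruct the corresponding view-B frame.
new_latent = rng.normal(size=30)
new_a = new_latent @ W.T             # single-view input
true_b = new_latent @ V.T            # ground truth, for checking only
pred_b = mean_b + ((new_a - mean_a) @ comp_a.T @ T) @ comp_b
```

With linear synthetic data the cascade recovers the unseen view essentially exactly; real faces are only approximately linear across views, which is why the full model goes on to collate such reconstructions into a multi-view space.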

This provides initial insight into how a biologically plausible, view-invariant system for facial motion processing might be represented in the brain. It also has the capacity to improve view transformations in automatic lip-reading software.

Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: Johnston, Alan
Schluppeck, Denis
Keywords: face perception, space representations, caricatures, neuroimaging
Subjects: B Philosophy. Psychology. Religion > BF Psychology
R Medicine > RC Internal medicine > RC 321 Neuroscience. Biological psychiatry. Neuropsychiatry
Faculties/Schools: UK Campuses > Faculty of Science > School of Psychology
Related URLs:
URL: https://www.dropbox.com/scl/fo/kaqognq18e0a5vira9syo/h?rlkey=m0k5mrhvglk69rpjdy92lk5ha&dl=0 (URL type: UNSPECIFIED)
Item ID: 77728
Depositing User: Elson, Ryan
Date Deposited: 23 Jul 2024 04:40
Last Modified: 23 Jul 2024 04:40
URI: https://eprints.nottingham.ac.uk/id/eprint/77728
