Towards vocal tract MRI synthesis from facial signals using external to internal correlation modelling

Shahid, Muhammad Suhaib (2025) Towards vocal tract MRI synthesis from facial signals using external to internal correlation modelling. PhD thesis, University of Nottingham.

PDF (Thesis - as examined) - Repository staff only
Available under Licence Creative Commons Attribution.
Download (29MB)

Abstract

Oral health underpins everyday functions such as speech, mastication and swallowing, yet acquiring detailed kinematic data on the vocal tract remains technically and financially demanding. Ultrasound and electromagnetic articulography offer only partial coverage, while real-time magnetic resonance imaging (RtMRI) delivers richer information but requires expensive scanners and bespoke acquisition protocols. These constraints limit large-scale studies and the routine use of dynamic vocal-tract models in both research and clinical practice.

Motivated by the need for an affordable, non-invasive alternative, this thesis introduces External to Internal Correlation Modelling (E2ICM), a novel framework that learns correlations between external facial signals and internal articulator motion, enabling vocal-tract modelling without direct imaging. The work pursues four objectives: (i) advanced segmentation of RtMRI sequences, (ii) quantification of articulator interdependencies, (iii) prediction of internal motion from purely external observations, and (iv) ethical evaluation of AI-driven approaches in oral healthcare.

Both static and temporal segmentation pipelines are developed for RtMRI data. Generative adversarial networks and diffusion models are then employed to synthesise internal views from facial video, addressing data scarcity through tailored augmentation strategies. A thematic analysis of professional interviews highlights concerns around privacy, security and algorithmic bias, informing an ethical framework for clinical deployment.
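As an illustration of the kind of conditional synthesis described above (a sketch only, not the thesis implementation: the toy 64x64 single-channel shapes, the pix2pix-style pairing of facial and internal frames, and the L1 loss weighting are all assumptions), a minimal adversarial training step in PyTorch might look like:

    # Illustrative sketch only, not the thesis code.
    # Assumes PyTorch; shapes, channel counts and loss weights are hypothetical.
    import torch
    import torch.nn as nn

    class Generator(nn.Module):
        """Maps an external facial frame (1x64x64) to an internal view
        of the same size (a toy stand-in for a mid-sagittal RtMRI frame)."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Tanh(),
            )

        def forward(self, x):
            return self.net(x)

    class Discriminator(nn.Module):
        """Scores (facial frame, internal frame) pairs, pix2pix-style."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(2, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                nn.Flatten(), nn.Linear(64 * 16 * 16, 1),
            )

        def forward(self, face, internal):
            return self.net(torch.cat([face, internal], dim=1))

    G, D = Generator(), Discriminator()
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

    face = torch.randn(4, 1, 64, 64)  # stand-in batch of facial frames
    real = torch.randn(4, 1, 64, 64)  # stand-in matching internal frames

    # Discriminator step: real pairs labelled 1, generated pairs labelled 0.
    fake = G(face).detach()
    loss_d = bce(D(face, real), torch.ones(4, 1)) + \
             bce(D(face, fake), torch.zeros(4, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: fool D while staying close to the true frame (L1 term).
    fake = G(face)
    loss_g = bce(D(face, fake), torch.ones(4, 1)) + 100.0 * l1(fake, real)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

Pairing an L1 reconstruction term with the adversarial loss is a common choice for paired image-to-image synthesis, since it anchors the generator to the ground-truth frame while the discriminator enforces realism.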

A key contribution is a dual-view dataset comprising synchronised high-resolution RtMRI and external video captured during controlled speech and chewing tasks. Experimental results demonstrate that E2ICM can predict vocal-tract configurations with promising accuracy while reducing reliance on costly imaging. Improved segmentation techniques and a deeper understanding of articulator dynamics further advance the state of the art in non-invasive oral-movement modelling.
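As a hedged illustration of how such a dual-view dataset might be temporally aligned (a sketch under assumed conditions: the frame rates, the 20 ms tolerance and the function name are hypothetical, not from the thesis), nearest-timestamp pairing of the two streams could be done as follows:

    # Illustrative sketch, not the thesis code; sampling rates, the 20 ms
    # tolerance and the function name are hypothetical assumptions.
    import bisect

    def pair_frames(video_ts, mri_ts, tolerance=0.02):
        """For each RtMRI frame time, find the nearest video frame time.
        Returns (mri_index, video_index) pairs within `tolerance` seconds."""
        pairs = []
        for i, t in enumerate(mri_ts):
            j = bisect.bisect_left(video_ts, t)
            candidates = [k for k in (j - 1, j) if 0 <= k < len(video_ts)]
            if not candidates:
                continue
            k = min(candidates, key=lambda k: abs(video_ts[k] - t))
            if abs(video_ts[k] - t) <= tolerance:
                pairs.append((i, k))
        return pairs

    # Example: a 30 fps video stream against a 12 fps RtMRI stream.
    video_ts = [n / 30.0 for n in range(30)]
    mri_ts = [n / 12.0 for n in range(12)]
    print(pair_frames(video_ts, mri_ts)[:3])  # [(0, 0), (1, 2), (2, 5)]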

Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: French, Andrew
Valstar, Michel
Yakubov, Gleb
Keywords: artificial intelligence, ai, oral health, vocal-tract modelling
Subjects: Q Science > QA Mathematics > QA 75 Electronic computers. Computer science
R Medicine > R Medicine (General) > R855 Medical technology. Biomedical engineering. Electronics
Faculties/Schools: UK Campuses > Faculty of Science > School of Computer Science
Item ID: 81139
Depositing User: Shahid, Muhammad
Date Deposited: 30 Jul 2025 04:40
Last Modified: 30 Jul 2025 04:40
URI: https://eprints.nottingham.ac.uk/id/eprint/81139
