Vyakaranam, Aparna
(2026)
Speech emotion recognition (SER) system for late-deafened educators in online teaching.
PhD thesis, University of Nottingham.
Abstract
Speech emotion recognition (SER) involves predicting human emotions from speech signals, aiding in the understanding of human behaviour and offering opportunities in human-computer interaction (HCI). It is widely applicable across domains such as psychology, medicine, education, and entertainment. This research explores the development of an SER system to support late-deafened educators in online teaching environments.
A review of relevant literature highlighted the importance of emotional engagement, defined as students’ emotional responses to academic content, which is essential for effective learning and often conveyed through vocal and behavioural cues. However, in online classes, such non-verbal cues are limited due to the lack of physical presence, resulting in what is referred to as emotional deficiency. This challenge is particularly significant for late-deafened educators, who may find it difficult to hear or interpret verbal feedback, making it harder to gauge student emotions and engagement. To address this, a real-world SER system was developed to detect and display student emotions from verbal feedback accurately and in real time. The aim was to help late-deafened educators better understand student engagement and adjust their teaching strategies accordingly during online classes.
A preliminary study indicated emotional deficiency in online classes and highlighted the value of integrating emotional feedback into online teaching environments. The proposed system extracted acoustic features such as Zero Crossing Rate (ZCR), Root Mean Square (RMS), Chroma-STFT, Mel Frequency Cepstral Coefficients (MFCCs), and Mel-spectrograms. Three hybrid CNN architectures combining 1D, 2D, and 3D layers were explored through a novel comparative analysis using fusion strategies: averaging, parallel merging, and sequential integration. These models were evaluated on five benchmark datasets—IEMOCAP, DEMoS, TESS, RAVDESS, and EMO-DB. The averaging fusion model consistently outperformed the others, achieving accuracies of 82% on IEMOCAP, 91% on DEMoS, EMO-DB, and RAVDESS, and 100% on TESS, and was therefore selected for implementation.
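The abstract names the extracted acoustic features but does not show how they were computed. As an illustrative sketch only (not the author's implementation, which likely uses a library such as librosa), two of the simpler features, Zero Crossing Rate and Root Mean Square energy, can be computed per frame with plain NumPy; the frame and hop lengths below are assumed values, not taken from the thesis:

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frame)
    return float(np.mean(signs[:-1] != signs[1:]))

def root_mean_square(frame):
    """RMS energy of one frame."""
    return float(np.sqrt(np.mean(frame ** 2)))

def frame_features(signal, frame_len=2048, hop=512):
    """Slide a window over the signal; return (n_frames, 2) array of (ZCR, RMS)."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        feats.append((zero_crossing_rate(frame), root_mean_square(frame)))
    return np.array(feats)
```

The spectral features mentioned in the abstract (Chroma-STFT, MFCCs, Mel-spectrograms) require an FFT-based pipeline and are typically obtained from a dedicated audio library rather than written by hand.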
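The averaging fusion strategy that outperformed the parallel and sequential variants can be illustrated at the prediction level: assuming each CNN branch emits a softmax distribution over the same emotion label set (an assumption for illustration; the thesis may fuse at a different layer), the branch outputs are averaged element-wise and the arg-max of the fused distribution is taken as the prediction:

```python
import numpy as np

def average_fusion(branch_probs):
    """Fuse several branches' class-probability vectors by element-wise averaging.

    branch_probs: list of arrays, each of shape (n_classes,), assumed to be
    softmax outputs over the same emotion labels (hypothetical interface).
    Returns (predicted class index, fused distribution).
    """
    stacked = np.stack(branch_probs)   # shape: (n_branches, n_classes)
    fused = stacked.mean(axis=0)       # average of valid distributions sums to 1
    return int(np.argmax(fused)), fused
```

A design note on why averaging can win such comparisons: it requires no extra trainable parameters and tends to cancel uncorrelated errors across branches, whereas concatenation-based (parallel) or stacked (sequential) fusion adds capacity that can overfit smaller emotion corpora.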
The final system featured a user-friendly graphical user interface (GUI) and was evaluated for usability and user experience through testing with educators both with and without hearing impairment. Quantitative results showed that 90% of users found the system intuitive and effective for real-time emotion detection, and 80% of late-deafened educators reported that it accurately captured student emotions. Qualitative feedback further emphasised its value in helping educators tailor instruction based on emotional cues. This research demonstrates the practical value of integrating SER into online teaching to enhance late-deafened educators’ awareness of student emotional engagement and to support more adaptive teaching strategies. However, the developed system relies on only five discrete, universally accepted human emotions, which are further classified as positive or negative. It also relies partly on acted speech datasets, which may not fully capture the subtle, diverse, and spontaneous expressions typical of real classrooms. The absence of multimodal cues, such as facial expressions or textual input, limits the system’s ability to provide a holistic understanding of student emotions in real time. Future enhancements could include expanding the emotion categories to cover education-specific states such as confusion or boredom, and integrating multimodal cues (e.g., facial expressions and text) to improve real-time accuracy and contextual understanding across diverse learning environments.
| Item Type: | Thesis (University of Nottingham only) (PhD) |
| Supervisors: | Maul, Tomas; Ramayah, Bavani |
| Keywords: | speech emotion recognition; human-computer interaction; late-deafened educators; usability and user experience; convolutional neural networks; hybrid models; semi-natural datasets |
| Subjects: | Q Science > Q Science (General) |
| Faculties/Schools: | University of Nottingham, Malaysia > Faculty of Science and Engineering — Science > School of Computer Science |
| Item ID: | 82008 |
| Depositing User: | Vyakaranam, Aparna |
| Date Deposited: | 07 Feb 2026 04:40 |
| Last Modified: | 07 Feb 2026 04:40 |
| URI: | https://eprints.nottingham.ac.uk/id/eprint/82008 |