Audiovisual classification of vocal outbursts in human conversation using long-short-term memory networks

Eyben, F., Petridis, S., Schuller, Björn, Tzimiropoulos, Georgios, Zafeiriou, Stefanos and Pantic, Maja (2011) Audiovisual classification of vocal outbursts in human conversation using long-short-term memory networks. In: ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 22-27 May 2011, Prague, Czech Republic.

Full text not available from this repository.

Abstract

We investigate classification of non-linguistic vocalisations with a novel audiovisual approach and Long Short-Term Memory (LSTM) Recurrent Neural Networks as highly successful dynamic sequence classifiers. The evaluation database is the Audiovisual Interest Corpus of natural human-to-human conversation, used in this year's Paralinguistic Challenge. For video-based analysis we compare shape-based and appearance-based features. These are fused in an early manner with typical audio descriptors. The results show significant improvements of LSTM networks over a static approach based on Support Vector Machines. More importantly, we show a significant gain in performance when fusing audio and visual shape features.
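
As an illustration of what fusing features "in an early manner" generally means, the sketch below concatenates per-frame audio and visual feature vectors into one vector before any classifier sees them. This is not the authors' implementation: the function name `early_fusion`, the toy feature values, and the assumption that both streams are already aligned to the same frame rate are all hypothetical.

```python
def early_fusion(audio_frames, visual_frames):
    """Feature-level (early) fusion: join the two streams frame by frame.

    Assumes both streams are already aligned, i.e. one audio vector and
    one visual vector per frame (an assumption made for this sketch).
    """
    if len(audio_frames) != len(visual_frames):
        raise ValueError("audio and visual streams must have the same number of frames")
    # Each fused frame is the audio vector followed by the visual vector.
    return [list(a) + list(v) for a, v in zip(audio_frames, visual_frames)]


# Hypothetical toy sequence: 3 frames, 2 audio descriptors, 2 shape features.
audio = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
shape = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
fused = early_fusion(audio, shape)
```

The fused sequence would then be passed as a whole to a sequence classifier, in contrast to late fusion, where separate per-modality classifiers are combined at the decision level.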