Combining residual networks with LSTMs for lipreading

Stafylakis, Themos and Tzimiropoulos, Georgios (2017) Combining residual networks with LSTMs for lipreading. In: Interspeech 2017, 20-24 August 2017, Stockholm, Sweden.

Full text not available from this repository.

Abstract

We propose an end-to-end deep learning architecture for word-level visual speech recognition. The system is a combination of spatiotemporal convolutional, residual and bidirectional Long Short-Term Memory networks. We trained and evaluated it on the Lipreading In-The-Wild benchmark, a challenging database with a 500-word vocabulary consisting of video excerpts from BBC TV broadcasts. The proposed network attains a word accuracy of 83.0%, yielding a 6.8% absolute improvement over the current state of the art.
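For readers wanting a concrete picture of the pipeline the abstract describes, the following is a minimal, hypothetical PyTorch sketch: a spatiotemporal (3D) convolutional front-end, a 2D residual network applied per frame, and a bidirectional LSTM back-end classifying over a 500-word vocabulary. The torchvision ResNet-18 trunk, all layer sizes, and the mean-pooling over time are illustrative assumptions, not the authors' exact configuration.

    # Hypothetical sketch of a 3D-conv + ResNet + BiLSTM lipreading network.
    # Layer sizes and backbone choice are assumptions for illustration only.
    import torch
    import torch.nn as nn
    from torchvision.models import resnet18


    class LipReadingNet(nn.Module):
        def __init__(self, num_classes=500, hidden_size=256):
            super().__init__()
            # Spatiotemporal front-end: 3D convolution over (time, height, width).
            self.frontend = nn.Sequential(
                nn.Conv3d(1, 64, kernel_size=(5, 7, 7), stride=(1, 2, 2),
                          padding=(2, 3, 3), bias=False),
                nn.BatchNorm3d(64),
                nn.ReLU(inplace=True),
                nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2),
                             padding=(0, 1, 1)),
            )
            # Residual network applied to each frame independently (assumed
            # ResNet-18 trunk; conv1/maxpool are adapted since the 3D front-end
            # already downsamples, and the classifier head is dropped).
            trunk = resnet18(weights=None)
            trunk.conv1 = nn.Conv2d(64, 64, kernel_size=3, stride=1,
                                    padding=1, bias=False)
            trunk.maxpool = nn.Identity()
            trunk.fc = nn.Identity()  # yields a 512-d embedding per frame
            self.resnet = trunk
            # Bidirectional LSTM back-end over the sequence of frame embeddings.
            self.lstm = nn.LSTM(512, hidden_size, num_layers=2,
                                batch_first=True, bidirectional=True)
            self.classifier = nn.Linear(2 * hidden_size, num_classes)

        def forward(self, x):
            # x: (batch, 1, time, height, width) grayscale mouth-region clips
            x = self.frontend(x)                      # (B, 64, T, H', W')
            b, c, t, h, w = x.shape
            x = x.transpose(1, 2).reshape(b * t, c, h, w)
            x = self.resnet(x)                        # (B*T, 512)
            x = x.view(b, t, -1)                      # (B, T, 512)
            x, _ = self.lstm(x)                       # (B, T, 2*hidden)
            return self.classifier(x.mean(dim=1))     # pool over time -> logits


    if __name__ == "__main__":
        model = LipReadingNet()
        clip = torch.randn(2, 1, 29, 112, 112)  # e.g. 29-frame 112x112 crops
        print(model(clip).shape)                 # torch.Size([2, 500])
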

Item Type: Conference or Workshop Item (Paper)
RIS ID: https://nottingham-repository.worktribe.com/output/861527
Additional Information: Paper available at http://www.isca-speech.org/iscaweb/index.php/archive/online-archive, pp. 3652-3656. doi:10.21437/Interspeech.2017-85
Keywords: visual speech recognition, lipreading, deep learning
Schools/Departments: University of Nottingham, UK > Faculty of Science > School of Computer Science
Related URLs:
  http://www.interspeech2017.org/ (URL type: unspecified)
  http://www.isca-speech.org/iscaweb/index.php/archive/online-archive (URL type: unspecified)
  http://www.isca-speech.org/archive/Interspeech_2017/pdfs/0085.PDF (URL type: unspecified)
  http://www.interspeech2017.org/calls/papers/ (URL type: unspecified)
Depositing User: Tzimiropoulos, Yorgos
Date Deposited: 10 Aug 2017 11:09
Last Modified: 04 May 2020 18:46
URI: https://eprints.nottingham.ac.uk/id/eprint/44756