Improving visual-to-auditory cross-modality information conversions

Tan, Shern Shiou (2019) Improving visual-to-auditory cross-modality information conversions. PhD thesis, University of Nottingham.



Sensory substitution devices have been widely used as assistive tools, mainly for the rehabilitation of people with disabilities. With the development of electronics and computing devices, visual-to-auditory sensory substitution (VASS) is becoming widespread in sensory substitution devices for the visually impaired. These devices convert visual information from images into an auditory form, known as a soundscape, allowing listeners to visualize their surroundings by interpreting the audio representation they hear. Despite its potential benefits, the technology has not gained acceptance among the public because of weaknesses such as the interpretability of the soundscapes and the quality of the user experience. The aims of this study were to improve cross-modality conversions in areas that include interpretability, information preservation, and the generation of soundscapes that afford a better listening experience. The use of image processing methods for visual feature extraction is demonstrated in order to help users better interpret the soundscape they hear. By combining audio synthesis with the sounds of musical instruments and mapping colours to these sounds, systems are created that generate soundscapes which not only contain more information than those produced by traditional devices but also afford a more pleasant listening experience. Finally, new evaluation and optimization methods are proposed to allow better visual-to-auditory feature mapping and foster a more up-to-date means of developing such devices. According to the experimental results and user feedback, VASS systems created using the proposed techniques generally improve on traditional systems in terms of ease of use and user utility.
It is encouraging that improved devices can be developed in the future by following the direction proposed in this research, coupled with more up-to-date techniques such as machine learning.

Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: Maul, Tomas Henrique Bode
Mennie, Neil Russell
Mitchell, Peter
Keywords: image processing, computer vision, sensory substitution, sonification, information theory, experimental psychology, vision, imaging
Subjects: Q Science > QA Mathematics > QA 75 Electronic computers. Computer science
T Technology > TA Engineering (General). Civil engineering (General) > TA1501 Applied optics. Photonics
Faculties/Schools: University of Nottingham, Malaysia > Faculty of Science and Engineering — Science > School of Computer Science
Item ID: 55721
Depositing User: Rozario, Margaret
Date Deposited: 27 Feb 2019 04:40
Last Modified: 27 Feb 2019 04:40
