A crowdsourcing approach to natural language automotive command interfaces

Parr, Jonathan (2017) A crowdsourcing approach to natural language automotive command interfaces. PhD thesis, University of Nottingham.

Abstract

Modern vehicles present drivers with a wide range of control options, the number of which has increased with technologies such as satellite navigation and advanced in-car entertainment systems. Conventional controls can compromise safety, for example by requiring drivers to look away from the road. Voice interfaces offer the potential to reduce or eliminate visual distraction; however, industry surveys show that many users dislike current implementations and consequently revert to conventional controls.

This research considers why users reject voice interfaces and develops methods to address these concerns. Focus is given to the use of crowdsourcing methods to obtain geographically tagged command samples across a broad area. A simple token-based pattern-matching command processor is developed that interprets commands using samples selected by geographic region, which increases habitability by better matching each user’s choice of command words. Crowdsourcing methods to enable user-selection of system response thresholds are also considered.

Study one elicited the views of 20 existing users of automotive voice interfaces. Analysis found that respondents did not find the pre-set command phrases intuitive, that a voice interface taking incorrect actions was particularly dissatisfying and that overall user satisfaction increased with the number of commands correctly recognised.

Studies two and three used 16 and 20 participants respectively to assess the potential to use crowdsourcing to build a corpus of typed modality commands and to examine how typed and spoken modality commands might differ. In study two, command samples were terse, and prompting using images of conventional controls resulted in ambiguity. Typed and spoken modality command samples contained both common and modality-specific words and phrases. The refined prompting method used in study three showed animations of actions occurring, and this reduced ambiguity. The selected actions elicited more verbose commands and, whilst individual participants often used differing phrases for spoken and typed modality commands, grouping multiple participants' commands significantly increased the proportion of common words.

Chapter five describes the development of a simple token-based command processing method which could make use of crowdsourced data to interpret voice commands. This method used a cross validation process to construct statistical models of expected sub-sentence pattern-matching for each action using a corpus of sample commands. Statistical models incorporated levels of expected pattern-matching for both intended actions and for non-intended actions. This method required only corpus data which could be obtained via crowdsourcing.
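
The general idea of token-based pattern matching against a sample corpus can be illustrated with a minimal sketch. This is not the processor developed in the thesis: it omits the cross-validated statistical models entirely, and the function names, the use of word n-grams as "sub-sentence patterns", the scoring rule and the toy corpus are all assumptions made for illustration.

```python
from collections import defaultdict


def patterns(command, max_n=2):
    """Return the set of token n-grams (n = 1..max_n) in a command.

    Using word n-grams as sub-sentence patterns is an assumption here.
    """
    tokens = command.lower().split()
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def build_index(corpus):
    """Map each action to the set of patterns seen in its sample commands."""
    index = defaultdict(set)
    for action, samples in corpus.items():
        for sample in samples:
            index[action] |= patterns(sample)
    return index


def rank_actions(command, index):
    """Rank actions by the fraction of the command's patterns each has seen."""
    cmd = patterns(command)
    levels = {
        action: (len(cmd & seen) / len(cmd) if cmd else 0.0)
        for action, seen in index.items()
    }
    return sorted(levels.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical sample corpus; real samples would be crowdsourced.
corpus = {
    "window_down": ["open the window", "put the window down"],
    "radio_on": ["turn the radio on", "switch on the radio"],
}
index = build_index(corpus)
ranking = rank_actions("open the window please", index)
```

In this sketch the highest-ranked action is taken as the interpretation, and the gap between the first and second pattern-matching levels indicates how ambiguous the command is.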

In study four, the model developed in chapter five was used to statistically evaluate the proportion of actions correctly identified for a set of spoken commands using spoken and typed sample corpora. Actions were correctly identified for 85.8% of commands using a spoken corpus and 87.1% of commands using a typed corpus. The percentage of cases where the intended action was one of the two actions with the highest pattern-matching level was 95.8% for the spoken corpus and 92.5% for the typed corpus, which showed the potential benefit of user-disambiguation where pattern-matching levels are similar. Study five assessed the degradation in action identification in cases where a voice recognition method had failed to correctly recognise one word from a command. The percentage of actions correctly identified was not affected by the removal of command words for either modality.

The final study of the thesis comprised a crowdsourced user trial with 55 participants, who gave voice commands in response to animated prompts and then viewed responses from the command processor while adjusting response thresholds. Potential responses included taking an action without further confirmation, confirming an action before enacting it, and asking the participant to rephrase the command. A disambiguation threshold allowed participants to set the degree of pattern-matching similarity which would require a user decision. The study method required participants to compromise between different types of response error. Results showed that some participants preferred not to be asked for confirmation at any point and that, once this group was excluded, it was possible to select a single set of thresholds which met the system-response choice of all participants.
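
One way thresholds of this kind could map pattern-matching levels to a system response is sketched below. The function name, parameter names, default values and the order in which the checks are applied are all assumptions for illustration; the abstract does not specify how the thesis combines its thresholds.

```python
def choose_response(ranked, act_at=0.8, confirm_at=0.5, disambiguate_within=0.1):
    """Pick a system response from (action, level) pairs sorted best-first.

    All threshold values and the ordering of the checks are hypothetical.
    Returns a (response, candidate_actions) tuple.
    """
    if not ranked:
        return ("rephrase", [])
    best_action, best_level = ranked[0]
    # Too little evidence for any action: ask the user to rephrase.
    if best_level < confirm_at:
        return ("rephrase", [])
    # Top two actions are too close to call: ask the user to choose.
    if len(ranked) > 1 and best_level - ranked[1][1] <= disambiguate_within:
        return ("disambiguate", [best_action, ranked[1][0]])
    # Strong match: act without further confirmation.
    if best_level >= act_at:
        return ("act", [best_action])
    # Moderate match: confirm before enacting.
    return ("confirm", [best_action])
```

Under these assumptions, raising `act_at` trades incorrect actions for more confirmation requests, which is the kind of compromise between response errors that participants were asked to make.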

This thesis shows that crowdsourcing is a suitable method for obtaining command samples across a broad area and so capturing a range of command phrasing dialects. It also shows that selecting samples which match the speech patterns of specific users allows a simple pattern-matching natural language voice interface method to robustly identify the actions intended by voice commands in a high proportion of cases. It was demonstrated that crowdsourcing can also be used to identify common system response levels, optimising user interfaces to best mitigate failures to identify an intended action. These findings support a novel framework for automotive voice interface development with the potential to increase habitability and satisfy users across a broad geographic region, and thereby realise the safety benefits of automotive voice interfaces.

Further research opportunities include the use of context-related data to improve command interpretation accuracy and the use of user-system interactions to further optimise voice interface responses. The sensitivity of the presented methods to a more diverse range of dialects may also be an area for study. Opportunities for human factors research include social acceptability of using a natural language speech modality to control a vehicle and the opportunity to corroborate and expand upon the wide range of safety-related research concerning the potential benefits of voice interfaces.

Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: Burnett, Gary E.
Keywords: crowdsourcing crowdsourced automotive natural language interface
Subjects: Q Science > QA Mathematics > QA 75 Electronic computers. Computer science
T Technology > TL Motor vehicles. Aeronautics. Astronautics
Faculties/Schools: UK Campuses > Faculty of Engineering
Item ID: 47155
Depositing User: Parr, Jonathan
Date Deposited: 13 Dec 2017 04:40
Last Modified: 14 Dec 2017 08:23
URI: https://eprints.nottingham.ac.uk/id/eprint/47155
