Alexa, Please Understand Me

by Win Shih

For non-native speakers, people with regional lilts, dialects, or drawls, and people with speech impairments or mobility issues, it can be frustrating when a voice assistant (VA) does not seem to understand what they say and responds with “Sorry, I can’t help with that,” “Sorry, I’m having trouble understanding right now,” or “Sorry, I didn’t get that.” People who do not speak standard English often have to repeat themselves when interacting with a VA.

According to a 2018 Washington Post article, non-native English speakers are 30 percent more likely to be misunderstood by Alexa. Specifically, native Spanish speakers are understood 20 percent less often than native English speakers by Google Assistant, and speakers with a Chinese accent are understood 10 percent less often by Alexa than native English speakers. Furthermore, Alexa took somewhat longer to understand and respond to requests from non-native speakers.

At USC Libraries, we asked a small group of colleagues to query the library catalog using an Alexa application developed by the library. We found that non-native speakers were slightly less successful than native speakers (73% vs. 75%) in author, title, and keyword searches.

Several usability studies further compare the user experience of native and non-native English speakers when interacting with voice assistants. At University College Dublin, researchers found semantic and stylistic differences between the commands used by the two groups. Native speakers tend to focus on the structure of their verbal commands to ensure they are clear, brief, and simple. Non-native speakers, on the other hand, are aware of their language limitations and pay more attention to the accuracy of pronunciation, vocabulary selection, and lexical knowledge. Native speakers are annoyed by having to wait a long time for a voice assistant’s response, while non-native speakers need additional time to formulate their queries and utterances. Non-native speakers also prefer to interact with a voice assistant through a smartphone, which displays their queries and the voice assistant’s responses on the screen. They said that on-screen feedback and speech recognition transcriptions help them diagnose communication issues, improve future interactions, and develop their confidence. In another study, Pyae and Scifleet found that native speakers have a better user experience with voice assistants and perceive them as easy to use, friendly, and useful. Non-native speakers are less positive about their experience and report more usability issues, partly due to their English proficiency, cultural awareness, and the contextual situation.

Voice technology has progressed by leaps and bounds in the last decade due to advances in artificial intelligence, machine learning, natural language processing, and cloud computing power. Voice assistants learn to comprehend spoken words by processing a diverse range of voices, verbal cues, accents, intonations, inflections, and pronunciations found in the general population and forming connections among them. Their weaker performance in comprehending non-native speakers indicates that the massive dataset used to train them is incomplete. More non-native accented speech should be included in their training data to improve their speech recognition models and algorithms. As the Washington Post article points out, a lack of diverse voice data can inadvertently contribute to discrimination against non-native speakers. As one data scientist comments, “these systems are going to work best for white, highly educated, upper-middle-class Americans.”

Both Amazon and Google recognize the importance of developing inclusive voice-enabled technology that works well for populations with varied speech patterns and accents. It is hoped that voice assistants will continue to improve their listening capability and accommodate a wider range of speech patterns and accents.