Amazon’s AI improves emotion detection in voices

By: Kyle Wiggers

MAY 21, 2019

Much can be gleaned from the tone of someone’s voice, which is a natural conduit for emotion. And emotion detection has a range of applications: it can aid health monitoring by helping to spot early signs of dementia or heart attack, and it has the potential to make conversational AI systems more engaging and responsive. Someday, detected emotion might even provide implicit feedback that could help voice assistants like Google Assistant, Apple’s Siri, and Amazon’s Alexa learn from their mistakes.

Emotion-classifying AI isn’t anything new, but traditional approaches are supervised: they ingest training data labeled according to speakers’ emotional states. Scientists at Amazon recently took a different approach, which they describe in a paper scheduled to be presented at the International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Rather than sourcing an exhaustively annotated “emotion” corpus to teach a system, they fed an adversarial autoencoder a publicly available data set containing 10,000 utterances from 10 different speakers. The result? The neural network was up to 4% more accurate at judging valence, or emotional value, in people’s voices.
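For readers curious what an adversarial autoencoder looks like in practice, the sketch below is a minimal, hypothetical PyTorch version, not Amazon’s actual model: an encoder compresses acoustic features into a latent code, a decoder reconstructs them, and a discriminator nudges the latent distribution toward a chosen prior, all without emotion labels. The layer sizes, the Gaussian prior, and the mel-filterbank input features are assumptions for illustration.

```python
# Minimal adversarial autoencoder sketch (hypothetical; not Amazon's code).
# The encoder/decoder learn to reconstruct acoustic features without emotion
# labels, while a discriminator pushes latent codes toward a Gaussian prior.
import torch
import torch.nn as nn

FEAT_DIM, LATENT_DIM = 40, 16  # assumed: e.g., 40 mel-filterbank features

encoder = nn.Sequential(nn.Linear(FEAT_DIM, 64), nn.ReLU(), nn.Linear(64, LATENT_DIM))
decoder = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(), nn.Linear(64, FEAT_DIM))
discriminator = nn.Sequential(nn.Linear(LATENT_DIM, 32), nn.ReLU(), nn.Linear(32, 1))

recon_loss = nn.MSELoss()
adv_loss = nn.BCEWithLogitsLoss()
opt_ae = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

def train_step(x):
    """One unsupervised step on a batch of utterance features x: (B, FEAT_DIM)."""
    # 1) Reconstruction: encode the features, then rebuild them.
    z = encoder(x)
    x_hat = decoder(z)
    loss_recon = recon_loss(x_hat, x)

    # 2) Discriminator: distinguish prior samples (real) from encoder outputs (fake).
    z_prior = torch.randn_like(z)  # assumed Gaussian prior over the latent space
    d_real = discriminator(z_prior)
    d_fake = discriminator(z.detach())
    loss_d = (adv_loss(d_real, torch.ones_like(d_real))
              + adv_loss(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # 3) Encoder as "generator": fool the discriminator while reconstructing well.
    loss_g = adv_loss(discriminator(z), torch.ones_like(d_fake))
    loss_ae = loss_recon + loss_g
    opt_ae.zero_grad()
    loss_ae.backward()
    opt_ae.step()
    return loss_recon.item(), loss_d.item()
```

In a setup like this, the scarce labeled data would come in only afterward: a lightweight classifier trained on the latent codes (or a fine-tuned encoder) would produce the valence predictions. The article doesn’t describe the exact classifier head Amazon used.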