What is the goal of voice technology in video services today, and how close will we be to achieving it in 2021? Here’s what three experts on the subject had to say.
At NAB 2019 in Las Vegas, I moderated a panel discussion entitled Voice Control and AI: Pushing the TV Experience Forward. Experts from IBM, Comcast, and Gracenote joined me in the debate. Two key questions framed the panel. I opened with a deceptively simple one: what is the objective of the technology? I closed by asking the panelists how far along the road toward that goal we would be by NAB 2021. Here’s what each had to say.
Flatten the UI, replace the remote
Amit Bagga, Vice President of Research and Development at Comcast, runs the company’s AI and Machine Learning Center for Excellence as well as the OS remote team. Comcast has made considerable progress with voice search and control over the last five years. He says that 20 million voice remotes are in use by Xfinity Video subscribers and licensees of the platform. Those voice-enabled customers issued 9 billion commands in 2018, which works out to about 1.2 requests per remote per day.
Mr. Bagga says one of the primary tasks of voice services is to save the customer from having to navigate the guide to find what they want:
“Our goal was to flatten the UI, get the users to the content with the least number of clicks possible.”
He sees voice technology continuing to evolve quickly and ultimately supplanting the remote altogether:
“I think we will have hands-free control of the TV. You will not have to pick up the remote if you choose not to.”
Simplify and make it a conversation
Simon Adams, Chief Product Officer of Gracenote, agrees with Mr. Bagga that simplifying the process of finding something to watch is a crucial goal:
“To make the user experience much simpler and easier to find stuff they want to watch and listen to.”
He took the idea of simplification much further when thinking about the next two years:
“I don’t think we’ll be in a remoteless world, but I think we’ll be in a much better conversational place.”
Voice recognition and the personalization of replies have become much more accurate over the last several years. However, the ability to refine and modify results in a conversational manner remains a challenge. Mr. Adams believes the technology will make considerable progress in this area by 2021.
Understand intent, make the technology more accessible
Peter Gugliemino is Chief Technology Officer at IBM, where he focuses on media and entertainment products and works closely with the Watson Media team. He sees AI technology being used to understand not only what is said but also the mood and intent of the speaker:
“The goal of speech technology, in general, is to get better insight into what they [the users] are looking to do.”
For example, Watson is being used in call centers to understand the emotional state of a caller and to route the call appropriately based on that information.
Looking ahead, Mr. Gugliemino sees voice technology becoming more broadly available and working in speech-unfriendly environments:
“In two years, we’ll have more accessibility and more languages. One of the key problems that we’re facing is that everything is in English. I also think that the signal processing that is involved in understanding overlapping speakers and separating speech from background noise will be much improved.”
Indeed, the adoption of speech technology for media search and control is much lower in non-English-speaking countries. Users everywhere will also welcome the improved ability of voice recognition systems to separate a specific voice from background noise.
Why it matters
Simplifying the business of finding something to watch is a crucial goal of voice technology in media.
Soon the technology will:
- Be capable of carrying on a conversation to narrow search results
- Replace the remote completely – if that’s what a user wants to do
- Work with more languages and be more robust in hostile environments