Have you ever wondered why video recommendations never seem to closely match your interests or mood? Why, when you ask your favorite voice assistant to suggest a dark movie with plenty of action, all you hear is “sorry, I don’t know how to help with that”? It could be that the metadata – the data about movies and TV shows – those systems rely on lacks the necessary detail. Gracenote wants to change that with its new Video Descriptors.
Traditional metadata captures information such as show title, actors, description, and genre. That data was enough to populate a grid TV guide, display program details on request, and power simple searches on names and general categories. Unfortunately, it falls far short of what’s needed by more sophisticated applications like recommendations and voice search.
Capturing more information about video
Gracenote’s Video Descriptors, part of the company’s Advanced Discovery products, extend traditional metadata to capture much more information about a video. For example, details such as mood, theme, scenario, and characters can be added to a show or movie description.
The extended metadata allows a video to be characterized in much greater detail. Consider a movie like Die Hard. Along with the usual description data, Video Descriptors can capture more abstract concepts like “Good versus Evil” and “Sweet Revenge,” and moods such as “Gripping” and “Dark.” They can also capture character names, like “John McClane,” and well-remembered quotes, like “yippee ki yay.”
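To make the idea concrete, here is a minimal sketch of what such an extended record might look like. The field names are purely illustrative assumptions, not Gracenote’s actual Video Descriptors schema:

```python
# Hypothetical extended-metadata record for a movie. Field names are
# illustrative only; they are NOT Gracenote's real Video Descriptors format.
die_hard = {
    "title": "Die Hard",
    "genre": ["Action", "Thriller"],          # traditional metadata
    "themes": ["Good versus Evil", "Sweet Revenge"],  # abstract concepts
    "moods": ["Gripping", "Dark"],            # how the movie feels
    "characters": ["John McClane"],           # notable character names
    "quotes": ["yippee ki yay"],              # well-remembered lines
}
```

A record like this keeps the traditional grid-guide fields while layering on the less tangible descriptors that recommendations and voice search need.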
Why more detail is needed
Details like these are critical if a recommendation engine is to make links between different movies and shows. For example, a Die Hard fan might also enjoy Game of Thrones: that show shares the “Good versus Evil” and “Sweet Revenge” themes and likewise feels gripping and dark. The detail also allows voice search systems to respond to vague requests like “What was that movie where the main character says yippee ki yay?”
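The linking described above can be sketched as a simple overlap score between two titles’ descriptor sets. This is an illustrative toy approach (Jaccard similarity over assumed theme and mood fields), not Gracenote’s actual recommendation algorithm:

```python
# Toy recommendation linking: score titles by the overlap of their theme and
# mood descriptors. Illustrative only; not Gracenote's actual method.

def descriptor_set(record):
    """Collect a title's themes and moods into one set of descriptors."""
    return set(record["themes"]) | set(record["moods"])

def similarity(a, b):
    """Jaccard similarity between two titles' descriptor sets (0.0 to 1.0)."""
    sa, sb = descriptor_set(a), descriptor_set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

die_hard = {"themes": ["Good versus Evil", "Sweet Revenge"],
            "moods": ["Gripping", "Dark"]}
game_of_thrones = {"themes": ["Good versus Evil", "Sweet Revenge"],
                   "moods": ["Gripping", "Dark", "Epic"]}
rom_com = {"themes": ["Love Conquers All"], "moods": ["Feel-Good"]}

print(similarity(die_hard, game_of_thrones))  # 0.8 -> strong candidate
print(similarity(die_hard, rom_com))          # 0.0 -> poor match
```

The same descriptor sets could also back vague voice queries, e.g. matching “yippee ki yay” against a quotes field to retrieve Die Hard.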
Getting the extra data
Populating the new data in Video Descriptors is not a trivial matter. Simon Adams, Gracenote’s Chief Product Officer, put it this way:
“There’s a real challenge around scale based on the amount of legacy content already out in the world and new content being created every day by established and emerging players.”
Relying solely on people would take a very long time and would be prohibitively expensive. Gracenote is using advanced machine learning technologies to automate much of the processing. Mr. Adams commented:
“Our in-house editors develop and define the taxonomies and training sets that are used to sharpen the algorithms. Editorial experts are critical to the process as they have the ability to define descriptors, add correlations and address cultural nuances. The AI/machine learning component is equally critical because, not only do machines enable us to achieve scale, they’re outstanding at performing highly systemized and complex tasks.”
AI is uniquely suited to processing large amounts of video. For example, last year the BBC trained an AI system to identify programs that BBC Four audiences might like. It then analyzed program information for 250,000 shows dating back to 1953, something that would take hundreds of hours if attempted manually.
To get a better understanding of the importance of metadata to the business of media you can pick up a free copy of nScreenMedia’s TV Metadata in Transition. I recommend starting with Chapters 10 through 12.
Why it matters
The effectiveness of search, recommendations, and voice discovery is limited by the depth of data available about movies and TV programs.
For these applications to deliver more relevant results to consumers, the data must include mood, theme, scenario, and other less tangible information.