Why ungrounded large language models can’t fix content discovery

Artificial Intelligence Content Discovery

Picture this: It’s the end of a long day and you collapse onto your living room couch, ready to escape into the world of streaming. Instead of scrolling through an endless wall of program tiles, you ask your voice-controlled remote to find that “tense, slow-burn crime thriller movie set in a small rainy town that just came out.”

This is the beauty of generative AI: conversational, real-world content discovery. The downside, however, is that the large language models (LLMs¹) that are deployed for enterprise use aren’t equipped to answer questions in the same way that popular chatbots like Gemini and ChatGPT are. That’s because they don’t have any knowledge outside of their initial training data.

In the example above, information about a new movie will fall outside an ungrounded LLM’s training data. So, instead of queuing up a hidden gem, the AI-powered assistant will confidently invent a plot, combine elements of two similar-sounding movies, or recommend a film that your service doesn’t even have the rights to distribute.

When ‘good enough’ actually isn’t

In the world of AI, less-than-perfect responses are a well-known byproduct of technology that responds based on probability. Popular chatbots, however, benefit from being linked to credible, external data sources (i.e., ‘grounding’) that backstop their responses. This is not the case with enterprise LLMs. They only know what they were trained on. As a result, they often don’t know how to respond, or they simply make things up (i.e., ‘hallucinate’).
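To make the idea of grounding concrete, here is a minimal sketch in Python. It assumes a tiny in-memory catalog standing in for a credible external data source. The `Title` record, the `CATALOG` entries and the `build_prompt` helper are all hypothetical names made up for illustration, not part of any real product or of the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Title:
    name: str
    year: int
    genres: tuple
    available: bool  # whether the service holds distribution rights


# ASSUMPTION: a hypothetical catalog standing in for a trusted external source.
CATALOG = {
    "example thriller": Title("Example Thriller", 2025, ("crime", "thriller"), True),
    "example drama": Title("Example Drama", 2024, ("drama",), False),
}


def grounded_facts(query_title: str) -> Optional[Title]:
    """Return the catalog record for a title, or None if it is unknown."""
    return CATALOG.get(query_title.strip().lower())


def build_prompt(query_title: str) -> str:
    """Build a prompt that restricts the model to verified catalog facts."""
    record = grounded_facts(query_title)
    if record is None:
        # No grounding data: tell the model to admit it does not know.
        return f"No catalog data for '{query_title}'. Reply that the title is unknown."
    if not record.available:
        # Known title, but the service cannot distribute it.
        return f"'{record.name}' is not available on this service. Do not recommend it."
    facts = f"{record.name} ({record.year}), genres: {', '.join(record.genres)}"
    return f"Answer using only these facts: {facts}"
```

The point of the sketch is the branching: when the lookup comes back empty, the assistant is steered toward saying so rather than improvising, and a title without distribution rights is never recommended.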

For example, a recent Gracenote study found that ungrounded LLMs struggle to provide any information about movies that premiered in the past two years, regardless of box office returns and coverage in popular entertainment publications. Examples include GOAT, Mercy, Send Help, Solo Mio and It Was Just an Accident.

Just because an LLM doesn’t have the right answer, however, doesn’t mean it will admit to not knowing. In our study, which included 2,600 top TV shows and movies across 13 countries, an ungrounded LLM hallucinated 100% of the response information for 506 of the titles (nearly 20% of the titles in the study).

For quality results, the details are in the data

To connect viewers with something they want to watch, LLMs need more than real-time information and the ability to match metadata attributes. Delivering on rising customer expectations as AI usage grows will depend on a service’s ability to provide comprehensive information about the thousands of titles within individual video catalogs. Here, ungrounded LLMs aren’t up to the task.

To assess the overall performance of the ungrounded LLM, we assigned quality scores to the complete results provided for all 2,600 titles. The scores (zero-, low-, medium- and high-quality) reflect the combination of two separate assessments: metadata attribute matching against the grounding data and the factual accuracy of the responses.
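As a rough illustration of how two assessments might roll up into a single tier, consider the sketch below. The article does not describe the study's actual scoring method, so everything here is an assumption: the 0 to 1 score range, the rule of taking the weaker of the two scores, the thresholds, and the function names `quality_tier` and `high_quality_share` are all hypothetical.

```python
def quality_tier(attribute_match: float, factual_accuracy: float) -> str:
    """Map two 0-1 assessment scores to a quality tier.

    ASSUMPTION: the combination rule (taking the weaker score) and the
    thresholds are illustrative only, not the study's actual method.
    """
    for score in (attribute_match, factual_accuracy):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must be between 0 and 1")
    # A response is only as good as its weaker assessment.
    combined = min(attribute_match, factual_accuracy)
    if combined == 0.0:
        return "zero"
    if combined < 0.5:
        return "low"
    if combined < 0.8:
        return "medium"
    return "high"


def high_quality_share(tiers: list[str]) -> float:
    """Fraction of results rated high quality (0.0 for an empty list)."""
    return tiers.count("high") / len(tiers) if tiers else 0.0
```

Taking the minimum reflects one plausible design choice: a response that matches every metadata attribute but gets the facts wrong should not be rated highly.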

Across the 2,600 titles, the combined share of zero-, low- and medium-quality results was high, ranging from 77% to 91%. Less than one-third of the results were deemed high quality. In the Netherlands and Mexico, less than 10% were deemed high quality.

Amid growing content fragmentation and subscription fatigue, success in video entertainment will increasingly depend on the user experience. Here, LLMs have the power to alleviate growing frustrations about content discovery, but not if they deliver bad results. And all too often, especially in enterprise use cases, ungrounded LLMs aren’t going to deliver the credible, real-time results that will define success as catalogs grow and distribution becomes even more decentralized.

For more insight, download our recent ungrounded LLM response study.

Note

LLMs are a type of generative AI that is trained on massive amounts of data to understand and generate human-like language.
