
5 lessons learned from building AI experiences with PBS KIDS.

June 21, 2026

AI demos are easy to get excited about.

You see a polished experience, the technology works the way it’s supposed to, and it’s natural to start imagining all the ways AI could change how people interact with digital products.

But real users have a way of complicating things.

We saw that firsthand while working with PBS KIDS on an AI-powered interactive video proof of concept. As part of the project, REDspace Web Team Lead Nicholas Bolger helped develop the infrastructure used to test and compare different AI capabilities.

The basic idea was simple: a child watches a video, a character asks them a question, the child responds verbally, and AI helps select the best-fit prerecorded reply for the app to play.

That last detail matters. The AI was not generating live responses to children. The experience used safe, controlled, prerecorded responses. The AI’s job was to interpret what the child said and help the system choose the right next step.
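To make that division of labor concrete, here is a minimal Python sketch. It assumes the model's output can be reduced to a score per candidate reply. The catalog, the clip filenames and `keyword_scorer` are hypothetical stand-ins; a real build would call a speech-to-text service and a model at that point. Whatever the scorer returns, the function can only hand back one of the prerecorded clips.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical catalog of prerecorded clips. The model never produces text
# for the child; it can only point at one of these entries.
REPLIES: Dict[str, str] = {
    "likes_red": "clip_017.mp4",
    "likes_blue": "clip_018.mp4",
    "fallback": "clip_000_encourage.mp4",  # safe default when nothing fits
}

@dataclass
class Choice:
    label: str
    clip: str

def keyword_scorer(transcript: str) -> Dict[str, float]:
    """Stand-in for a model call: scores each candidate label for a transcript."""
    words = transcript.lower().split()
    return {
        "likes_red": 1.0 if "red" in words else 0.0,
        "likes_blue": 1.0 if "blue" in words else 0.0,
    }

def choose_reply(transcript: str,
                 scorer: Callable[[str], Dict[str, float]] = keyword_scorer) -> Choice:
    scores = scorer(transcript)
    # Drop any label the catalog does not contain, so a model that invents
    # a category cannot cause unplanned content to play.
    valid = {k: v for k, v in scores.items() if k in REPLIES and k != "fallback"}
    if not valid or max(valid.values()) <= 0.0:
        return Choice("fallback", REPLIES["fallback"])
    best = max(valid, key=valid.get)
    return Choice(best, REPLIES[best])
```

Because the set of possible outputs is fixed in advance, swapping in a different model changes how well replies are chosen. It cannot change what a child might hear.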

On paper, that sounds fairly straightforward.

In practice, building AI into real user experiences raises challenges that a demo rarely shows. Here’s what our team learned helping PBS KIDS explore this problem space.

1. AI doesn’t break in testing. It breaks with real users.

In a controlled environment, AI systems can look incredibly capable.

The inputs are clean. The questions are predictable. The user behaves as the team expects.

Real users do not.

With this project, one of the biggest challenges was not the AI model itself. It was the unpredictable ways children actually used the experience.

A child might answer while facing away from the device. There might be a TV playing in the background, a sibling talking nearby or a parent speaking from another room. Some children gave very short answers. Some responded nonverbally by nodding instead of speaking.

That is a much harder problem than evaluating a model on clean, predictable inputs.

The lesson is that AI systems need to be tested against real user behavior, not just ideal user behavior. If the experience only works when people speak clearly, face the microphone and provide complete answers, it probably isn’t ready for the real world.

That applies far beyond children’s media. Any organization building AI-powered experiences needs to ask the same question early: How will this system handle the way people actually behave?
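A low-cost way to start answering that question is to take the clean examples a team already has, derive messier versions from them, and measure accuracy separately for each condition. The sketch below assumes text transcripts and a `classify` function supplied by the team. `messy_variants` and its two perturbations are illustrative only and do not describe the PBS KIDS test setup. One perturbation reduces the transcript to a one-word answer; the other mixes background speech into it.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

def messy_variants(clean: str, noise: List[str], rng: random.Random) -> Dict[str, str]:
    """Derive harder versions of one clean transcript (illustrative perturbations)."""
    words = clean.split()
    return {
        "clean": clean,
        "short": words[-1] if words else "",           # one-word answer
        "background": f"{rng.choice(noise)} {clean}",  # TV or sibling picked up too
    }

def accuracy_by_condition(cases: List[Tuple[str, str]],
                          classify: Callable[[str], str],
                          noise: List[str],
                          seed: int = 0) -> Dict[str, float]:
    """cases holds (clean_transcript, expected_label) pairs."""
    rng = random.Random(seed)  # seeded so reruns produce the same test set
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for clean, expected in cases:
        for condition, text in messy_variants(clean, noise, rng).items():
            totals[condition] += 1
            hits[condition] += int(classify(text) == expected)
    return {c: hits[c] / totals[c] for c in totals}
```

A noticeably lower score for the short or background condition than for the clean one is exactly the gap described above. The system works for ideal users but not yet for real ones.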

2. Input quality matters more than model quality.

Early in the project, the team naturally spent a lot of time thinking about model selection.

Which model is most accurate? Which prompt works best? Which combination produces the right response most often?

Those questions matter. But as the project developed, the team found that input quality was often the bigger issue.

If a child gives a clear answer, many models can do a decent job interpreting it. If the answer is too short, unclear, noisy or nonverbal, even a powerful model may struggle.

That shifted the way we thought about the problem.

The challenge wasn’t just “how do we pick the best AI model?” It was “how do we design a system that can handle imperfect input?”

That’s an important distinction.

Organizations can spend a lot of time comparing models and tuning prompts. But if the system is built on weak, inconsistent or poorly captured inputs, model quality will only get you so far.

The best AI strategy usually starts before the model ever gets involved. It starts with understanding the user, the environment and the quality of the information flowing into the system.
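Here is a minimal sketch of an input check that runs before any model is called. It assumes a speech-to-text step that returns a transcript and a confidence score. The `triage` function, the filler-word list and the 0.6 threshold are all hypothetical. In practice these values would come from listening to real recordings.

```python
from dataclasses import dataclass
from enum import Enum

FILLERS = {"um", "uh", "hmm", "er"}  # hypothetical list; tune from real recordings

class Action(Enum):
    CLASSIFY = "classify"    # usable input: send it to the model
    REPROMPT = "reprompt"    # play a prerecorded "Can you say that again?" clip
    NONVERBAL = "nonverbal"  # nothing heard: take a non-speech path

@dataclass
class Utterance:
    text: str
    asr_confidence: float  # 0..1 as reported by the speech-to-text step (assumed)

def triage(u: Utterance, min_conf: float = 0.6) -> Action:
    words = [w.strip(".,!?").lower() for w in u.text.split()]
    if not words:
        return Action.NONVERBAL  # e.g. the child nodded instead of answering
    content = [w for w in words if w and w not in FILLERS]
    if not content or u.asr_confidence < min_conf:
        return Action.REPROMPT   # heard something, but not enough to act on
    return Action.CLASSIFY
```

Only inputs that pass this check reach the model, so model comparisons focus on the cases a model can reasonably be expected to handle.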

3. The smartest model is not always the best model.

It’s easy to assume the most advanced model is the best choice. That’s not always true.

In our testing, more advanced models often delivered stronger results, but they also came with tradeoffs. They could be more expensive. They could be slower. And depending on the use case, the improvement in accuracy was not always worth the extra cost or latency.

For this kind of interactive experience, speed mattered a lot.

Children expect conversation to feel immediate. If a character asks a question, they don’t want to wait around for a response. A slower model might technically be more capable, but if it makes the experience feel sluggish, it may not be the right fit.

That is one of the bigger lessons for organizations building AI-powered products.

Model selection isn’t just about intelligence. It’s about fit.

For some use cases, the highest-accuracy model may be worth the extra cost and latency. For others, a faster, cheaper model may deliver a better user experience because the task is simpler and the response needs to be quick.

The right question is not “what’s the best model?” The better question is “what’s the best model for this specific job?”
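One way to encode "best for this job" is to treat latency and accuracy as hard requirements, then pick the cheapest model that meets both. In this sketch the model names and every number are placeholders, not measurements from the project.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ModelStats:
    name: str
    accuracy: float        # share of eval cases answered correctly
    p95_latency_ms: float  # 95th-percentile response time
    cost_per_1k: float     # cost per 1,000 requests, any currency

def pick_model(candidates: List[ModelStats],
               max_latency_ms: float,
               min_accuracy: float) -> Optional[ModelStats]:
    """Cheapest model that is both fast enough and accurate enough, else None."""
    eligible = [m for m in candidates
                if m.p95_latency_ms <= max_latency_ms and m.accuracy >= min_accuracy]
    if not eligible:
        return None
    # Prefer lower cost; break ties on latency.
    return min(eligible, key=lambda m: (m.cost_per_1k, m.p95_latency_ms))

# Placeholder figures for illustration only, not project results.
CANDIDATES = [
    ModelStats("model-large", 0.95, 1800, 5.00),
    ModelStats("model-medium", 0.91, 700, 1.00),
    ModelStats("model-small", 0.84, 300, 0.20),
]
```

With a one-second latency budget, the large model is ruled out before accuracy is even considered. That is the tradeoff described above: extra capability does not help if the child is left waiting.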

4. AI systems need continuous optimization.

AI isn’t something you build once and walk away from.

The model landscape changes too quickly. Costs change. Latency changes. New models are released. Existing models improve. The best choice today may not be the best choice three months from now.

That is why testing and analytics became such an important part of the PBS KIDS proof of concept.

We needed a way to compare different models, prompts and configurations across key factors like accuracy, cost and latency. That kind of framework gives teams a clearer view of how the system is performing and where tradeoffs are happening.
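A comparison harness of that kind can start very small. This sketch assumes each candidate configuration (a model plus a prompt, for example) is wrapped in a `classify` function and has a known per-call price. `Config`, `Report` and `evaluate` are hypothetical names. The wall-clock timing here stands in for whatever latency metric a team actually tracks.

```python
import time
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Tuple

@dataclass
class Config:
    name: str                       # e.g. "model A + prompt v2" (hypothetical)
    classify: Callable[[str], str]  # wraps the real model call
    cost_per_call: float            # taken from the provider's pricing (assumed known)

@dataclass
class Report:
    name: str
    accuracy: float
    mean_latency_ms: float
    total_cost: float

def evaluate(configs: List[Config], eval_set: List[Tuple[str, str]]) -> List[Report]:
    """Run every configuration over the same labeled set and summarize the tradeoffs."""
    reports = []
    for cfg in configs:
        correct, latencies = 0, []
        for transcript, expected in eval_set:
            start = time.perf_counter()
            predicted = cfg.classify(transcript)
            latencies.append((time.perf_counter() - start) * 1000)
            correct += int(predicted == expected)
        reports.append(Report(
            name=cfg.name,
            accuracy=correct / len(eval_set),
            mean_latency_ms=mean(latencies),
            total_cost=cfg.cost_per_call * len(eval_set),
        ))
    return reports
```

Every configuration runs against the same evaluation set, so the reports can be compared side by side. Evaluating a newly released model then only requires adding one more `Config`.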

This is where AI starts to look less like a feature and more like an operating discipline.

If you’re building AI into a real product experience, you need visibility into how that system performs over time. You need to understand what it costs to run. You need to know when performance changes. And you need a process for reevaluating your choices as technology evolves.
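Knowing when performance changes can be as simple as comparing a rolling window of recent requests against a baseline. This is a minimal sketch that assumes latency is measured in milliseconds. The class name, the window size and the 25% tolerance are hypothetical. The same pattern works for a rate, such as how often children have to be reprompted.

```python
from collections import deque
from typing import Deque

class LatencyMonitor:
    """Flags when recent latency drifts above an agreed baseline (hypothetical thresholds)."""

    def __init__(self, baseline_ms: float, tolerance: float = 0.25, window: int = 50):
        self.baseline_ms = baseline_ms
        self.tolerance = tolerance
        # A bounded deque keeps only the most recent `window` samples.
        self.samples: Deque[float] = deque(maxlen=window)

    def record(self, latency_ms: float) -> bool:
        """Record one request; return True once the full window averages too slow."""
        self.samples.append(latency_ms)
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough data yet to judge
        avg = sum(self.samples) / len(self.samples)
        return avg > self.baseline_ms * (1 + self.tolerance)
```

An alert from a check like this is a prompt to rerun the model comparison. It does not diagnose the cause on its own.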

That is going to become especially important as more companies move from AI experimentation to production.

The organizations that succeed will not just be the ones that build AI into their products. They’ll be the ones who know how to operate it.

That philosophy closely mirrors what we call the “REDspace Way,” an approach that emphasizes involving quality and testing teams early in the delivery process rather than treating QA as a final checkpoint at the end of development.

In AI systems especially, waiting until launch to evaluate performance often means discovering structural problems after they have already become expensive to fix.

5. The future of AI may be multi-model systems.

The project also raised interesting questions about how different AI models could work together.

Instead of sending every request to the most expensive model, the system could start with a faster, cheaper model. If the model had high confidence in its answer, the system could move forward. If confidence was low, the request could be escalated to a more advanced model.
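This escalation pattern is often called a model cascade. The minimal sketch below assumes both models are wrapped to return a label plus a confidence between 0 and 1. It also assumes the fast model's confidence is reasonably calibrated, which is worth verifying against evaluation data before relying on it. The 0.8 threshold is a placeholder.

```python
from typing import Callable, NamedTuple, Tuple

class Prediction(NamedTuple):
    label: str
    confidence: float  # 0..1, however the wrapper derives it (assumed calibrated)

Model = Callable[[str], Prediction]

def cascade(transcript: str, fast: Model, strong: Model,
            threshold: float = 0.8) -> Tuple[Prediction, str]:
    """Try the cheap model first; escalate only when it is unsure."""
    first = fast(transcript)
    if first.confidence >= threshold:
        return first, "fast"  # confident enough: skip the expensive call
    return strong(transcript), "strong"
```

Logging which branch answered each request shows how often escalation happens. That rate determines whether the cascade actually saves cost and time.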

That kind of approach can help balance cost, speed and accuracy.

It also points to where AI systems may be heading.

In the future, organizations may not choose one model and stick with it. They may build systems that route different tasks to different models depending on the complexity of the request, the confidence level needed, the cost profile and the user experience.

That opens up a different way of thinking about AI architecture.

The goal isn’t always to use the most powerful model available. The goal is to create a system that knows when power is needed and when it is not.

For companies deploying AI at scale, that distinction could matter a lot.

The real challenge is everything around the AI.

The biggest takeaway from this project is that AI is only one part of the experience.

The model matters, but a lot of the hard problems sit outside the model itself. Things like noisy inputs, response times, monitoring, confidence thresholds and how the system behaves when things go wrong ended up having a huge impact on the overall experience.
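For response times and failures specifically, one common pattern is to give the model call a deadline and play a safe prerecorded clip if the call misses it or errors out. This Python sketch uses the standard library's `concurrent.futures`. `classify_with_deadline`, `slow_stand_in` and the one-second default deadline are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable

_pool = ThreadPoolExecutor(max_workers=4)

def classify_with_deadline(classify: Callable[[str], str], transcript: str,
                           deadline_s: float = 1.0,
                           fallback: str = "fallback") -> str:
    """Return the model's label, or a safe fallback label if it is late or fails."""
    future = _pool.submit(classify, transcript)
    try:
        return future.result(timeout=deadline_s)
    except FutureTimeout:
        # cancel() only helps if the call has not started yet; a call already
        # running finishes in the background and its result is discarded.
        future.cancel()
        return fallback
    except Exception:
        # Any error raised by the model call also lands on the safe path.
        return fallback

def slow_stand_in(transcript: str) -> str:
    """Simulates a model call that blows the latency budget (hypothetical)."""
    time.sleep(0.5)
    return "likes_red"
```

The fallback label would map to a prerecorded clip, so a slow or failing model degrades into a generic but safe reply instead of silence.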

Those were some of the biggest lessons our team encountered while supporting the PBS KIDS proof of concept, and they are the part that can get lost in the excitement around AI.

The demo shows what is possible, but it’s the system behind it that determines whether it actually works.

Seeing similar AI challenges?

Many organizations are discovering that building AI into real user experiences is more complicated than the demo makes it look. Let’s talk about how to navigate the technical and operational challenges early.


About the author

Andrew Hamilton is Director of Client Engineering at REDspace, where he helps lead the development of enterprise-scale digital experiences for clients across media, entertainment and technology. With nearly 20 years at REDspace, he has worked with organizations including PBS, Sesame Workshop, Nickelodeon and Sony on streaming, interactive and emerging technology initiatives.

Written by Andrew Hamilton, Director of Client Engineering