Why AI Agents Still Stumble (and How Humans Can Help Them Fly)

An excellent discussion of what AI agents can do today and what they will be able to do next.

Read this whole transcript or watch the full original video.

“AI agents hold enormous promise... but they’re still making dumb mistakes.”
– Mike Bird, Host of Tool Use

We’re not here to bash agents. They’re impressive — fast, scalable, and low-key magical when they work. But let’s be honest: they don’t always work. Sometimes they hallucinate. Sometimes they get stuck in loops. Sometimes they drop your SQL tables like it’s hot without checking with the human who signs the checks.

So what’s the fix? One word: humans.

🤖 Agents Are Cool. Agents + Humans? Even Cooler.

Dexter Horthy, founder of Human Layer, knows firsthand the beauty and the chaos of building AI agents. What started as a helpful tool to auto-clean unused SQL tables quickly turned into a reality check: autonomous agents can mess things up fast if no one’s watching.

“We had to build a system where the agent checks with a human before doing anything scary — like deleting a table someone’s CEO might still be using.”

That realization gave birth to what Dexter calls “outer loop agents” — systems that initiate workflows and proactively bring humans in only when they hit decision points. Not “AI that waits for you to ask.” AI that takes initiative, but respects the chain of command.
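
In code, the pattern is basically a gate in front of the scary call. Here's a minimal sketch in Python, not Human Layer's actual API: request_approval is a hypothetical stand-in for whatever channel actually reaches a human.

    import sqlite3

    def request_approval(action_description: str) -> bool:
        # Hypothetical stand-in for a real approval channel (Slack, email, SMS).
        # In production this would send a message and wait for a human reply.
        answer = input(f"Agent wants to: {action_description}. Approve? [y/N] ")
        return answer.strip().lower() == "y"

    def drop_unused_table(conn: sqlite3.Connection, table_name: str) -> None:
        # The destructive step is gated: nothing runs without an explicit yes.
        if request_approval(f"DROP TABLE {table_name} (flagged as unused)"):
            conn.execute(f'DROP TABLE "{table_name}"')
            conn.commit()
        else:
            print(f"Skipped {table_name}; a human said no.")

The point is simply that the DROP statement is unreachable without an explicit yes from a person.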

💬 Why “Ask Before Acting” Is the New Killer UX

Most AI tools today are reactive — chatbots that wait for your command. Outer loop agents flip that. They watch for changes, trigger on events, and come to you with a plan.

“Hey, this looks weird. Should I fix it?”

And here’s the kicker: the interaction doesn’t have to be inside a fancy app. Dexter’s team is proving that agents can reach you via Slack, email, even SMS — and you can respond from wherever you are, even your watch.
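
To make that concrete, here's a generic sketch (not Human Layer's SDK) of an agent pushing that question into Slack through a standard incoming webhook. The webhook URL and message text are placeholders.

    import requests

    # Placeholder incoming-webhook URL; swap in a real one from your Slack workspace.
    SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"

    def ask_human_in_slack(question: str, context: str) -> None:
        # Post a plain-text message to a channel via a Slack incoming webhook.
        # A real setup would add buttons or a thread so the reply flows back to the agent.
        payload = {"text": f"Agent question: {question}\n> {context}"}
        resp = requests.post(SLACK_WEBHOOK_URL, json=payload, timeout=10)
        resp.raise_for_status()

    ask_human_in_slack(
        "This looks weird. Should I fix it?",
        "Details about the flagged change would go here.",
    )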

It's a new kind of human-AI relationship:

  • Less command line.

  • More co-pilot.

  • Fewer “Oops.”

  • More “Just checking — is this cool?”

🛠️ What Every Agent Builder Gets Wrong

Mike and Ty Fiero asked Dexter a classic podcast question:
“What are most builders getting wrong?”

His answer?

“They give an agent a prompt and a bag of tools and hope for the best.”

That’s not engineering. That’s wishful thinking.

What Dexter suggests instead is “micro-agents” — small, single-responsibility agents with clear boundaries and tight controls. Think: a for-loop, a switch statement, and a shared history of what’s already happened. Build it like a pipeline. Add knobs and dials.
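
In Python, that "for-loop plus switch statement plus shared history" shape might look like the sketch below. The step names and tools (next_step, run_query, ask_human) are made up for illustration, not a real framework.

    # A micro-agent as a plain control loop: an LLM proposes the next step,
    # a switch statement executes it, and a shared history records everything.
    # next_step, run_query, and ask_human are hypothetical stand-ins.

    def next_step(history: list) -> dict:
        # In a real agent this would call the model with the full history
        # and get back a structured action. Here we just stop immediately.
        return {"action": "done", "args": {}}

    def run_query(args: dict) -> dict:
        return {"rows": []}            # stand-in tool

    def ask_human(args: dict) -> dict:
        return {"approved": False}     # stand-in approval checkpoint

    history: list = []                 # shared record of what has already happened
    MAX_STEPS = 10                     # a knob: hard cap so the loop can't run away

    for _ in range(MAX_STEPS):
        step = next_step(history)
        match step["action"]:          # the "switch statement"
            case "run_query":
                result = run_query(step["args"])
            case "ask_human":
                result = ask_human(step["args"])
            case "done":
                break
            case _:
                result = {"error": f"unknown action: {step['action']}"}
        history.append({"step": step, "result": result})

The knobs and dials are just code: the step cap, the set of allowed actions, and what gets written into the shared history.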

“The difference between 89% and 91% accuracy is massive. Especially in multi-step workflows. Errors compound fast.”
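
Rough math on why that gap matters, assuming (as a simplification) that each step succeeds independently:

    # Per-step accuracy compounds across a multi-step workflow.
    for per_step in (0.89, 0.91):
        print(f"{per_step} per step -> {per_step ** 10:.2f} end-to-end over 10 steps")
    # Roughly 0.31 vs 0.39: the "better" agent finishes about 25% more runs cleanly.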

⚙️ Eval Like You Mean It

You know what else builders skip? Evaluations.

Sure, you could rely on vibe checks — “Feels right!” — or hope the model output looks good. But Human Layer treats evals like unit tests:

  • Real prompts from production

  • Assert the expected next action

  • Log structured JSON, not freeform text

Because if your agent’s job is to auto-deploy infrastructure or write to your CRM, you need repeatable reliability. Not vibes.
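
Here's a sketch of what an eval-as-unit-test can look like with pytest. decide_next_action and the context fields are hypothetical, but the shape is the point: replay a real production context, then assert on the structured next action.

    import json

    def decide_next_action(context: dict) -> dict:
        # Hypothetical agent step under test; a real version would call the model.
        return {"action": "ask_human", "args": {"table": context["flagged_table"]}}

    def test_flagged_table_requires_human_approval():
        # A real prompt/context snapshot from production, replayed as a fixture.
        context = json.loads(
            '{"flagged_table": "orders_backup", "last_queried_days_ago": 400}'
        )
        result = decide_next_action(context)
        # Assert the expected next action as structured JSON, not freeform text.
        assert result["action"] == "ask_human"
        assert result["args"]["table"] == "orders_backup"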

📦 Context Engineering > Prompt Engineering

Forget just tweaking prompts. The real meta-game? Context engineering.

Dexter breaks it down:

  1. Prompt – your instructions

  2. Context – docs, history, and current state

  3. Output structure – tell the model how to respond

  4. Control flow – what happens after each response

You’re not just asking a model a question. You’re building an ongoing conversation with memory, structure, and logic baked in.

“An LLM is stateless. Everything it knows is what you give it. So give it the right stuff.”
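
As a rough sketch with illustrative field names (no particular framework implied), those four pieces fit together like this:

    import json

    def build_request(instructions: str, docs: list, history: list, state: dict) -> dict:
        # 1. Prompt: the instructions.
        # 2. Context: docs, prior history, and current state, passed explicitly on
        #    every call, because the model remembers nothing on its own.
        # 3. Output structure: demand JSON in a known shape instead of freeform text.
        return {
            "system": instructions,
            "context": {"docs": docs, "history": history, "state": state},
            "output_schema": {"action": "string", "args": "object"},
        }

    def handle_response(raw_model_output: str, history: list) -> dict:
        # 4. Control flow: parse the structured reply, record it, decide what's next.
        step = json.loads(raw_model_output)
        history.append(step)
        return step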

🧩 What's Missing Before We Hit “Level 5 Autonomy”?

The big dream is AI agents that can just go. You give them a goal — launch this campaign, handle this outage, refactor this codebase — and they run with it. Zero oversight.

Reality check? We’re not there yet. And maybe we don’t need to be.

“There are things we don’t even let humans do without approval. Why should agents be any different?”

So instead of chasing full autonomy, Dexter’s building tools that work today — layering approvals, tracking steps, and designing agents to hand off tasks cleanly to humans when needed.
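
One way to express "hand off cleanly", purely illustrative and with made-up action names: risky steps don't execute, they become a handoff record that a human can pick up on their own schedule.

    from dataclasses import dataclass, field

    RISKY_ACTIONS = {"drop_table", "deploy_prod", "send_refund"}  # illustrative list

    @dataclass
    class Handoff:
        action: str
        reason: str
        details: dict = field(default_factory=dict)

    def run_tool(step: dict) -> dict:
        # Stand-in for the real executor of low-risk steps.
        return {"status": "ok", "action": step["action"]}

    def execute_or_handoff(step: dict):
        # Low-risk steps run automatically; risky ones pause and wait for a person.
        if step["action"] in RISKY_ACTIONS:
            return Handoff(step["action"], "requires human approval", step.get("args", {}))
        return run_tool(step)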

🔮 What’s Next for Human Layer (and You)?

Human Layer is quietly working on an MCP-first agent framework. It's built around the idea that the real challenge isn't just agent intelligence — it's coordination. Especially when you’re dealing with hundreds (or thousands) of agents, tools, and communication channels.

And for the rest of us?

Start small. Build micro-agents. Wrap them in approvals. Use Slack as your UI. Don’t reinvent infrastructure — just plug into it. And always, always leave a lane for the human to step in.

🧰 TL;DR – Dexter’s Builder’s Playbook

  • ✅ Build agents with boundaries

  • 🔁 Use micro-agents, not magic

  • 🧩 Engineer the context, not just the prompt

  • 🧪 Write evals like unit tests

  • 🧠 Keep the human in the loop

  • 📬 Let agents talk to you via Slack, email, or even SMS

  • 🔧 Stay modular. Stay flexible. Stay in control.

Want more?
👉 Check out the full episode of Tool Use featuring Dexter Horthy for even more wisdom on building smarter, safer, more reliable AI agents.

