AI Agents · June 18, 2025 · 4 min read

Building Ambient AI For Teams

How we built Sammy 3 to feel like a teammate that sees context, speaks naturally, and acts faster than humans.

The problem with most AI agents

Most agents today are glorified chatbots. They take a command, process it in isolation, and spit out a response. The context dies between conversations, the memory is shallow, and the whole experience feels like talking to a really fast intern who forgets everything you just told them.

We wanted something different with Sammy 3. Something that felt like working with a teammate who actually gets the context, remembers the details that matter, and can handle complex workflows without constant hand-holding.

Setting the intention

Ambient agents are more than API wrappers. The best ones understand context, never lose track of intent, and feel like purposeful collaborators. Sammy 3 started as a prototype to reduce customer onboarding time, but it quickly became the heartbeat of our support workflow.

The key insight? Context is everything. An agent that knows where it is, what just happened, and what the user is trying to achieve can make intelligent decisions. An agent without context is just an expensive autocomplete.

Building the contextual foundation

Context-aware screen understanding

We built a system that lets the agent understand not just what's on screen, but where it is in the user's workflow. This spatial and contextual awareness is what separates a smart agent from a glorified chatbot.

The agent can reason about user interface states, predict likely next steps, and handle edge cases without losing context—but the specific implementation details are what give it the edge.
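We're keeping the implementation close, but the shape of the idea fits in a few lines. Here's a minimal sketch where the agent reasons over a structured snapshot of the screen plus workflow state rather than raw pixels; every name here (ScreenContext, likely_next_steps, the example flows) is illustrative, not our production API:

```python
from dataclasses import dataclass, field


@dataclass
class ScreenContext:
    """Hypothetical snapshot of what the agent can 'see' and where it is."""
    active_view: str                 # e.g. "billing_settings"
    visible_elements: list[str]      # labels of actionable UI elements
    workflow_step: str               # where the user is in a known flow
    recent_actions: list[str] = field(default_factory=list)

    def likely_next_steps(self, flows: dict[str, list[str]]) -> list[str]:
        """Predict next steps from a map of known workflows, limited to
        actions actually reachable from the current screen."""
        steps = flows.get(self.workflow_step, [])
        return [s for s in steps if s in self.visible_elements]


# Usage: the agent reasons over screen + workflow state, not raw pixels.
ctx = ScreenContext(
    active_view="billing_settings",
    visible_elements=["update_card", "cancel_plan", "contact_support"],
    workflow_step="payment_failed",
)
flows = {"payment_failed": ["update_card", "contact_support", "retry_charge"]}
print(ctx.likely_next_steps(flows))  # ['update_card', 'contact_support']
```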

Memory that actually works

Traditional RAG is broken for agents. You get chunks of text with no understanding of temporal relationships, user intent, or workflow state. We built a multi-layered memory architecture that tracks what's happening now, what happened before, and general domain knowledge.

The key breakthrough was making these memory layers work together to maintain context across sessions and understand user intent over time.
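The real system is more involved, but the three layers described above map roughly onto working memory (what's happening now), episodic memory (what happened before), and semantic memory (general domain knowledge). A minimal sketch of how they might fit together, with illustrative names throughout:

```python
from collections import deque


class AgentMemory:
    """Sketch of a three-layer memory: working, episodic, semantic."""

    def __init__(self, working_size: int = 20):
        # Working memory: what's happening right now (bounded, ordered).
        self.working = deque(maxlen=working_size)
        # Episodic memory: what happened before, keyed by session.
        self.episodic: dict[str, list[dict]] = {}
        # Semantic memory: general domain knowledge (e.g. product docs).
        self.semantic: dict[str, str] = {}

    def observe(self, session_id: str, event: dict) -> None:
        """Record an event in both the working and episodic layers."""
        self.working.append(event)
        self.episodic.setdefault(session_id, []).append(event)

    def recall(self, session_id: str, topic: str) -> dict:
        """Assemble context for the model from all three layers."""
        return {
            "now": list(self.working),
            "history": self.episodic.get(session_id, [])[-10:],
            "knowledge": self.semantic.get(topic, ""),
        }
```

The point of the layering is that `recall` always returns a blend of immediate, historical, and general context, which is what lets the agent maintain continuity across sessions.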

Shipping with ruthless feedback loops

Making the invisible visible

The hardest part about AI agents isn't building them—it's understanding why they make certain decisions and how to improve them. We built comprehensive observability into every interaction.

The key was creating feedback loops that let us debug and improve agent behavior quickly. We could trace decisions, measure performance, and iterate on prompts without losing momentum.
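In practice this can start as simply as wrapping every agent step so it emits a structured trace. A stripped-down sketch, where the `traced` decorator and the `classify_intent` step are illustrative stand-ins:

```python
import json
import time
import uuid
from functools import wraps


def traced(step_name):
    """Wrap an agent step so every decision leaves a structured trace."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            trace_id = str(uuid.uuid4())
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            record = {
                "trace_id": trace_id,
                "step": step_name,
                "latency_ms": round((time.perf_counter() - start) * 1000, 1),
                "inputs": repr(args)[:200],   # truncated for log hygiene
                "output": repr(result)[:200],
            }
            print(json.dumps(record))  # swap for your real log pipeline
            return result
        return wrapper
    return decorator


@traced("classify_intent")
def classify_intent(message: str) -> str:
    return "billing" if "invoice" in message.lower() else "general"


classify_intent("Where is my invoice?")
```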

The data feedback loop

We measured what mattered: completion rates, resolution times, user satisfaction, and context retention. But the real breakthrough came from qualitative feedback.

Users would tell us when the agent "felt smart" vs. when it felt robotic. That subjective feedback guided our improvements more than any quantitative metric.
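The quantitative side is a straightforward roll-up. Here's a sketch of how those four metrics might be computed from session-level records, treating "no repeated questions" as one possible proxy for context retention (the field names and that proxy are assumptions, not our exact definitions):

```python
from dataclasses import dataclass


@dataclass
class SessionStats:
    completed: bool
    resolution_minutes: float
    satisfaction: int          # e.g. 1-5 post-chat survey
    repeated_questions: int    # times the agent re-asked a known fact


def summarize(sessions: list[SessionStats]) -> dict:
    """Roll session-level data up into the four metrics above.
    Assumes at least one session."""
    n = len(sessions)
    return {
        "completion_rate": sum(s.completed for s in sessions) / n,
        "avg_resolution_min": sum(s.resolution_minutes for s in sessions) / n,
        "avg_satisfaction": sum(s.satisfaction for s in sessions) / n,
        # Proxy: a session retained context if nothing was re-asked.
        "context_retention": sum(s.repeated_questions == 0 for s in sessions) / n,
    }
```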

What worked (and what didn't)

What scaled beautifully

The memory architecture. Once we got the memory system right, agents could handle complex workflows that would confuse even experienced human agents. They stopped asking repetitive questions and each interaction felt like a continuation of the relationship.

Observability infrastructure. Being able to trace agent decisions and understand context at each step made debugging straightforward. We could spot patterns and fix edge cases quickly.

What needed iteration

Prompt complexity. Our early prompts were overly complex. We learned that focused, constrained prompts work better than exhaustive instructions.
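To make that shift concrete: instead of one exhaustive prompt that tries to anticipate every case, we moved toward small, task-scoped prompts selected by a router. The prompt text below is invented for illustration:

```python
# Before: one exhaustive prompt trying to cover every case (trimmed).
EXHAUSTIVE = """You are a support agent. Handle billing, refunds, bugs,
onboarding, cancellations... If the user mentions X do Y, unless Z..."""

# After: small, focused prompts, one per task, selected by a router.
PROMPTS = {
    "billing": "You handle billing questions only. Ask for the invoice ID "
               "if it's missing, then explain the charge in plain language.",
    "onboarding": "You guide new users through setup. One step at a time; "
                  "confirm each step before moving on.",
}


def system_prompt(intent: str) -> str:
    """Pick the narrow prompt for the classified intent."""
    return PROMPTS.get(intent, "Answer briefly and hand off if unsure.")
```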

Human handoff. The transition from AI to human agents needed work. We solved this by preserving context across the handoff so human agents could pick up where the AI left off.
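One way to preserve that context is a structured handoff packet: everything a human agent needs to pick up mid-conversation, serialized into their console. A sketch with hypothetical field names:

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class HandoffPacket:
    """Everything a human agent needs to pick up mid-conversation."""
    session_id: str
    user_goal: str               # the agent's best read of intent
    facts_collected: dict        # details the user already provided
    steps_attempted: list[str]   # what the AI tried, in order
    blocker: str                 # why the AI is escalating


def escalate(packet: HandoffPacket) -> str:
    """Serialize the context for the human agent's console."""
    return json.dumps(asdict(packet), indent=2)


print(escalate(HandoffPacket(
    session_id="abc-123",
    user_goal="update expired card",
    facts_collected={"plan": "pro", "last4": "4242"},
    steps_attempted=["sent update link", "retried charge"],
    blocker="payment processor returned an unknown error",
)))
```

The human never starts from zero: the packet tells them what the user wants, what's already known, and why the AI stepped aside.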

The secret sauce

The stack scaled, but the secret sauce was pairing qualitative feedback with measurable outcomes—time to activation, retention, and the "felt sense" of the agent.

We discovered that users don't just want efficiency; they want to feel understood. An agent that remembers your preferences, acknowledges your expertise level, and adapts its communication style creates trust that pure performance metrics can't capture.

What's next

The playbook now guides how we approach every new AI-native workflow. Key principles:

  1. Context first, speed second - A slightly slower agent that understands context beats a fast one that doesn't
  2. Memory is infrastructure - Treat it like a database, not an afterthought
  3. Observability from day one - You can't improve what you can't see
  4. Human feedback loops - Metrics matter, but humans tell you when something feels right

The next frontier? Multi-agent workflows where different specialists collaborate on complex tasks. But that's a story for another post.