Product Insights | April 16, 2026 | 5 min read

Agent Forge: How We Built Self-Healing Agents

Our answer to the post-launch optimization problem

Claire Uhm

The barrier to building AI agents has dropped dramatically over the past year. Teams can now describe what they want to build in plain language and have a working agent prototype the same day. The shared thesis across the industry is that agent creation should be effortless, and we agree.

Lowering the barrier to building agents means organizations can discover new ways to improve their workflows without committing engineering resources upfront. The natural result is that more agents are making it to production, but the industry hasn't made the same progress on what happens after launch.

Agents need to be continuously monitored and improved to ensure their performance doesn’t degrade over time, and the ongoing work of optimizing agents is just as engineering-intensive as building them used to be.

Our answer is the Agent Forge, our agent development platform, where AI agents handle that work end to end and nearly autonomously.

Self-Healing Agents

Amigo's clinical agents are self-healing by design. The Agent Forge makes this possible by powering a separate team of AI coding agents that automatically detect issues, build fixes, and prepare updates for human review.

How Amigo Agent Engineers and coding agents collaborate in the Agent Forge

The process begins with one of our agent engineers describing the changes they want to make, which prompts a coding agent to pull the current agent configuration, make the edits, run simulations against thousands of AI-simulated patient personas, check for regressions, and prepare the change for human review. Once that process is complete, the loop closes with our engineer approving or rejecting the changes, ensuring that humans always have the final say.

This lets Amigo agents improve continuously and quickly, with every change validated through simulation before it reaches a patient.
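The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Agent Forge's actual implementation: `ChangeProposal`, `propose_change`, `review`, and the `simulate` callback are hypothetical names, and a real simulation run would involve far more than a dictionary of scores.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Config = Dict[str, float]
Scores = Dict[str, float]


@dataclass
class ChangeProposal:
    """Hypothetical record of a change awaiting human review."""
    description: str
    baseline_scores: Scores
    candidate_scores: Scores
    regressions: List[str]
    status: str = "pending_review"


def propose_change(
    description: str,
    current_config: Config,
    apply_edit: Callable[[Config], Config],
    simulate: Callable[[Config], Scores],
    tolerance: float = 0.0,
) -> ChangeProposal:
    """Edit the config, simulate both versions, and flag any regressions."""
    candidate_config = apply_edit(current_config)
    baseline = simulate(current_config)
    candidate = simulate(candidate_config)
    # A dimension regresses if the candidate scores lower than the baseline
    # by more than the allowed tolerance.
    regressions = [
        name
        for name, score in baseline.items()
        if candidate.get(name, 0.0) < score - tolerance
    ]
    return ChangeProposal(description, baseline, candidate, regressions)


def review(proposal: ChangeProposal, approved: bool) -> ChangeProposal:
    """The human decision is the only step that changes the status."""
    proposal.status = "approved" if approved else "rejected"
    return proposal
```

The design point the sketch tries to capture is that the coding agent can only produce a proposal in the `pending_review` state; nothing in `propose_change` can approve its own work.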

Testing Beyond Pre-Defined Scenarios

Before an Amigo agent talks to a real patient, it goes through millions of simulated conversations designed to test the edges of its capabilities. Several other agent platforms on the market offer some form of testing, but the Agent Forge treats agent development as a verification problem. The difference is that testing checks whether an agent can handle a set of pre-defined scenarios, whereas verification systematically finds scenarios that nobody anticipated.

Standard testing covers pre-defined scenarios, whereas verification systematically discovers new conversation paths

The Agent Forge's simulation engine automatically discovers conversation paths the agent has never encountered, deepening coverage in the areas where quality is weakest. It seeks out failure modes by running every scenario against simulated patients with different temperaments, because an agent that performs well with a calm and articulate patient may respond differently when that patient is anxious, skeptical, confused, or frustrated. The goal is to prove the agent can handle the full range of real-world interactions.
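To make the "every scenario against every temperament" idea concrete, here is a small Python sketch. It only illustrates the scenario-by-temperament grid and how weak cells might be surfaced for deeper exploration; it does not attempt the path discovery itself. The function names, the `score_fn` callback, and the threshold are all hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

# Temperaments named in the text; a real persona set would be much richer.
TEMPERAMENTS = ["calm", "anxious", "skeptical", "confused", "frustrated"]

Cell = Tuple[str, str]  # (scenario, temperament)


def run_matrix(
    scenarios: Sequence[str],
    temperaments: Sequence[str],
    score_fn: Callable[[str, str], float],
) -> Dict[Cell, float]:
    """Score every scenario against every temperament."""
    return {(s, t): score_fn(s, t) for s, t in product(scenarios, temperaments)}


def weakest_cells(
    results: Dict[Cell, float], threshold: float
) -> List[Tuple[Cell, float]]:
    """Return cells scoring below the threshold, worst first."""
    failing = [(cell, score) for cell, score in results.items() if score < threshold]
    return sorted(failing, key=lambda item: item[1])
```

In a setup like this, the cells returned by `weakest_cells` would be natural places to generate additional simulated conversations, which is one way to read "deepening coverage where quality is weakest."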

Measurement That Resists Gaming

Most agent platforms optimize toward a single metric, such as CSAT (customer satisfaction score), task completion, or resolution rate. The problem with this is well understood and is often called Goodhart's law: when a measure is treated as a target, it stops being a good measure.

For example, an agent optimized for CSAT learns to become agreeable rather than accurate, and an agent optimized for resolution rate learns to rush through conversations before gathering sufficient context. Metrics may appear to improve on the surface, but the patient experience suffers.

Rather than optimizing for any single number, the Agent Forge measures agent performance across multiple dimensions simultaneously, and an interaction has to perform well across all of them to count as a success. Improvement in accuracy at the cost of empathy is a failure. So is an increase in resolution speed that leaves patients feeling confused.
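The "all dimensions at once" rule is easy to express in code. The Python sketch below is an assumption about how such a check could look, not the Agent Forge's scoring logic; the dimension names, thresholds, and function names are hypothetical.

```python
from typing import Dict

Scores = Dict[str, float]


def interaction_succeeds(scores: Scores, thresholds: Scores) -> bool:
    """An interaction counts as a success only if every dimension clears its bar."""
    return all(
        scores.get(dimension, 0.0) >= minimum
        for dimension, minimum in thresholds.items()
    )


def is_improvement(before: Scores, after: Scores, tolerance: float = 0.0) -> bool:
    """A change is an improvement only if nothing regresses and something gains.

    Both dictionaries are assumed to share the same dimension keys.
    """
    no_regression = all(after[d] >= before[d] - tolerance for d in before)
    some_gain = any(after[d] > before[d] + tolerance for d in before)
    return no_regression and some_gain
```

For example, `is_improvement({"accuracy": 0.8, "empathy": 0.9}, {"accuracy": 0.9, "empathy": 0.7})` returns `False`: accuracy went up, but empathy went down, so the change is treated as a failure rather than a trade-off.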

The Agent Forge measures agent performance across multiple dimensions simultaneously

Catching Drift Before It Compounds

Agents degrade over time as underlying language models are updated, and as shifting patient populations and seasonal patterns change the mix of incoming questions. An agent that was verified at launch may begin behaving differently, often in subtle ways that go unnoticed until a pattern of failures has already built up.

The Agent Forge continuously monitors for this kind of drift and distinguishes between two types. Performance drift is when the agent's outputs begin to change, whereas dimensional drift is when the problem space itself has changed and the original agent can no longer cover it effectively. Each requires a different response, so the Agent Forge initiates a structured cycle: understand what changed, determine the right fix, validate it through simulation, and deploy it with human approval.
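One way to tell the two kinds of drift apart is to watch two separate signals. The Python sketch below is our own illustrative interpretation, not a description of how the Agent Forge detects drift: it assumes performance drift shows up as a score drop on a fixed reference set, and dimensional drift shows up as a shift in the topic mix of incoming questions, measured here with total variation distance. All names and tolerances are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def topic_distribution(topics: Sequence[str]) -> Dict[str, float]:
    """Convert a list of topic labels into relative frequencies."""
    counts = Counter(topics)
    total = sum(counts.values())
    return {topic: count / total for topic, count in counts.items()}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance: 0.0 for identical mixes, 1.0 for disjoint ones."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def classify_drift(
    baseline_score: float,
    current_score: float,
    baseline_topics: Sequence[str],
    current_topics: Sequence[str],
    score_tolerance: float = 0.02,
    mix_tolerance: float = 0.1,
) -> List[str]:
    """Return which kinds of drift (if any) the two signals suggest."""
    findings: List[str] = []
    # Same reference inputs, worse outputs: the agent's behavior has changed.
    if baseline_score - current_score > score_tolerance:
        findings.append("performance")
    # Different mix of incoming questions: the problem space has changed.
    shift = total_variation(
        topic_distribution(baseline_topics), topic_distribution(current_topics)
    )
    if shift > mix_tolerance:
        findings.append("dimensional")
    return findings
```

Keeping the two checks independent matters because both kinds of drift can occur at once, and each would feed a different fix into the review cycle.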

The Partnership Model

Many agent platforms are designed to hide complexity from customers: “Upload your SOPs, describe what you want, and the platform handles the rest.” This black-box approach is good enough for most industries, but in healthcare, it’s important for the clinical team to stay involved.

The people closest to the clinical workflows are the ones who understand what a good interaction looks like for their specific population and protocols. The Agent Forge is built for that kind of shared ownership, where the healthcare organization contributes its expertise on what good care looks like, and Amigo turns that input into an agent that delivers it reliably.

The Agent Forge is the interface between those two responsibilities, and as customer teams familiarize themselves with the platform, they can even begin driving agent development themselves.

Get Started with Amigo

If you're a healthcare organization evaluating clinical agents, it’s important to understand what happens after launch. Your agents will need to keep up as edge cases accumulate and conditions change. It’s why we built the Agent Forge, and we'd love to show you how it works.

Book a demo to see Amigo agents in action.
