The barrier to building AI agents has dropped dramatically over the past year. You can now describe what you want to build in plain language and have a working agent prototype within the same day. The shared thesis across the industry is that agent creation should be effortless, and we agree.

Lowering the barrier to building agents means organizations can discover new ways to improve their workflows without committing engineering resources upfront. The natural result is that more agents are making it to production, but the industry hasn't made the same progress on what happens after launch.

Agents need to be continuously monitored and improved to ensure their performance doesn’t degrade over time, and the ongoing work of optimizing agents is just as engineering-intensive as building them used to be.

Our solution is the Agent Forge, our agent development platform, where AI agents handle that work end to end, nearly autonomously.

Self-Improving Agents

Amigo's clinical agents are self-improving by design. The Agent Forge makes this possible by powering a separate team of AI coding agents that automatically detect issues, build fixes, and prepare updates for human review.

How Agent Engineers and coding agents in the Agent Forge collaborate to improve Amigo’s clinical agents

The process begins with one of our agent engineers describing the changes they want to make. That prompts coding agents to pull the clinical agent configuration, make the edits, validate them by running simulations against thousands of AI-simulated patient personas, check for regressions, and prepare the updates for human review. Our engineers then approve or reject the changes, ensuring humans always have the final say. The loop closes with the coding agents applying any approved changes to the Amigo clinical agent.

This cycle allows our agents to continuously learn and improve, with built-in checks and balances to prevent unwanted changes from reaching patients.
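To make the shape of that loop concrete, here is a minimal Python sketch. It is illustrative only and does not reflect the Agent Forge's actual code: every name in it (`ProposedChange`, `run_improvement_cycle`, and the `edit`, `simulate`, `find_regressions`, and `human_review` callables) is hypothetical. The point it shows is the ordering of the steps and the fact that the live configuration only changes after a human approves.

```python
from dataclasses import dataclass, field


@dataclass
class ProposedChange:
    """A candidate update bundled with the evidence a reviewer needs (hypothetical)."""
    description: str
    new_config: dict
    simulation_pass_rate: float = 0.0
    regressions: list = field(default_factory=list)
    approved: bool = False


def run_improvement_cycle(request, current_config, edit, simulate,
                          find_regressions, human_review):
    """Run one edit -> simulate -> regression check -> review pass.

    All four callables are assumed stand-ins for the coding agents,
    the simulation engine, and the human reviewer.
    """
    # The edit is made on a copy, so the live config is untouched until approval.
    candidate = edit(dict(current_config), request)
    change = ProposedChange(description=request, new_config=candidate)

    # Gather evidence for the reviewer.
    change.simulation_pass_rate = simulate(candidate)
    change.regressions = find_regressions(current_config, candidate)

    # A human has the final say.
    change.approved = human_review(change)
    return candidate if change.approved else current_config
```

If the reviewer rejects the change, the function returns the original configuration object unchanged, which mirrors the checks and balances described above.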

Testing Beyond Pre-Defined Scenarios

Before an Amigo agent talks to a real patient, it goes through millions of simulated conversations designed to test the edges of its capabilities. Several other agent platforms on the market offer some form of testing, but the Agent Forge treats agent development as a verification problem. The difference is that testing checks whether an agent can handle a set of pre-defined scenarios, whereas verification systematically searches for scenarios outside that set.

Standard testing covers pre-defined scenarios, whereas verification systematically discovers new conversation paths

The Agent Forge's simulation engine automatically discovers conversation paths the agent has never encountered and deepens coverage in the areas where quality is weakest. It deliberately seeks out failure modes by running every scenario against simulated patients with different temperaments, because an agent that performs well with a calm and articulate patient may respond differently when that patient is anxious, skeptical, confused, or frustrated. The goal is to prove the agent can handle the full range of real-world interactions.
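As a rough illustration of running every scenario against every temperament, the Python sketch below builds the full scenario-by-temperament grid and then reports which temperaments score lowest, so that coverage could be deepened there. It is a simplified sketch under stated assumptions, not the simulation engine itself: the function names, the 0-to-1 quality scores, and the 0.8 threshold are all hypothetical, and it does not attempt the automatic discovery of new conversation paths.

```python
from collections import defaultdict
from itertools import product

# Temperaments named in the text above.
TEMPERAMENTS = ["calm", "anxious", "skeptical", "confused", "frustrated"]


def build_simulation_grid(scenarios, temperaments=TEMPERAMENTS):
    """Pair every scenario with every temperament (hypothetical helper)."""
    return [{"scenario": s, "temperament": t}
            for s, t in product(scenarios, temperaments)]


def weakest_temperaments(results, threshold=0.8):
    """Return temperaments whose average score falls below the threshold.

    `results` is a list of (case, score) pairs, where `case` comes from
    build_simulation_grid and `score` is an assumed 0-to-1 quality score.
    """
    scores_by_temperament = defaultdict(list)
    for case, score in results:
        scores_by_temperament[case["temperament"]].append(score)
    averages = {t: sum(v) / len(v) for t, v in scores_by_temperament.items()}
    return sorted(t for t, avg in averages.items() if avg < threshold)
```

Two scenarios and five temperaments produce ten simulated cases. The temperaments returned by `weakest_temperaments` are the ones where additional simulations would be most informative.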

Measurement That Resists Gaming

Most agent platforms optimize toward a single metric, such as CSAT (customer satisfaction score), task completion, or resolution rate. The problem with this is well understood and is often summarized as Goodhart's law: when a measure is treated as a target, it stops being a good measure.

For example, an agent optimized for CSAT learns to become agreeable rather than accurate, and an agent optimized for resolution rate learns to rush through conversations before gathering sufficient context. Metrics may appear to improve on the surface, but the patient experience suffers.

Rather than optimizing for any single number, the Agent Forge measures agent performance across multiple dimensions simultaneously, and an interaction has to perform well across all of them to count as a success. Improvement in accuracy at the cost of empathy is a failure. So is an increase in resolution speed if it leaves patients feeling confused.
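The difference between an all-dimensions rule and a single blended number can be shown in a few lines of Python. This is a minimal sketch with hypothetical dimension names, scores, and thresholds, not the Agent Forge's actual scoring logic.

```python
def is_success(scores, thresholds):
    """An interaction succeeds only if every dimension meets its minimum.

    A dimension missing from `scores` is treated as a failure.
    """
    return all(scores.get(dim, 0.0) >= minimum
               for dim, minimum in thresholds.items())


def blended_score(scores):
    """A single averaged number, shown only for contrast."""
    return sum(scores.values()) / len(scores)


# Hypothetical example: accurate and clear, but low on empathy.
thresholds = {"accuracy": 0.8, "empathy": 0.8, "clarity": 0.8}
interaction = {"accuracy": 0.95, "empathy": 0.5, "clarity": 0.9}
```

Here `blended_score(interaction)` is roughly 0.78, which could pass a lenient average-based bar, while `is_success(interaction, thresholds)` is `False` because empathy falls short. Strong accuracy cannot buy back a weak dimension.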

Catching Drift Before It Compounds

Agents degrade over time as underlying language models are updated, or as shifting patient populations and seasonal patterns change the mix of incoming questions. An agent that was verified at launch may begin behaving differently, often in subtle ways that go unnoticed until a pattern of failures has already built up.

The Agent Forge continuously monitors for this kind of drift and distinguishes between two types. Performance drift is when the agent's outputs begin to change, whereas dimensional drift is when the problem space itself has changed and the original agent can no longer cover it effectively. Each requires a different response, and the Agent Forge can initiate a structured cycle to understand what changed, determine the right fix, validate it through simulation, and deploy it with a human in the loop.
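One way to picture the distinction is the Python sketch below. It rests on two assumptions that are ours for illustration and are not a description of how the Agent Forge detects drift: performance drift is approximated as a score drop on a fixed benchmark, and dimensional drift is approximated as a shift in the mix of incoming topics, measured with total variation distance. The function names and tolerance values are hypothetical.

```python
from collections import Counter


def topic_mix(topics):
    """Convert a list of topic labels into a proportion per topic."""
    counts = Counter(topics)
    total = sum(counts.values())
    return {topic: count / total for topic, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two topic mixes (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def classify_drift(baseline_score, current_score, baseline_topics, recent_topics,
                   score_tolerance=0.05, mix_tolerance=0.2):
    """Return the kinds of drift detected, as a list of labels.

    Scores are assumed to come from the same fixed benchmark, so a drop
    points at the agent's outputs rather than at the incoming questions.
    """
    findings = []
    if baseline_score - current_score > score_tolerance:
        findings.append("performance")
    shift = total_variation(topic_mix(baseline_topics), topic_mix(recent_topics))
    if shift > mix_tolerance:
        findings.append("dimensional")
    return findings
```

A drop on the fixed benchmark with an unchanged topic mix would be labeled performance drift, while a stable benchmark score alongside a surge in a previously unseen topic would be labeled dimensional drift. Both can occur at once, which is why the function returns a list.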

The Partnership Model

Many agent platforms are designed to abstract complexity away from customers: “Upload your SOPs, describe what you want, and the platform handles the rest.” This black-box approach is good enough for most industries, but in healthcare, it’s important for the clinical team to stay involved.

The people closest to the clinical workflows are the ones who understand what a good interaction looks like for their specific population and protocols. The Agent Forge is built for that kind of shared ownership, where the healthcare organization contributes its expertise on what good care looks like, and Amigo turns that input into an agent that delivers it reliably.

The Agent Forge is the interface between those two responsibilities, and as customer teams familiarize themselves with the platform, they can even begin driving agent development themselves.

Get Started with Amigo

If you're a healthcare leader evaluating clinical agents, it’s important to understand what happens after launch. Your agents will need to keep up as edge cases accumulate and conditions change. That’s why we built the Agent Forge, and we'd love to show you how it works.

Book a demo to see Amigo agents in action.
