The agent narrative has run hot for the better part of a year. Vendors have pitched autonomous agents that can run businesses, file taxes, and orchestrate teams. Most of these demos do not survive contact with a real workflow.
There is, however, a category where I think agentic patterns are genuinely useful. It is narrower than the marketing suggests, and the architecture that works is not what the demos show.
What agents are actually good at
The use case is bounded multi-step retrieval and synthesis over heterogeneous internal sources. Concretely, that means tasks where a person would otherwise look in three or four systems, read what they find, and write a summary or fill a form.
Examples I have shipped or seen ship:
Prior authorisation triage in a healthcare back office, where the agent reads the referral, pulls the patient's prior diagnoses from the EHR, checks the payer's published criteria, and drafts the case for human review. The human still approves. The agent saves twenty minutes per case and shifts the work from data gathering to judgement.
Vendor onboarding intake in a procurement function, where the agent reads the questionnaire response, checks the answers against internal policy, and flags clauses for legal. The human still negotiates. The agent saves the back-and-forth on items that are clearly out of policy.
Investigator support in claims fraud, where the agent assembles a case file from policy admin, claims history, and external sources, and writes a structured summary. The human still investigates. The agent compresses the assembly step.
What these have in common
The shared structure is worth naming, because it is what makes the pattern work.
The cost of a wrong answer is bounded by a human checkpoint. The agent is not the final actor. It produces a draft. A trained operator approves, edits, or rejects. This is the single most important property. It bounds the failure mode.
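A minimal sketch of that checkpoint, assuming a simple draft-and-review model. The names Draft, Review, Decision, and finalize are hypothetical and not taken from any of the systems above. The point it illustrates is that nothing leaves the system without an explicit human decision attached.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    APPROVE = "approve"
    EDIT = "edit"
    REJECT = "reject"


@dataclass(frozen=True)
class Draft:
    """What the agent produces. It is never released directly."""
    case_id: str
    text: str


@dataclass(frozen=True)
class Review:
    """What the trained operator decides about a draft."""
    decision: Decision
    reviewer: str
    edited_text: Optional[str] = None


def finalize(draft: Draft, review: Optional[Review]) -> Optional[str]:
    """Return the text that may be released, or None if nothing may be."""
    if review is None:
        return None  # no human decision yet, so nothing is released
    if review.decision is Decision.APPROVE:
        return draft.text
    if review.decision is Decision.EDIT:
        if not review.edited_text:
            raise ValueError("an EDIT decision must carry the edited text")
        return review.edited_text
    return None  # rejected drafts are never released
```

The agent only ever constructs a Draft. The release path runs through finalize, so a missing or negative review fails closed.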
The cost of slow human work is high. These are workflows where the manual labour was already a bottleneck. Cycle time matters. The economic gain from compressing the data-gathering phase is real and measurable.
The data sources are stable and internal. The agent is operating inside systems your organisation already owns. The schemas are known. The auth model is known. The tool surface is finite. Agents fail when they are asked to operate over the open web with unbounded tool selection. They succeed when the tool list is short and the data shape is predictable.
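One way to keep the tool surface finite is an explicit allowlist that the orchestrator owns. The sketch below is an illustration under assumptions: ToolRegistry and lookup_prior_diagnoses are hypothetical names, and the lookup is a stand-in for a real query against an internal system.

```python
from typing import Callable, Dict

ToolFn = Callable[[dict], dict]


class ToolRegistry:
    """A short, explicit list of tools. Anything not registered cannot run."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolFn] = {}

    def register(self, name: str, fn: ToolFn) -> None:
        if name in self._tools:
            raise ValueError(f"tool already registered: {name}")
        self._tools[name] = fn

    def call(self, name: str, args: dict) -> dict:
        # Refuse any tool name the model produces that is not on the list.
        if name not in self._tools:
            raise PermissionError(f"tool not on the allowlist: {name}")
        return self._tools[name](args)


def lookup_prior_diagnoses(args: dict) -> dict:
    # Hypothetical stand-in for a query against an internal system.
    return {"patient_id": args["patient_id"], "diagnoses": []}


registry = ToolRegistry()
registry.register("lookup_prior_diagnoses", lookup_prior_diagnoses)
```

A request for an unregistered tool, such as a general web search, raises an error instead of widening what the agent can touch.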
The architecture that ships
The systems that work in production tend to look the same. They are not autonomous agents in the open-loop sense. They are constrained orchestrators with a small number of tools, a deterministic state machine for the high-level flow, and an LLM in a few specific roles.
The deterministic backbone matters. The LLM does not decide what step comes next in most cases. It executes a step. The next step is chosen by code that was written and reviewed by a human. This is much easier to test, to monitor, and to audit. It is also much cheaper.
A typical structure looks like this. The orchestrator is a state machine with five to ten well-defined states. Each state calls a tool, then calls an LLM with a tightly scoped prompt to interpret the result. The LLM produces structured output, validated by a schema. The state machine advances based on the validated output. There is one place where the LLM might choose between two next steps, and that choice is logged and reviewable.
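Here is a compressed sketch of that structure, using the prior authorisation example. Everything in it is an assumption made for illustration: the state names, the field names in SCHEMAS, the hand-rolled validate function (a real system would more likely use a schema library), and the fake_tool and fake_llm stand-ins. It shows the shape, not a production implementation.

```python
import json
from enum import Enum
from typing import Callable, Dict, List, Tuple


class State(Enum):
    READ_REFERRAL = "read_referral"
    PULL_HISTORY = "pull_history"
    CHECK_CRITERIA = "check_criteria"
    DRAFT_CASE = "draft_case"
    HUMAN_REVIEW = "human_review"


# Code, not the model, decides the order of steps.
NEXT_STATE = {
    State.READ_REFERRAL: State.PULL_HISTORY,
    State.PULL_HISTORY: State.CHECK_CRITERIA,
    State.DRAFT_CASE: State.HUMAN_REVIEW,
}

# Required fields and types for the LLM output at each state (hypothetical).
SCHEMAS: Dict[State, Dict[str, type]] = {
    State.READ_REFERRAL: {"summary": str},
    State.PULL_HISTORY: {"diagnoses": list},
    State.CHECK_CRITERIA: {"criteria_met": bool, "reason": str},
    State.DRAFT_CASE: {"draft": str},
}


def validate(raw: str, required: Dict[str, type]) -> dict:
    """Parse the LLM output as JSON and check the required fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("LLM output must be a JSON object")
    for key, expected in required.items():
        if not isinstance(data.get(key), expected):
            raise ValueError(f"field {key!r} missing or not {expected.__name__}")
    return data


def run(call_tool: Callable[[State], dict],
        call_llm: Callable[[State, dict], str]) -> List[Tuple[str, dict]]:
    """Drive the flow until it reaches human review. Returns the audit log."""
    state = State.READ_REFERRAL
    log: List[Tuple[str, dict]] = []
    while state is not State.HUMAN_REVIEW:
        tool_result = call_tool(state)
        output = validate(call_llm(state, tool_result), SCHEMAS[state])
        log.append((state.value, output))  # every step is logged
        if state is State.CHECK_CRITERIA:
            # The one place where validated LLM output picks the branch.
            state = State.DRAFT_CASE if output["criteria_met"] else State.HUMAN_REVIEW
        else:
            state = NEXT_STATE[state]
    return log


def fake_tool(state: State) -> dict:
    return {"source": state.value}  # stand-in for a real tool call


def fake_llm(state: State, tool_result: dict) -> str:
    canned = {
        State.READ_REFERRAL: {"summary": "knee MRI referral"},
        State.PULL_HISTORY: {"diagnoses": ["osteoarthritis"]},
        State.CHECK_CRITERIA: {"criteria_met": True, "reason": "meets policy"},
        State.DRAFT_CASE: {"draft": "Case summary for review."},
    }
    return json.dumps(canned[state])  # stand-in for a real model call
```

Calling run(fake_tool, fake_llm) walks the four working states in order and stops at human review. Because the model is passed in as a plain function, the whole flow can be tested with canned outputs, and a malformed response fails validation instead of advancing the state machine.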
What does not work
Open-ended planner-executor agents do not work in production yet for most enterprise tasks. They look impressive in demos, because the demo workflow is short and forgiving. They fall over in production because the workflows are longer, the failure modes are more numerous, and the cost of cascading errors is higher than the planner can compensate for.
Multi-agent systems where agents negotiate with each other to resolve a task do not work in production yet either, with rare exceptions. The negotiation overhead is high. The error surface is large. The interpretability is low. I have not seen one that beat a single well-prompted LLM with a deterministic orchestrator on a real workflow.
What to build
If you are building an agent system in 2025, my advice is short. Pick a workflow with a human checkpoint. Pick a workflow where the data sources are stable. Build a deterministic backbone. Use the LLM in narrow, validated roles. Measure cycle time and quality against the manual baseline.
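The measurement step can start very small. The sketch below assumes two metrics that are my choice for illustration, not a standard: the reduction in median cycle time against the manual baseline, and the share of drafts the reviewer approved without edits as a rough proxy for quality. The function names are hypothetical.

```python
from statistics import median
from typing import Sequence


def cycle_time_saving(manual_minutes: Sequence[float],
                      assisted_minutes: Sequence[float]) -> float:
    """Fractional reduction in median cycle time (0.25 means 25% faster)."""
    baseline = median(manual_minutes)
    if baseline <= 0:
        raise ValueError("manual baseline must be positive")
    return (baseline - median(assisted_minutes)) / baseline


def unedited_approval_rate(decisions: Sequence[str]) -> float:
    """Share of reviewed drafts the human approved without changes."""
    if not decisions:
        raise ValueError("no reviewed drafts to measure")
    return sum(d == "approve" for d in decisions) / len(decisions)
```

Both functions take plain per-case records, so they can run over an export from whatever system already tracks the manual workflow.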
The systems that emerge from that exercise are useful. They are also unfashionable, because they do not look like the demos. That is a feature, not a flaw.