What enterprise AI got right, and wrong, in 2024

The year is closing for enterprise AI. I have been inside enough engagements this year to have a working view of where the field got it right and where it did not. I am writing this for the executive who is planning a 2025 budget and trying to distinguish the enduring patterns from the cycle's noise.

What enterprise AI got right

The shift from notebooks to production systems was real. A year ago, most of what I saw in client environments was experimental. This year, the centre of gravity moved. Teams stopped framing their work as research and started framing it as engineering. Evaluation harnesses became routine. Cost observability became a board-level concern. The people running these systems became recognisable as platform engineers, not as data scientists with extra responsibilities.

The use of structured outputs to constrain LLM behaviour also matured. A year ago, free-form generation was the default. By the back half of this year, anyone serious was generating into schemas, validating outputs, and treating the schema as the contract. That single architectural shift is responsible for most of the production reliability gains we have seen.
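To make "the schema as the contract" concrete, here is a minimal sketch using only the Python standard library. The ticket-triage use case, the field names, and the names TriageResult, ContractViolation, and parse_triage are hypothetical, chosen for illustration. Production teams typically use a dedicated schema library, but the principle is the same: nothing downstream sees model output that has not passed validation.

```python
import json
from dataclasses import dataclass

# Hypothetical contract for a ticket-triage use case; all names are illustrative.
ALLOWED_PRIORITIES = {"low", "medium", "high"}
EXPECTED_KEYS = {"category", "priority", "summary"}


@dataclass(frozen=True)
class TriageResult:
    category: str
    priority: str
    summary: str


class ContractViolation(ValueError):
    """Raised when model output does not satisfy the schema."""


def parse_triage(raw: str) -> TriageResult:
    """Parse raw model output and enforce the contract before use."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ContractViolation(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ContractViolation("top-level value must be an object")
    if set(data) != EXPECTED_KEYS:
        raise ContractViolation(
            f"expected keys {sorted(EXPECTED_KEYS)}, got {sorted(data)}"
        )
    for key in EXPECTED_KEYS:
        if not isinstance(data[key], str) or not data[key].strip():
            raise ContractViolation(f"{key} must be a non-empty string")
    if data["priority"] not in ALLOWED_PRIORITIES:
        raise ContractViolation(
            f"priority must be one of {sorted(ALLOWED_PRIORITIES)}"
        )
    return TriageResult(**data)
```

A caller would typically catch ContractViolation and retry the generation or route the item to a human, rather than letting malformed output propagate.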

RAG, despite my reservations about how broadly it is being applied, has matured into a reasonable default for genuine retrieval problems. The teams that built RAG systems this year have learned how to do reranking, how to tune chunk strategies for their corpus, and how to measure retrieval quality independently of generation quality. That is real progress.
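Measuring retrieval independently of generation usually means scoring the retriever against a labelled set of queries before any model writes an answer. The sketch below shows two common metrics, recall at k and mean reciprocal rank. The function names and the document IDs in the usage note are hypothetical, and it assumes each query comes with a hand-labelled set of relevant document IDs.

```python
from typing import Sequence, Set, Tuple


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(
    runs: Sequence[Tuple[Sequence[str], Set[str]]], k: int
) -> Tuple[float, float]:
    """Mean recall@k and mean reciprocal rank over (retrieved, relevant) pairs."""
    if not runs:
        raise ValueError("runs must not be empty")
    recalls = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs]
    rranks = [reciprocal_rank(retrieved, relevant) for retrieved, relevant in runs]
    return sum(recalls) / len(runs), sum(rranks) / len(runs)
```

For a query where the retriever returns d7, d2, d9, d4 and the labelled relevant set is d2 and d4, recall at 3 is 0.5 and the reciprocal rank is 0.5. Tracking these numbers separately lets a team tell a retrieval regression from a generation regression.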

What enterprise AI got wrong

The agentic AI promise was oversold. Through most of this year, vendors and consultants were selling autonomous agents as a near-term reality. The systems that have shipped to production are mostly not autonomous in any meaningful sense. They are narrow tools with constrained tool use, and that is fine, but the language has been sloppy and buyer expectations have been miscalibrated as a result.

The healthcare adoption story has been weaker than the headlines suggest. There are real successful deployments, mostly in administrative workflows, prior authorisation, and revenue cycle management. Clinical decision support remains hard, regulated, and slow. The narrative has run ahead of the evidence.

Inference spend has gone up faster than budgets have adjusted. Per-token prices fell, but teams that built systems on the cheapest available model and assumed their bills would fall too have been caught out by the shift toward larger context windows and reasoning models, both of which consume far more tokens per request. Cost observability is now a first-class concern, but most organisations are still rebuilding their forecasts.
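The arithmetic behind this is simple: spend is tokens multiplied by price, so a bill can rise even as unit prices fall. The sketch below is a back-of-envelope forecast. Every figure in it is hypothetical and chosen only to illustrate the shape of the problem, not taken from any vendor's price list.

```python
def monthly_cost(
    requests: int,
    input_tokens: int,
    output_tokens: int,
    price_in_per_million: float,
    price_out_per_million: float,
) -> float:
    """Estimated monthly spend, given per-request token counts and per-million-token prices."""
    tokens_cost = (
        input_tokens * price_in_per_million
        + output_tokens * price_out_per_million
    )
    return requests * tokens_cost / 1_000_000


# Hypothetical "before": short prompts, short answers.
before = monthly_cost(
    requests=1_000_000,
    input_tokens=2_000,
    output_tokens=500,
    price_in_per_million=1.00,
    price_out_per_million=2.00,
)

# Hypothetical "after": unit prices halved, but long contexts and
# reasoning tokens multiply the tokens consumed per request.
after = monthly_cost(
    requests=1_000_000,
    input_tokens=20_000,
    output_tokens=3_000,
    price_in_per_million=0.50,
    price_out_per_million=1.00,
)
```

With these illustrative inputs the monthly bill moves from roughly 3,000 to roughly 13,000 in the same currency units, despite prices halving. A forecast built on price per token alone would have predicted the opposite.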

Where the next year goes

A few things I expect to play out in 2025.

The line between training and serving will continue to blur. Fine-tuning will become more common as the cost falls. The default architecture for many enterprise use cases will move from prompt engineering against a frontier model to a smaller fine-tuned model, fronted by retrieval, with human review on edge cases. That is a saner architecture. It costs less. It is easier to govern.

Procurement will tighten. Buyers will move from buying capability to buying outcomes. The vendors who can produce case studies with named clients and audited results will pull ahead. The vendors with eloquent demos will lose ground. This is good for the field.

Regulation will arrive, in healthcare and financial services first, in fragmented form. The teams that have already built audit logs, model documentation, and incident response will treat the new requirements as paperwork. The teams that have not will spend a quarter retrofitting controls into systems that were not designed for them.

What to budget for

If I were sitting in front of a budget exercise for 2025, I would weight three line items more heavily than my clients, on average, did this year.

Evaluation and observability infrastructure. The bill for not having this in 2024 has been visible. The teams that underinvested are paying it now, in the form of incidents and in the form of pilots that did not graduate.

Domain expertise embedded in the team. The single best predictor of which AI projects will ship is whether the team has a senior person who understands the domain at the same depth as a long-tenured operator. This is not a hire you can defer to year three.

Operational headroom for the systems already in production. The maintenance cost of an LLM-backed system is non-trivial, and most 2024 budgets did not account for it. Year-two costs will be higher than year-one costs for many teams. Plan for it.

The year ahead is not the year of an AI revolution. It is the year of consolidation, where the organisations that learned the engineering lessons of 2024 pull ahead, and the ones that spent the year on demos start to fall behind. That is a slower story than the headlines wanted, and a more honest one.