The State of Agent Engineering Report Overview
Check out the current state of AI agent engineering the accessible way: demystifying the jargon and seeking supporting evidence.

# Introduction
LangChain, one of today’s leading frameworks for building and orchestrating artificial intelligence (AI) applications based on large language models (LLMs) and agents, recently released its State of Agent Engineering report, a survey of 1,300 professionals across diverse roles and business backgrounds that uncovers the current state of this notable AI trend.
This article selects some top insights from the report and elaborates on them in a tone accessible to a wider audience, demystifying some of the key terms and jargon related to AI agents. You can also read more about the key concepts behind AI agents in this related article.
For each of our top three handpicked insights, we first explain the key terms to know concisely, then examine the facts, figures, and supporting evidence.
# Large Enterprises Outpace Startups in Production
The key concepts to know:
- Agent: An AI system that, unlike standard chat-based applications that reactively respond to user interactions, is capable of making decisions and taking actions by itself. In their most widely used context today, agents use an LLM as their "brain," fueling decision-making on which steps to take next — for instance, querying a database, sending an email, or performing a web search — in order to complete a goal.
- Production (environment): While this is a basic concept in software engineering, it might sound unfamiliar to readers of other backgrounds. Being "in production" means a software system is live, and real users, customers, or employees are using it to conduct some work or action. It is basically what comes after a prototype or proof of concept (PoC): a test version of the software that has been run in a controlled environment to identify and fix possible issues.
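The agent's decide-act loop described above can be sketched in a few lines of Python. Everything here is a hypothetical placeholder: `decide_next_step` stands in for the LLM "brain," and `lookup_order` and `send_email` stand in for real tools.

```python
# Minimal sketch of an agent loop: a decision function (standing in for an
# LLM "brain") picks the next tool to run until the goal is reached.

def lookup_order(state):
    # Placeholder for a real tool, e.g. a database query.
    state["order"] = {"id": 42, "status": "shipped"}

def send_email(state):
    # Placeholder for a real tool, e.g. an email API call.
    state["email_sent"] = True

TOOLS = {"lookup_order": lookup_order, "send_email": send_email}

def decide_next_step(state):
    # A real agent would ask the LLM; this stub hard-codes the policy.
    if "order" not in state:
        return "lookup_order"
    if not state.get("email_sent"):
        return "send_email"
    return "done"

def run_agent():
    state = {}
    while (step := decide_next_step(state)) != "done":
        TOOLS[step](state)  # act, observe the new state, decide again
    return state

print(run_agent())
```

The point of the sketch is the loop itself: unlike a chatbot that answers once and stops, the agent keeps choosing and executing actions until its decision logic declares the goal complete.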
The key facts in the report:
- While there is a common "red tape" misconception that larger companies are slower to adopt new technology, the data show something different: large enterprises are leading the charge in AI agent deployment, with 67% of organizations with over 10,000 employees having put agent-based applications in production, compared with only 50% of smaller organizations with under 100 employees.
- One likely reason is the cost of building reliable agent solutions, which requires a significant infrastructure investment that larger companies can more readily afford.
Similar evidence can be found in Deloitte's 2026 State of AI in the Enterprise and McKinsey's State of AI in 2025 reports.
# The Observability vs. Evaluation Gap
The key concepts to know:
- Observability: AI models, especially advanced ones, are often seen as opaque "black boxes" with unpredictable outcomes. Observability is the ability to inspect and record what the AI "thinks" and how it leads to decisions or outcomes.
- Tracing: A specific aspect of observability, consisting of recording the journey taken by an AI agent step by step — i.e., its reasoning path.
- Offline Evaluation: This consists of running through a test dataset with known "correct" answers to measure how accurately and effectively an AI agent (or other AI system) performs.
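The offline-evaluation idea above can be sketched minimally: run a small test set with known "correct" answers through the agent and compute an accuracy score. The `agent` function and the questions below are hypothetical stand-ins for a real agent and a real evaluation dataset.

```python
# Sketch of an offline evaluation: score an agent against a test set
# with known answers before it ever reaches production.

def agent(question):
    # Placeholder for a real agent call; answers two of the three questions.
    canned = {"capital of France?": "Paris", "2 + 2?": "4"}
    return canned.get(question, "I don't know")

test_set = [
    ("capital of France?", "Paris"),
    ("2 + 2?", "4"),
    ("author of Hamlet?", "Shakespeare"),
]

correct = sum(agent(q) == expected for q, expected in test_set)
accuracy = correct / len(test_set)
print(f"accuracy: {accuracy:.0%}")  # 2 of 3 correct -> 67%
```

Observability tells you what the agent did on a live request; an evaluation like this tells you, ahead of deployment, how often it does the right thing.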
The key facts in the report:
- An astounding 89% of respondents from all backgrounds have implemented an observability mechanism, although only 52.4% are conducting offline evaluations, which reveals a notable discrepancy between how teams monitor AI agents and how rigorously they test their performance.
- This signals a "ship and watch" mentality, in which engineering teams give priority to debugging errors after they occur rather than preventing them before deployment into production. Fixing "broken robots" rather than ensuring they work properly before leaving the "factory" may incur undesired consequences and costs.
Similar evidence can be found in Giskard's LLM observability vs. evaluation article.
# Cost is No Longer the Main Bottleneck: Quality Is
The key concepts to know:
- Hallucinations: When an AI model like an LLM confidently generates false or nonsensical information as if it were true, it is said to be hallucinating. This becomes a dangerous problem when AI agents enter the loop, because the issue is no longer just saying something wrong but potentially doing something wrong — e.g., booking a flight based on inaccurately retrieved facts.
- Latency: The delay between a user asking a question and receiving the agent's response. The agent's "thinking" logic in between, which often involves the use of tools, adds extra time compared to standalone LLMs or chatbots.
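A rough way to see where agent latency comes from is to time an end-to-end run in which each intermediate tool call adds delay. The `slow_tool` function below is a hypothetical stand-in that uses `time.sleep` in place of real work such as a database query or web search.

```python
# Sketch of measuring end-to-end agent latency: each tool call in the
# agent's "thinking" loop adds to the delay the user experiences.
import time

def slow_tool():
    time.sleep(0.05)  # stand-in for a database query or web search

def answer_with_agent():
    start = time.perf_counter()
    slow_tool()  # two tool calls before the final answer
    slow_tool()
    return time.perf_counter() - start

latency = answer_with_agent()
print(f"end-to-end latency: {latency * 1000:.0f} ms")
```

This illustrates why a multi-step agent is inherently slower than a single LLM call: the user waits for the whole chain of tool calls, not just one model response.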
The key facts in the report:
- According to respondents, the cost of deploying AI agents is no longer the most critical concern: 32% now cite quality as their top barrier to adoption and deployment.
- Quality in this context refers to accuracy, consistency, and avoidance of hallucinations.
- Meanwhile, there is an interesting catch: the second most critical barrier differs by company size, with small startups citing latency and enterprises with over 2,000 employees pointing to security and compliance.
Similar supporting evidence can be found in the previously cited Deloitte report, while more nuanced evidence about top enterprise blockers can be further analyzed in this Medium article.
Iván Palomares Carrascosa is a leader, writer, speaker, and adviser in AI, machine learning, deep learning & LLMs. He trains and guides others in harnessing AI in the real world.