The MCP Revolution and the Search for Stable AI Use Cases
A conversation with AI researcher Sebastian Wallkötter reveals insights on standardization, security challenges, and the fundamental question facing enterprise artificial intelligence adoption.

Image by Editor
# Introducing MCP
Standards succeed or fail based on adoption, not technical superiority. The Model Context Protocol (MCP) understood this from the start. Released by Anthropic in late 2024, MCP solved the straightforward problem of how artificial intelligence (AI) models should interact with external tools. The protocol's design was simple enough to encourage implementation, and its utility was clear enough to drive demand. Within months, MCP had triggered the network effects that turn a good idea into an industry standard. Yet as Sebastian Wallkötter, an AI researcher and data engineer, explains in a recent conversation, this swift adoption has surfaced critical questions about security, scalability, and whether AI agents are always the right solution.
Wallkötter brings a unique perspective to these discussions. He completed his PhD in human-robot interaction in 2022 at Uppsala University, focusing on how robots and humans can work together more naturally. Since then, he has transitioned into the commercial AI space, working on large language model (LLM) applications and agent systems. His background bridges the gap between academic research and practical implementation, providing valuable insight into both the technical capabilities and the real-world constraints of AI systems.
# Why MCP Won The Standards Race
The Model Context Protocol solved what appeared to be a straightforward problem: how to create a reusable way for AI models to access tools and services. Before MCP, every LLM provider and every tool creator had to build custom integrations. MCP provided a common language.
"MCP is really very much focused on tool calling," Wallkötter explains. "You have your agent or LLM or something, and that thing is supposed to interact with Google Docs or your calendar app or GitHub or something like that."
The protocol's success mirrors other platform standardization stories. Just as Facebook achieved critical mass when enough users joined to make the network valuable, MCP reached a tipping point where providers wanted to support it because users demanded it, and users wanted it because providers supported it. This network effect drove adoption across geographic boundaries, with no apparent regional preference between US and European implementations.
The speed of adoption caught many by surprise. Within months of its October 2024 release, major platforms had integrated MCP support. Wallkötter suspects the initial momentum came from developers recognizing practical value: "I suspect it was just some engineer going, 'Hey, this is a fun format. Let's roll with it.'" Wallkötter further explains the dynamic: "Once MCP gets big enough, all the providers support it. So why wouldn't you want to do an MCP server to just be compatible with all the models? And then reverse as well, everybody has an MCP server, so why don't you support it? Because then you get a lot of compatibility." The protocol went from an interesting technical specification to an industry standard faster than most observers expected.
# The Security Blind Spot
Rapid adoption, however, revealed significant gaps in the original specification. Wallkötter notes that developers quickly discovered a critical vulnerability: "The first version of the MCP didn't have any authentication in it at all. So anybody in the world could just go to any MCP server and just call it, run stuff, and that can obviously backfire."
The authentication challenge proves more complex than traditional web security models. MCP involves three parties: the user, the LLM provider (such as Anthropic or OpenAI), and the service provider (such as GitHub or Google Drive). Traditional web authentication handles two-party interactions well. A user authenticates with a service, and that relationship is straightforward. MCP requires simultaneous consideration of all three parties.
"You have the MCP server, you have the LLM provider, and then you have the user itself," Wallkötter explains. "Which part do you authenticate which thing? Because are you authenticating that it's Anthropic that communicates with GitHub? But it's the user there, right? So it's the user actually authenticating."
The situation becomes even more complex with autonomous agents. When a user instructs a travel planning agent to book a vacation, and that agent begins calling various MCP servers without direct user oversight, who bears responsibility for those actions? Is it the company that built the agent? The user who initiated the request? The question has technical, legal, and ethical dimensions that the industry is still working to resolve.
# The Prompt Injection Problem
Beyond authentication, MCP implementations face another security challenge that has no clear solution: prompt injection. This vulnerability allows malicious actors to hijack AI behavior by crafting inputs that override the system's intended instructions.
Wallkötter draws a parallel to an older web security issue. "It reminds me a bit of the old SQL injection days," he notes. In the early web, developers would concatenate user input directly into database queries, allowing attackers to insert malicious SQL commands. The solution involved separating the query structure from the data, using parameterized queries that treated user input as pure data rather than executable code.
"I suspect that the solution will be very similar to how we solved it for SQL databases," Wallkötter suggests. "You send the prompt itself first and then all the data you want to slot into the different pieces of the prompt separately, and then there is some system that sits there before the LLM that looks at the data and tries to figure out is there a prompt injection there."
Despite this potential approach, no widely adopted solution exists yet. LLM providers attempt to train models to prioritize system instructions over user input, but these safeguards remain imperfect. "There's always ways around that because there's no foolproof way to do it," Wallkötter acknowledges.
The prompt injection problem extends beyond security concerns into reliability. When an MCP server returns data that gets embedded into the LLM's context, that data can contain instructions that override intended behavior. An AI agent following a carefully designed workflow can be derailed by unexpected content in a response. Until this vulnerability is addressed, autonomous agents operating without human oversight carry inherent risks.
# The Tool Overload Trap
MCP's ease of use creates an unexpected problem. Because adding a new tool is straightforward, developers often accumulate dozens of MCP servers in their applications. This abundance degrades performance in measurable ways.
"I've seen a couple of examples where people were very enthusiastic about MCP servers and then ended up with 30, 40 servers with all the functions," Wallkötter observes. "Suddenly you have 40 or 50 percent of your context window from the start taken up by tool definitions."
Each tool requires a description that explains its purpose and parameters to the LLM. These descriptions consume tokens in the context window, the limited space where the model holds all relevant information. When tool definitions occupy half the available context, the model has less room for actual conversation history, retrieved documents, or other critical information. Performance suffers predictably.
Beyond context window constraints, too many tools create confusion for the model itself. Current generation LLMs struggle to distinguish between similar tools when presented with extensive options. "The general consensus on the internet at the moment is that 30-ish seems to be the magic number in practice," Wallkötter notes, describing the threshold beyond which model performance noticeably degrades.
This limitation has architectural implications. Should developers build one large agent with many capabilities, or multiple smaller agents with focused tool sets? The answer depends partly on context requirements. Wallkötter offers a memorable metric: "You get around 200,000 tokens in the context window for most decent agents these days. And that's roughly as much as Pride and Prejudice, the entire book."
This "Jane Austen metric" provides intuitive scale. If an agent needs extensive business context, formatting guidelines, project history, and other background information, that accumulated knowledge can quickly fill a substantial portion of the available space. Adding 30 tools on top of that context may push the system beyond effective operation.
The solution often involves strategic agent architecture. Rather than one universal agent, organizations might deploy specialized agents for distinct use cases: one for travel planning, another for email management, a third for calendar coordination. Each maintains a focused tool set and specific instructions, avoiding the complexity and confusion of an overstuffed general-purpose agent.
# When Not To Use AI
Wallkötter's robotics background provides an unexpected lens for evaluating AI implementations. His PhD research on humanoid robots revealed a persistent challenge: finding stable use cases where humanoid form factors provided genuine advantages over simpler alternatives.
"The thing with humanoid robots is that they're a bit like an unstable equilibrium," he explains, drawing on a physics concept. A pendulum balanced perfectly upright could theoretically remain standing indefinitely, but any minor disturbance causes it to fall. "If you slightly perturb that, if you don't get it perfect, it will immediately fall back down." Humanoid robots face similar challenges. While fascinating and capable of impressive demonstrations, they struggle to justify their complexity when simpler solutions exist.
"The second you start to actually really think about what can we do with this, you are immediately faced with this economic question of do you actually need the current configuration of humanoid that you start with?" Wallkötter asks. "You can take away the legs and put wheels instead. Wheels are much more stable, they're simpler, they're cheaper to build, they're more robust."
This thinking applies directly to current AI agent implementations. Wallkötter encountered an example recently: a sophisticated AI coding system that included an agent specifically designed to identify unreliable tests in a codebase.
"I asked, why do you have an agent and an AI system with an LLM that tries to figure out if a test is unreliable?" he recounts. "Can't you just call the test 10 times, see if it fails and passes at the same time? Because that's what an unreliable test is, right?"
The pattern repeats across the industry. Teams apply AI to problems that have simpler, more reliable, and cheaper solutions. The allure of using cutting-edge technology can obscure straightforward alternatives. An LLM-based solution might cost significant compute resources and still occasionally fail, while a deterministic approach could solve the problem instantly and reliably.
This observation extends beyond individual technical decisions to broader strategy questions. MCP's flexibility makes it easy to add AI capabilities to existing workflows. That ease of integration can lead to reflexive AI adoption without careful consideration of whether AI provides genuine value for a specific task.
"Is this really the way to go, or is it just AI is a cool thing, let's just throw it at everything?" Wallkötter asks. The question deserves serious consideration before committing resources to AI-powered solutions.
# The Job Market Paradox
The conversation revealed an unexpected perspective on AI's impact on employment. Wallkötter initially believed AI would augment rather than replace workers, following historical patterns with previous technological disruptions. Recent observations have complicated that view.
"I think I've actually been quite wrong about this," he admits, reflecting on his earlier predictions. When AI first gained mainstream attention, a common refrain emerged in the industry: "You're not going to be replaced with AI, you're going to be replaced with a person using AI." Wallkötter initially subscribed to this view, drawing parallels to historical technology adoption cycles.
"When the typewriter came out, people were criticizing that people that were trained to write with pen and ink were criticizing that, well, you're killing the spirit of writing, and it's just dead, and nobody's going to use a typewriter. It's just a soulless machine," he notes. "Look fast forward a couple decades. Everybody uses computers."
This pattern of initial resistance followed by universal adoption seemed to apply to AI as well. The key distinction lies in the type of work being automated and whether that work exists in a fixed or expandable pool. Software engineering illustrates the expandable category. "You can now, if before you got a ticket from your ticket system, you would program the solution, send the merge request, you would get the next ticket and repeat the cycle. That piece can now be done faster, so you can do more tickets," Wallkötter explains.
The time saved on maintenance work does not eliminate the need for engineers. Instead, it shifts how they allocate their time. "All the time that you save because you can now spend less time maintaining, you can now spend innovating," he observes. "So what happens is you get the shift of how much time you spend innovating, how much time you spend maintaining, and that pool of innovation grows."
Customer support presents an entirely different picture. "There's only so many customer cases that come in, and you don't really, most companies at least don't innovate in what they do for customer support," Wallkötter explains. "They want it solved, they want customers to figure out answers to their questions and they want to have a good experience talking to the company. But that's kind of where it ends."
The distinction is stark. In customer support, work volume is determined by incoming requests, not by team capacity. When AI can handle those requests effectively, the math becomes simple. "There you just only have work for one person when you had work for four people before."
This division between expandable and fixed workloads may determine which roles face displacement versus transformation. The pattern extends beyond these two examples. Any role where increased efficiency creates opportunities for additional valuable work appears more resilient. Any role where work volume is externally constrained and innovation is not a priority faces greater risk.
Wallkötter's revised perspective acknowledges a more complex reality than simple augmentation or replacement narratives suggest. The question is not whether AI replaces jobs or augments them, but rather which specific characteristics of a role determine its trajectory. The answer requires examining the nature of the work itself, the constraints on work volume, and whether efficiency gains translate to expanded opportunities or reduced headcount needs.
# The Path Forward
MCP's rapid adoption demonstrates the AI industry's hunger for standardization and interoperability. The protocol solved a real problem and did so with sufficient simplicity to encourage widespread implementation. Yet the challenges emerging from this adoption underscore the field's immaturity in critical areas.
Security concerns around authentication and prompt injection require fundamental solutions, not incremental patches. The industry needs to develop robust frameworks that can handle the unique three-party dynamics of AI agent interactions. Until those frameworks exist, enterprise deployment will carry significant risks.
The tool overload problem and the fundamental question of when to use AI both point to a need for greater discipline in system design. The capability to add tools easily should not translate to adding tools carelessly. Organizations should evaluate whether AI provides meaningful advantages over simpler alternatives before committing to complex agent architectures.
Wallkötter's perspective, informed by experience in both academic robotics and commercial AI development, emphasizes the importance of finding "stable use cases" rather than chasing technological capability for its own sake. The unstable equilibrium of humanoid robots offers a cautionary tale: impressive capabilities mean little without practical applications that justify their complexity and cost.
As MCP continues evolving, with Anthropic and the broader community addressing security, scalability, and usability concerns, the protocol will likely remain central to AI tooling. Its success or failure in solving these challenges will significantly influence how quickly AI agents move from experimental deployments to reliable business infrastructure.
The conversation ultimately returns to a simple but profound question: just because we can build something with AI, should we? The answer requires honest assessment of alternatives, careful consideration of costs and benefits, and resistance to the temptation to apply trendy technology to every problem. MCP provides powerful capabilities for connecting AI to the world. Using those capabilities wisely demands the same thoughtful engineering that created the protocol itself.
Rachel Kuznetsov has a Master's in Business Analytics and thrives on tackling complex data puzzles and searching for fresh challenges to take on. She's committed to making intricate data science concepts easier to understand and is exploring the various ways AI makes an impact on our lives. On her continuous quest to learn and grow, she documents her journey so others can learn alongside her. You can find her on LinkedIn.