More Tools Made AI Worse: The Engineering Fix Now Reshaping Agentic Systems

Something strange happened in late 2025. Engineers at Anthropic noticed that their AI agents were choking on their own capabilities. The more tools they connected, the worse their systems performed. A typical setup linking five common enterprise services (GitHub, Slack, Sentry, Grafana, and Splunk) consumed roughly 55,000 tokens just in tool definitions before the agent had even read a single user request. One internal deployment devoured 134,000 tokens on tool descriptions alone, leaving the model precious little room for actual reasoning. It was the software equivalent of filling a filing cabinet with instruction manuals and leaving no space for the files themselves.

The irony was exquisite. The Model Context Protocol, Anthropic's open standard for connecting AI agents to external systems, had succeeded beyond anyone's expectations. Launched in November 2024, MCP had grown from an internal experiment into the dominant integration standard for agentic AI, with over 10,000 active public MCP servers by late 2025 and adoption by ChatGPT, Cursor, Gemini, Microsoft Copilot, and Visual Studio Code. Official SDK downloads exceeded 97 million per month across Python and TypeScript. But this success created a paradox: the more tools agents could access, the less effective they became at using any of them.

The team's response, detailed in Anthropic's engineering blog post on advanced tool use, was not to limit tool access but to fundamentally rethink how agents discover and interact with their capabilities. The result was a trio of features that, working together, reduced context consumption by up to 85 per cent while simultaneously improving accuracy. And the design patterns they established are now reshaping how the entire industry thinks about building production agentic systems.

When Every Tool Costs You Tokens

To understand why the tool scaling problem became acute in 2025, you need to appreciate the economics of context windows. Every tool definition an agent loads carries a token cost. A modestly complex tool with a name, description, and parameter schema might consume 200 tokens. That seems trivial until you connect to a GitHub MCP server with 35 tools (roughly 26,000 tokens), a Slack server with 11 tools (21,000 tokens), and a handful of monitoring services. Suddenly, you have burned through tens of thousands of tokens before the conversation even begins.

Bin Wu, the primary author of Anthropic's advanced tool use engineering blog post and a former Airbnb engineer who joined Anthropic to work on AI safety, described the problem in stark terms. The traditional approach to tool management, loading all definitions upfront and passing them to the model, simply does not scale when developers are connecting agents to dozens or hundreds of MCP servers. Anthropic's internal testing revealed that tool selection accuracy degrades significantly once you exceed 30 to 50 available tools. The model gets overwhelmed by options, like a diner handed a 200-page menu when they just want breakfast.

This degradation is not merely anecdotal. Research on large language model tool calling has consistently demonstrated a negative correlation between tool library size and selection accuracy. The phenomenon has multiple causes. Context window saturation leaves less room for reasoning as tool definitions consume more space. The well-documented “lost in the middle” effect means models recall information at the beginning and end of their context windows more reliably than content buried in the middle, causing optimal tools to be overlooked when they appear amidst dozens of alternatives. And larger context windows alone do not solve the problem, because the core attention and selection accuracy issues persist regardless of how much space is available.

This problem has two distinct dimensions. The first is what engineers call “context bloat”: tool definitions consuming the finite token budget that the model needs for reasoning, user instructions, and conversation history. The second is “context pollution”: intermediate results from tool calls flooding the context window with data the model does not actually need. A two-hour sales meeting transcript routed through a workflow might mean processing an additional 50,000 tokens of audio transcription, even when the agent only needs a three-sentence summary.

As Adam Jones and Conor Kelly detailed in Anthropic's companion post on code execution with MCP, these two inefficiencies compound each other in production environments. An agent connected to thousands of tools must process hundreds of thousands of tokens in definitions before it even reads the user's request, and then each tool invocation potentially dumps thousands more tokens of intermediate results back into the context window. The practical ceiling is not the model's intelligence. It is the model's context budget.

The financial implications are equally pressing. Token usage translates directly into API costs. An enterprise running thousands of agent interactions daily can see its compute bills balloon when every request begins with 100,000 tokens of overhead. Latency suffers too: more input tokens mean longer processing times, which means slower responses, which means frustrated users and abandoned workflows. Before MCP, developers faced what Anthropic described as an “N by M” integration problem: ten AI applications and one hundred tools could require up to a thousand different integrations. MCP solved that problem elegantly, reducing it to a single protocol implementation on each side. But it introduced a new one: the protocol worked so well that developers connected everything, and the resulting token cost became unsustainable.

The Discovery Revolution

Anthropic's first intervention was the Tool Search Tool, a meta-capability that lets agents discover tools on demand rather than loading everything upfront. The concept is deceptively simple. Instead of passing all tool definitions to the model at the start of every conversation, developers mark tools with a defer_loading: true parameter. These deferred tools are not loaded into the model's context initially. The agent sees only the Tool Search Tool itself, plus a small set of frequently used tools that remain always-loaded. When the agent encounters a task requiring a specific capability, it searches for relevant tools, loads only the three to five it actually needs, and proceeds.
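The shape of such a request can be sketched in plain Python. This is illustrative only: the `defer_loading` flag and the `tool_search_tool_regex_20251119` type come from the feature description above, but the exact request structure the Claude API expects may differ.

```python
# Sketch of a tool list using deferred loading: a small hot set stays
# always-loaded, everything else is deferred behind the Tool Search Tool.

def build_tool_list(always_loaded, deferred):
    """Combine the search meta-tool, a hot set, and deferred tools."""
    tools = [{"type": "tool_search_tool_regex_20251119",
              "name": "tool_search_tool_regex"}]
    for tool in always_loaded:
        tools.append({**tool, "defer_loading": False})
    for tool in deferred:
        tools.append({**tool, "defer_loading": True})
    return tools

# Hypothetical tools for illustration.
hot = [{"name": "search_files", "description": "Search files in Drive",
        "input_schema": {"type": "object", "properties": {}}}]
cold = [{"name": f"github_tool_{i}", "description": "A GitHub capability",
         "input_schema": {"type": "object", "properties": {}}}
        for i in range(35)]

tools = build_tool_list(hot, cold)
deferred_count = sum(1 for t in tools if t.get("defer_loading"))
print(deferred_count)  # 35 tool definitions stay out of the initial context
```

Only the search tool and the single hot tool consume context at conversation start; the 35 deferred definitions are loaded individually, on demand.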

The token savings are dramatic. That five-server scenario consuming 55,000 tokens in definitions? With the Tool Search Tool, it drops to roughly 8,700 tokens, preserving 95 per cent of the context window. Across implementations, Anthropic documented an 85 per cent reduction in token usage while maintaining access to the full tool library. The system supports catalogues of up to 10,000 tools, returning three to five of the most relevant per search query.

But the real surprise was accuracy. Conventional wisdom suggested that hiding tools behind a search step would degrade performance, since the model must spend an extra step finding what it needs. The opposite turned out to be true. By surfacing a focused set of relevant tools on demand, tool search actually improved selection accuracy, particularly with large tool libraries. Internal testing on MCP evaluation benchmarks showed Claude Opus 4 jumping from 49 per cent to 74 per cent accuracy, a 25 percentage-point improvement. Claude Opus 4.5 improved from 79.5 per cent to 88.1 per cent. The mechanism was straightforward: fewer options meant less confusion, and dynamically selected tools were more likely to be relevant to the actual task.

The Tool Search Tool supports multiple search strategies, each suited to different deployment needs. The regex-based variant, designated tool_search_tool_regex_20251119, uses Python's re.search() syntax and works well for keyword matching across tool names and descriptions. It supports exact matches, flexible patterns using wildcards, and case-insensitive searches with a maximum query length of 200 characters. The BM25-based variant, tool_search_tool_bm25_20251119, accepts natural language queries instead, using term-frequency ranking for more nuanced discovery. Both variants search tool names, descriptions, argument names, and argument descriptions. Custom embedding-based search offers a third option, enabling semantic matching that finds tools by meaning rather than exact terminology. Developers can implement whichever strategy suits their deployment, or combine them.
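The regex strategy can be illustrated with a few lines of standard-library Python. The catalogue entries and the filtering logic here are a simplified stand-in: the hosted variant ranks results and caps them at three to five, while this sketch just matches and truncates.

```python
import re

# A toy tool catalogue; real deployments can hold up to 10,000 entries.
CATALOGUE = [
    {"name": "create_issue", "description": "Open a GitHub issue"},
    {"name": "list_pull_requests", "description": "List open PRs"},
    {"name": "send_message", "description": "Post a message to Slack"},
]

def search_tools(pattern, catalogue=CATALOGUE, limit=5):
    """Case-insensitive re.search() over tool names and descriptions."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = [t for t in catalogue
            if rx.search(t["name"]) or rx.search(t["description"])]
    return hits[:limit]

print([t["name"] for t in search_tools(r"issue|pull")])
# ['create_issue', 'list_pull_requests']
```

A BM25 or embedding-based variant would replace the regex match with term-frequency or vector scoring, but the contract is the same: a query in, a handful of relevant definitions out.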

There is also a caching advantage that was not immediately obvious. Because deferred tools are not included in the initial prompt, the rest of the prompt remains stable across requests. This makes prompt caching significantly more effective, since the cacheable portion of the prompt does not change every time the tool set shifts. For high-volume deployments, this secondary optimisation can compound the primary token savings substantially.

Cloudflare arrived at a strikingly similar conclusion through independent research, publishing their findings under the banner of “Code Mode” in September 2025, roughly two months before Anthropic's November announcement. As Cloudflare's engineering team observed, with just two tools, search() and execute(), their server could provide access to the entire Cloudflare API over MCP while consuming only around 1,000 tokens. When new products were added, the same search and execute code paths discovered and called them automatically, with no new tool definitions and no new MCP servers required. The generated JavaScript runs in a secure, isolated V8 Worker sandbox with external network access blocked by default, and each execution receives its own Worker instance.

The convergence was striking. Two major technology companies, working independently, identified the same fundamental problem and arrived at architecturally similar solutions within weeks of each other, as noted in the Cloudflare Code Mode blog post. Cloudflare published on 26 September 2025; Anthropic followed on 4 November 2025. The posts reference each other, but these were clearly parallel discoveries driven by the same pressures: agents were scaling up, tool counts were exploding, and the old approach broke at this scale.

Writing Code Instead of Making Calls

The second major innovation was Programmatic Tool Calling, which addresses the context pollution problem rather than context bloat. Traditional tool calling works through a sequential loop: the agent requests a tool, the API returns the result, the result enters the model's context, and the agent decides what to do next. For simple workflows involving two or three tools, this is fine. For complex orchestration spanning 20 or more tool invocations, it becomes catastrophically expensive.

Consider a practical scenario: checking budget compliance across 20 team members. In the traditional approach, the agent calls a tool to retrieve each team member's spending, waits for the result, processes it in context, and calls the next tool. That is 20 round trips through the model, 20 sets of intermediate results flooding the context window, and 19 additional inference passes that each add latency and cost. Anthropic measured one such workflow consuming 43,588 tokens across all those sequential invocations.

Programmatic Tool Calling flips this model. Instead of requesting tools one at a time, the agent writes a Python script that orchestrates the entire workflow. The script runs in a sandboxed Code Execution environment, pausing when it needs results from external tools. When tool results return via the API, they are processed by the script rather than consumed by the model. The script handles loops, conditionals, error handling, and data filtering, and only the final aggregated output reaches the model's context window. In Anthropic's budget compliance example, the same workflow dropped from 43,588 to 27,297 tokens, a 37 per cent reduction on that single task. But the savings compound dramatically with complexity: one implementation documented by Adam Jones and Conor Kelly reduced token usage from 150,000 tokens to 2,000, a 98.7 per cent reduction.
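The budget-compliance workflow gives a feel for the kind of script the agent writes. In the sketch below, `get_team_members` and `get_spending` are stubs standing in for real tool bindings that the execution environment would supply; in production, each call would pause the script for a tool result, and only the final summary string would re-enter the model's context.

```python
# Hypothetical tool bindings: in the real sandbox these would invoke
# MCP tools, with results returned to the script, not the model.
def get_team_members():
    return [f"member_{i}" for i in range(20)]

def get_spending(member):
    return 900 + 10 * int(member.split("_")[1])  # fake spend data

BUDGET = 1000

def check_compliance():
    over = []
    for member in get_team_members():      # 20 tool calls, zero extra
        spend = get_spending(member)       # inference passes
        if spend > BUDGET:                 # filtering happens in the
            over.append((member, spend))   # sandbox, not in context
    return f"{len(over)} of 20 members over budget: {over}"

print(check_compliance())
```

The 20 intermediate spending figures never touch the model's context window; only the one-line summary does.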

The insight beneath this approach has a certain elegance. Large language models have been trained on billions of lines of real code. They are fluent in Python, JavaScript, and TypeScript. But JSON tool-calling schemas are synthetic constructs that barely appear in training data. Asking a model to orchestrate complex workflows through individual JSON function calls is like asking a concert pianist to play a symphony by pressing one key at a time and waiting for approval between each note. Programmatic Tool Calling lets the model compose the entire piece. Cloudflare's engineering team articulated the same observation independently: models are fluent in real programming languages but stutter when asked to produce function-call JSON, because they have seen millions of lines of actual code during training but only contrived tool-calling examples.

This idea did not appear in a vacuum. Joel Pobar's LLMVM project had been exploring code-based tool orchestration since before it was fashionable, allowing language models to interleave natural language and code rather than relying on traditional tool calling APIs. The project's design philosophy, that letting models write code generally results in significantly better task deconstruction and execution, prefigured the approach that both Anthropic and Cloudflare would later formalise. LLMVM uses a “continuation passing style” execution model, where queries result in natural language interleaved with code rather than a rigid sequence of code generation followed by natural language interpretation.

Anthropic's implementation requires tools to opt in to programmatic calling through an allowed_callers parameter, specifically allowed_callers: ["code_execution_20250825"]. This ensures that sensitive operations can be restricted to direct model invocation with user approval. The sandboxed execution environment provides resource limits and monitoring. Intermediate results stay within the execution environment by default, which also carries privacy benefits: the MCP client can tokenise personally identifiable information automatically, allowing real data to flow between systems while preventing the model from accessing raw PII.
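The opt-in boundary can be sketched as two tool definitions: one safe to call from generated code, one deliberately left out. The field names follow the `allowed_callers` parameter quoted above; the surrounding schema shape is standard JSON Schema, but treat the exact layout as illustrative.

```python
# A read-only tool opted in to programmatic calling. Note the explicit
# output structure in the description, which the model's generated code
# will rely on when parsing results.
read_only_tool = {
    "name": "get_spending",
    "description": 'Return a member\'s spend. Output: {"amount": number}',
    "input_schema": {"type": "object",
                     "properties": {"member": {"type": "string"}},
                     "required": ["member"]},
    "allowed_callers": ["code_execution_20250825"],  # callable from sandbox
}

# A destructive tool with no allowed_callers: it can only be invoked
# directly by the model, where user approval can gate the call.
destructive_tool = {
    "name": "delete_record",
    "description": "Irreversibly delete a record.",
    "input_schema": {"type": "object",
                     "properties": {"id": {"type": "string"}},
                     "required": ["id"]},
}

print("allowed_callers" in destructive_tool)  # False
```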

On 17 February 2026, Anthropic moved Programmatic Tool Calling to general availability with the release of Claude Sonnet 4.6, signalling that the feature had graduated from experimental curiosity to production-ready infrastructure. Alongside this, web search and fetch tools gained automatic code-based filtering, cutting input tokens by 24 per cent while boosting BrowseComp accuracy from 33 per cent to 46 per cent. Joe Binder, VP of Product at GitHub, noted that Claude Sonnet 4.6 was “already excelling at complex code fixes, especially when searching across large codebases is essential.” The broader community followed suit: Block's Goose Agent added “code mode” MCP support, LiteLLM added native support across providers, and multiple open-source projects adopted the pattern.

Teaching By Example

The third feature in Anthropic's advanced tool use suite is Tool Use Examples, which addresses a subtler problem than context bloat or pollution: parameter specification errors. Even when an agent correctly identifies the right tool and calls it efficiently, it can still fail by passing malformed or incorrect parameters. JSON Schema definitions tell the model what parameters are available and their types, but they do not convey the conventions, correlations, and formatting expectations that distinguish a correct invocation from a technically valid but functionally broken one.

Consider a calendar scheduling API that accepts a date parameter. The schema specifies that the parameter is a string, but does it expect “2025-11-15”, “15/11/2025”, “November 15, 2025”, or “15 Nov 2025”? The schema might even specify a pattern, but more complex relationships between parameters, such as an “enddate” that must be after “startdate” or a “timezone” parameter that changes the interpretation of datetime values, remain invisible in the formal specification.

Tool Use Examples solve this by providing concrete usage patterns alongside schema definitions. Developers include an input_examples array with one to five examples demonstrating proper parameter formatting and usage patterns. These examples can show minimal parameter usage (just the required fields), partial parameter combinations (common optional parameter groupings), and full parameter specifications (every available option), giving the model a practical understanding of how the tool should actually be called. Anthropic recommends between one and five examples per tool, with each example addressing a different usage pattern.
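Applied to the calendar scenario above, a tool definition with examples might look like the following. The `input_examples` field name is from the feature description; the tool itself and its parameters are hypothetical.

```python
schedule_meeting = {
    "name": "schedule_meeting",
    "description": "Create a calendar event.",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "start_date": {"type": "string"},
            "end_date": {"type": "string"},
            "timezone": {"type": "string"},
        },
        "required": ["title", "start_date"],
    },
    "input_examples": [
        # Minimal: required fields only, pinning down the date format.
        {"title": "Standup", "start_date": "2025-11-15"},
        # Partial: end_date after start_date, a correlation the schema
        # alone cannot express.
        {"title": "Offsite", "start_date": "2025-11-15",
         "end_date": "2025-11-16"},
        # Full: every parameter, timezone qualifying the datetimes.
        {"title": "Review", "start_date": "2025-11-15",
         "end_date": "2025-11-15", "timezone": "Europe/London"},
    ],
}

assert 1 <= len(schedule_meeting["input_examples"]) <= 5
```

Three examples, three distinct usage patterns: the model sees at a glance that dates are ISO-formatted, that ranges run start-to-end, and where the timezone fits.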

The impact on accuracy is substantial. Anthropic's internal testing showed accuracy on complex parameter handling improving from 72 per cent to 90 per cent with the addition of Tool Use Examples. As the Setec Research analysis of these features noted, the improvement is particularly pronounced for APIs with ambiguous parameter relationships, where the schema alone does not capture the implicit rules governing which parameter combinations are valid. The feature addresses format conventions that JSON Schema cannot express, nested structure patterns that require demonstration rather than description, and the implicit correlations between optional parameters that experienced developers understand intuitively but struggle to formalise.

This is not a novel pedagogical insight. It is the same principle that makes code documentation more useful when it includes examples alongside API reference descriptions. Developers have long understood that showing is more effective than telling. Tool Use Examples apply this principle to the model-tool interface, giving the agent worked examples rather than abstract specifications.

The three features are designed to work together as a complementary system. Tool Search Tool reduces upfront context consumption by loading tools on demand. Programmatic Tool Calling reduces runtime context pollution by keeping intermediate results out of the model's context. Tool Use Examples reduce errors by teaching the model how to use tools correctly through demonstration rather than description alone. Together, they address the full lifecycle of what Anthropic calls “context pollution in MCP-connected agents,” as described in their advanced tool use documentation.

Scaling MCP Without Breaking the Bank

The practical challenge for enterprises is not adopting any single optimisation technique. It is managing the operational complexity of hundreds or thousands of MCP-connected tools while maintaining the progressive disclosure principles that keep agents efficient. This requires architectural thinking beyond individual feature adoption.

The recommended pattern follows what might be called layered tool management. At the first layer, a small set of three to five frequently used tools remains always-loaded with defer_loading: false. These are the tools the agent will need in nearly every interaction: perhaps a file search tool, a messaging tool, and a general-purpose retrieval tool. At the second layer, entire MCP servers can be deferred with a default_config that sets defer_loading: true across all their tools, with selective exceptions for high-use capabilities within those servers. The example Anthropic provides is a Google Drive MCP server where all tools are deferred except search_files, which remains loaded because it is the most commonly needed entry point.
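Expressed as configuration, the two layers might look like this. The `defer_loading` and `default_config` names follow the pattern described above; the surrounding structure is a sketch, not the literal API shape.

```python
config = {
    # Layer 1: a small hot set, always loaded.
    "tools": [
        {"name": "file_search", "defer_loading": False},
        {"name": "send_message", "defer_loading": False},
    ],
    # Layer 2: an entire server deferred by default, with one exception.
    "mcp_servers": [
        {
            "name": "google-drive",
            "default_config": {"defer_loading": True},   # defer everything...
            "tools": [
                # ...except the most common entry point.
                {"name": "search_files", "defer_loading": False},
            ],
        }
    ],
}

always = [t["name"] for t in config["tools"] if not t["defer_loading"]]
print(always)  # ['file_search', 'send_message']
```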

This progressive disclosure architecture mirrors a principle well-established in user interface design: show users what they need now, and make everything else discoverable. The difference is that here, the “user” is an AI agent, and the cost of poor disclosure is not confusion but wasted tokens and degraded performance.

For organisations operating at genuine scale, with dozens of MCP servers and hundreds of tools, the filesystem-based discovery pattern described in Anthropic's code execution with MCP post offers an alternative architecture. Rather than registering every tool with the API upfront, tools are presented as code files on a filesystem, organised into directories by server. A TypeScript file tree might include paths like servers/google-drive/getDocument.ts and servers/salesforce/updateRecord.ts. The agent explores the filesystem to find relevant tool definitions, reading them on demand. A search_tools utility allows the agent to query for relevant definitions with configurable detail levels: name only, name plus description, or full schemas with parameters. This approach scales elegantly because adding new tools means adding new files, not modifying configuration or redeploying infrastructure.
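A minimal version of that discovery utility can be sketched in Python. The directory layout follows the example paths above; the `search_tools` signature and its detail levels are assumptions for illustration.

```python
import os
import tempfile

def search_tools(root, query, detail="name"):
    """Walk servers/<server>/<tool>.ts files, matching names against query."""
    hits = []
    for server in sorted(os.listdir(root)):
        for fname in sorted(os.listdir(os.path.join(root, server))):
            name = fname.removesuffix(".ts")
            if query.lower() in name.lower():
                if detail == "name":
                    hits.append(name)
                else:
                    # Read the full definition only when asked for it.
                    with open(os.path.join(root, server, fname)) as f:
                        hits.append({"name": name, "definition": f.read()})
    return hits

# Build a toy tool tree mirroring the example paths.
root = tempfile.mkdtemp()
for path, body in [("google-drive/getDocument.ts", "// fetch a Drive doc"),
                   ("salesforce/updateRecord.ts", "// update a record")]:
    full = os.path.join(root, "servers", path)
    os.makedirs(os.path.dirname(full), exist_ok=True)
    with open(full, "w") as f:
        f.write(body)

print(search_tools(os.path.join(root, "servers"), "document"))
# ['getDocument']
```

Adding a tool is adding a file; the search path picks it up with no configuration change.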

The governance dimension is equally important. In December 2025, Anthropic donated MCP to the Agentic AI Foundation, a directed fund under the Linux Foundation co-founded by Anthropic, Block, and OpenAI with support from Google, Microsoft, Amazon Web Services, Cloudflare, and Bloomberg. Mike Krieger, Chief Product Officer at Anthropic, explained that MCP had started as an internal project and had become “the industry standard for connecting AI systems to data and tools.” The donation was designed to ensure the protocol “stays open, neutral, and community-driven as it becomes critical infrastructure for AI.” Jim Zemlin, executive director of the Linux Foundation, framed the goal as avoiding a future of “closed wall” proprietary stacks where tool connections, agent behaviour, and orchestration are locked behind a handful of platforms. Platinum members of the new foundation include Amazon, Anthropic, Block, Bloomberg, Cloudflare, Google, Microsoft, and OpenAI, with Gold members including Cisco, Datadog, Docker, IBM, JetBrains, and Oracle among others. This institutional backing means enterprises can invest in MCP-based architectures with reasonable confidence in the standard's longevity and neutrality.

Measuring What Matters

Building efficient agentic systems is one problem. Knowing whether they actually work is another entirely. The evaluation methodologies for tool-using agents are still maturing, but several frameworks have emerged from both Anthropic's internal testing and broader industry practice.

The first and most obvious metric is token efficiency: how many tokens does the agent consume per task, and how does this change as the tool library grows? Anthropic's benchmarks provide useful baselines. A five-server MCP deployment consuming 55,000 tokens in definitions with traditional loading should drop to roughly 8,700 tokens with Tool Search, a reduction that should hold proportionally as the tool count increases. Programmatic Tool Calling should yield additional reductions of 37 per cent or more on multi-step workflows, with higher savings on more complex orchestrations.

But token efficiency alone is insufficient. The more meaningful measure is task accuracy, specifically the rate at which agents select the correct tool, invoke it with proper parameters, and produce the intended outcome. Anthropic's MCP evaluation benchmarks provide one framework for this, measuring tool selection accuracy across varying library sizes. The jumps from 49 to 74 per cent (Opus 4) and from 79.5 to 88.1 per cent (Opus 4.5) with Tool Search enabled demonstrate that efficiency and accuracy can improve simultaneously, rather than trading off against each other.

For Programmatic Tool Calling, Anthropic measured knowledge retrieval accuracy improving from 25.6 per cent to 28.5 per cent, and GIA (General Instruction Adherence) benchmarks rising from 46.5 per cent to 51.2 per cent. These gains are more modest than the tool selection improvements, reflecting the fact that code-based orchestration primarily addresses efficiency rather than capability. But in production systems, small accuracy improvements compound across thousands of daily interactions.

A rigorous evaluation framework should track at least four dimensions. First, context efficiency: tokens consumed per task, broken down by tool definitions, intermediate results, and model reasoning. Second, tool selection precision: the rate at which the agent identifies the correct tool for a given subtask, measured against a labelled test set of tasks and expected tool selections. Third, parameter accuracy: the rate at which tool invocations include correct, complete, and properly formatted parameters. And fourth, end-to-end task completion: whether the overall workflow produces the correct final output, regardless of intermediate steps.
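The four dimensions lend themselves to a small harness. The metric names below come from the list above; the thresholds and triage messages are illustrative assumptions, not Anthropic's recommendations.

```python
from dataclasses import dataclass

@dataclass
class AgentEval:
    tokens_per_task: float        # context efficiency
    tool_selection_rate: float    # correct tool chosen for each subtask
    parameter_accuracy: float     # well-formed, correct parameters
    task_completion_rate: float   # correct end-to-end output

    def bottleneck(self):
        """Map the weakest dimension to an intervention (thresholds illustrative)."""
        if self.tokens_per_task > 50_000:
            return "loading strategy: adopt Tool Search / deferred loading"
        if self.tool_selection_rate < 0.8:
            return "search implementation: try embedding-based retrieval"
        if self.parameter_accuracy < 0.85:
            return "parameters: expand Tool Use Examples"
        return "orchestration: restructure the workflow"

run = AgentEval(tokens_per_task=55_000, tool_selection_rate=0.74,
                parameter_accuracy=0.72, task_completion_rate=0.61)
print(run.bottleneck())
# loading strategy: adopt Tool Search / deferred loading
```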

The bottleneck identification process follows naturally from these metrics. If context efficiency is poor but tool selection is accurate, the problem is in loading strategy, and Tool Search is the appropriate intervention. If tool selection degrades with library size, the search implementation needs refinement, perhaps moving from regex-based to embedding-based retrieval. If parameter errors dominate, Tool Use Examples should be expanded. If end-to-end completion lags despite good individual metrics, the orchestration logic, whether sequential or programmatic, likely needs restructuring.

Latency deserves its own evaluation track. Programmatic Tool Calling eliminates the multiple round trips inherent in sequential tool invocation, which should reduce end-to-end latency for complex workflows. But it introduces the overhead of code execution environments. Measuring wall-clock time per task, alongside token counts, reveals whether the computational overhead of sandboxed execution outweighs the savings from fewer inference passes. In Anthropic's testing, a workflow requiring 19 or more sequential inference passes collapsed into a single programmatic execution, a latency reduction that far exceeded any overhead from the sandbox.

Researchers have also proposed broader reliability frameworks for evaluating agentic systems. A 2025 study outlined twelve concrete metrics decomposing agent reliability along four key dimensions: consistency, robustness, predictability, and safety. The findings were sobering. Evaluating 14 agentic models across two complementary benchmarks, the researchers found that recent capability gains had yielded only small improvements in reliability, suggesting that making agents more capable does not automatically make them more dependable. This gap between capability and reliability underscores the importance of dedicated evaluation infrastructure that measures not just whether agents can do things, but whether they do them consistently and safely.

Design Patterns for the Production Frontier

The engineering patterns emerging from these developments suggest a maturation of agentic system architecture. Several design principles have crystallised from both Anthropic's work and the broader community's experience.

The first is what Anthropic calls the “layered approach”: address the highest bottleneck first. If tool definitions consume most of your token budget, start with Tool Search. If intermediate results dominate, implement Programmatic Tool Calling. If parameter errors cause most failures, add Tool Use Examples. This triage prevents premature optimisation and ensures that engineering effort targets the actual constraint.

The second pattern is parallel execution through code orchestration. When multiple tool calls are independent, they should execute concurrently rather than sequentially. Anthropic's documentation references asyncio.gather() for independent operations within programmatic tool calls. This pattern does not reduce token consumption, but it dramatically reduces latency for workflows involving multiple independent data retrievals.
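Inside a programmatic tool call, the pattern looks like ordinary async Python. Here `fetch_spending` is a stub standing in for an awaitable tool binding; the point is that five independent calls complete in roughly one call's latency rather than five.

```python
import asyncio

async def fetch_spending(member):
    await asyncio.sleep(0.1)      # simulate tool-call latency
    return member, 950            # fake result

async def sequential(members):
    # One at a time: latency adds up linearly.
    return [await fetch_spending(m) for m in members]

async def parallel(members):
    # All at once: independent calls run concurrently.
    return await asyncio.gather(*(fetch_spending(m) for m in members))

members = [f"member_{i}" for i in range(5)]
results = asyncio.run(parallel(members))
print(len(results))  # 5
```

Token consumption is identical either way; only wall-clock time changes, which is exactly the trade-off this pattern targets.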

The third pattern involves explicit return format documentation. When tools are called programmatically, the agent's code needs to parse tool outputs reliably. Anthropic recommends explicitly specifying tool output structures in tool descriptions, so the code the model writes can accurately reference fields and formats in the returned data. Without this, the model may generate code that assumes incorrect output structures, leading to runtime failures in the sandbox.

The fourth pattern addresses security and privacy boundaries. Programmatic Tool Calling introduces a new trust surface: the agent is now writing and executing code, not just making predefined function calls. The allowed_callers parameter provides opt-in control, ensuring that sensitive tools (those that modify data, access credentials, or perform irreversible actions) can be restricted to direct model invocation with explicit user approval. Cloudflare's approach adds another layer: bindings that provide pre-authorised client interfaces, ensuring that AI-generated code cannot possibly leak API keys because the keys never enter the execution environment. The binding provides an already-authorised client interface to the MCP server, and all calls made on it pass through the agent supervisor first, which holds the access tokens and injects them into outbound requests.

The fifth pattern concerns state persistence across operations. As described in Anthropic's MCP code execution documentation, agents can maintain filesystem state across operations, enabling resumption of interrupted workflows and the development of reusable functions as “skills” that persist between sessions. This transforms agents from stateless request processors into stateful systems capable of learning and adaptation within their operational context.

For enterprises evaluating these patterns, the critical implementation question is infrastructure. Code execution demands secure execution environments with appropriate sandboxing, resource limits, and monitoring. These add operational overhead compared to direct tool calls. Anthropic's managed implementation handles container management, code execution, and secure tool invocation communication, but organisations with strict data residency or compliance requirements may need to build or adapt their own execution environments. Cloudflare's approach, using Durable Objects as stateful micro-servers with their own SQL databases and WebSocket connections, offers one model for self-hosted execution, deploying once and scaling across a global network to tens of millions of instances.

The broader trajectory is unmistakable. The agent ecosystem is moving away from the “give the model everything and let it figure it out” approach that characterised early tool-using agents. In its place, a more disciplined architecture is emerging: one that treats context as a scarce resource, applies progressive disclosure to manage complexity, and uses code execution to keep intermediate processing out of the model's reasoning space. This is not merely an efficiency optimisation. It is a fundamental shift in how we think about the boundary between what the model does and what the surrounding system does.

As the MCP ecosystem continues to expand under the governance of the Agentic AI Foundation, and as tool counts scale from hundreds to thousands, the organisations that thrive will be those that master this boundary. They will build agents that know how to find the right tool without seeing every tool, that orchestrate complex workflows through code rather than conversation, and that learn from examples rather than struggling with abstract schemas. The 85 per cent context reduction is not the end state. It is the beginning of an entirely new way of building intelligent systems.

References and Sources

  1. Wu, Bin. “Introducing Advanced Tool Use on the Claude Developer Platform.” Anthropic Engineering Blog, November 2025. https://www.anthropic.com/engineering/advanced-tool-use

  2. Jones, Adam and Kelly, Conor. “Code Execution with MCP.” Anthropic Engineering Blog, November 2025. https://www.anthropic.com/engineering/code-execution-with-mcp

  3. Wolenitz, Alon. “Advanced Tool Use in Claude API: Three New Features That Change A Lot.” Setec Research Claude Blog, November 2025. https://claude-blog.setec.rs/blog/advanced-tool-use-claude-api

  4. Cloudflare Engineering. “Code Mode: Give Agents an Entire API in 1,000 Tokens.” Cloudflare Blog, September 2025. https://blog.cloudflare.com/code-mode-mcp/

  5. Anthropic. “Introducing the Model Context Protocol.” Anthropic News, November 2024. https://www.anthropic.com/news/model-context-protocol

  6. Anthropic. “Donating the Model Context Protocol and Establishing the Agentic AI Foundation.” Anthropic News, December 2025. https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation

  7. Linux Foundation. “Linux Foundation Announces the Formation of the Agentic AI Foundation (AAIF).” Linux Foundation Press Release, December 2025. https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation

  8. Anthropic. “Introducing Claude Sonnet 4.6.” Anthropic News, February 2026. https://www.anthropic.com/news/claude-sonnet-4-6

  9. Pobar, Joel. “LLMVM: LLM Python Agentic Runtime Prototype.” GitHub Repository. https://github.com/9600dev/llmvm

  10. Model Context Protocol Specification, Version 2025-11-25. https://modelcontextprotocol.io/specification/2025-11-25

  11. Pento AI. “A Year of MCP: From Internal Experiment to Industry Standard.” Pento Blog, 2025. https://www.pento.ai/blog/a-year-of-mcp-2025-review

  12. Claude API Documentation. “Tool Search Tool.” Anthropic Developer Platform. https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool

  13. Claude API Documentation. “Programmatic Tool Calling.” Anthropic Developer Platform. https://platform.claude.com/docs/en/agents-and-tools/tool-use/programmatic-tool-calling


Tim Green

UK-based Systems Theorist & Independent Technology Writer

Tim explores the intersections of artificial intelligence, decentralised cognition, and posthuman ethics. His work, published at smarterarticles.co.uk, challenges dominant narratives of technological progress while proposing interdisciplinary frameworks for collective intelligence and digital stewardship.

His writing has been featured on Ground News and shared by independent researchers across both academic and technological communities.

ORCID: 0009-0002-0156-9795 Email: tim@smarterarticles.co.uk