When Coding Agents Forget: The Hidden Cost of AI Context Degradation

The promise was elegant in its simplicity: AI agents that could work on complex software projects for hours, reasoning through problems, writing code, and iterating toward solutions without constant human supervision. The reality, as thousands of development teams have discovered, involves a phenomenon that researchers have begun calling “context rot,” a gradual degradation of performance that occurs as these agents accumulate more information than they can effectively process. And the strategies emerging to combat this problem reveal a fascinating tension between computational efficiency and code quality that is reshaping how organisations think about AI-assisted development.
In December 2025, researchers at JetBrains presented findings at the NeurIPS Deep Learning for Code workshop that challenged prevailing assumptions about how to manage this problem. Their paper, “The Complexity Trap,” demonstrated that sophisticated LLM-based summarisation techniques, the approach favoured by leading AI coding tools like Cursor and OpenHands, performed no better than a far simpler strategy: observation masking. This technique simply replaces older tool outputs with placeholder text indicating that content has been omitted for brevity, while preserving the agent's reasoning and action history in full.
The implications are significant. A simple environment observation masking strategy halves cost relative to running an agent without any context management, while matching or slightly exceeding the task completion rate of complex LLM summarisation. The researchers found that combining both approaches yielded additional cost reductions of 7% compared to observation masking alone and 11% compared to summarisation alone. These findings suggest that the industry's rush toward ever more sophisticated context compression may be solving the wrong problem.
The Anatomy of Forgetting
To understand why AI coding agents struggle with extended tasks, you need to grasp how context windows function. Every interaction, every file read, every test result, and every debugging session accumulates in what functions as the agent's working memory. Modern frontier models can process 200,000 tokens or more, with some supporting context windows exceeding one million tokens. Google's Gemini models offer input windows large enough to analyse entire books or multi-file repositories in a single session.
But raw capacity tells only part of the story. Research from Chroma has verified a troubling pattern: models that perform brilliantly on focused inputs show consistent performance degradation when processing full, lengthy contexts. In February 2025, researchers at Adobe tested models on what they called a more difficult variant of the needle-in-a-haystack test. The challenge required not just locating a fact buried in lengthy text, but making an inference based on that fact. Leading models achieved over 90% accuracy on short prompts. At 32,000 tokens, accuracy dropped dramatically.
The Chroma research revealed several counterintuitive findings. Models perform worse when the surrounding context preserves a logical flow of ideas. Shuffled text, with its lack of coherent structure, consistently outperformed logically organised content across all 18 tested models. The researchers found that Claude models exhibited the lowest hallucination rates and tended to abstain when uncertain. GPT models showed the highest hallucination rates, often generating confident but incorrect responses when distracting information was present. Qwen models degraded steadily but held up better in larger versions. Gemini models stood out for making errors earlier, with wide variation between runs, while Claude models decayed most slowly overall.
No model is immune to this decay. The difference is merely how quickly and dramatically each degrades.
Two Philosophies of Context Management
The industry has coalesced around two primary approaches to managing this degradation, each embodying fundamentally different philosophies about what information matters and how to preserve it.
Observation masking targets the environment observations specifically, the outputs from tools like file readers, test runners, and search functions, while preserving the agent's reasoning and action history in full. The JetBrains research notes that observation tokens make up around 84% of an average SWE-agent turn. This approach recognises that the most verbose and often redundant content comes not from the agent's own thinking but from the systems it interacts with. By replacing older tool outputs with simple placeholders like “Previous 8 lines omitted for brevity,” teams can dramatically reduce context consumption without losing the thread of what the agent was trying to accomplish.
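As a rough illustration of how little machinery this requires, the sketch below masks everything except the most recent tool outputs in a typical message-based agent loop. The message roles, the placeholder wording, and the number of observations kept are illustrative assumptions, not the exact configuration used in the JetBrains study.

```python
# Minimal observation-masking sketch: keep the agent's reasoning and actions
# verbatim, but replace all environment observations except the most recent
# few with a short placeholder. Roles and placeholder text are assumptions.

PLACEHOLDER = "[Old environment output omitted for brevity.]"

def mask_observations(history: list[dict], keep_last: int = 3) -> list[dict]:
    """Return a copy of the history with older tool outputs masked."""
    observation_indices = [
        i for i, msg in enumerate(history) if msg["role"] == "tool"
    ]
    to_mask = set(observation_indices[:-keep_last]) if keep_last else set(observation_indices)

    masked = []
    for i, msg in enumerate(history):
        if i in to_mask:
            masked.append({"role": "tool", "content": PLACEHOLDER})
        else:
            masked.append(dict(msg))
    return masked

# Reasoning ("assistant") and actions stay intact; only stale "tool"
# observations collapse to the placeholder.
history = [
    {"role": "assistant", "content": "I'll read auth.controller.ts first."},
    {"role": "tool", "content": "<3,000 lines of file content>"},
    {"role": "assistant", "content": "Now run the test suite."},
    {"role": "tool", "content": "<500 lines of test output>"},
]
print(mask_observations(history, keep_last=1))
```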
LLM summarisation takes a more comprehensive approach, compressing entire conversation histories into condensed representations. This theoretically allows an unlimited number of turns without an endlessly growing context, since summarisation can be repeated whenever limits approach. A single summary of previous turns stands in for the raw history, a distillation that attempts to preserve essential information while discarding redundancy.
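A minimal sketch of the summarisation approach, by contrast, has to invoke the model itself. The token estimate, the budget, and the `call_llm` stand-in below are assumptions for illustration; real systems use a proper tokeniser and their own compaction prompts.

```python
# Sketch of summarisation-based compaction: when the running history grows
# past a budget, an LLM call condenses everything except the most recent
# turns into a single summary message. `call_llm` is a stand-in for whatever
# completion API the agent framework actually uses.

TOKEN_BUDGET = 150_000

def estimate_tokens(history: list[dict]) -> int:
    # Rough heuristic: ~4 characters per token. Real systems use a tokeniser.
    return sum(len(m["content"]) for m in history) // 4

def compact(history: list[dict], call_llm, keep_recent: int = 6) -> list[dict]:
    if estimate_tokens(history) < TOKEN_BUDGET:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    summary = call_llm(
        "Summarise this agent trajectory, preserving file names, errors, "
        "decisions made, and approaches already tried:\n"
        + "\n".join(m["content"] for m in old)
    )
    return [{"role": "system", "content": f"Summary of earlier turns: {summary}"}] + recent
```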
The trade-offs between these approaches illuminate deeper tensions in AI system design. Summarisation adds computational overhead, with summarisation calls accounting for up to 7% of total inference cost for strong models according to JetBrains' analysis. More concerning, summaries can mask failure signals, causing agents to persist in unproductive loops because the compressed history no longer contains the specific error messages or dead-end approaches that would otherwise signal the need to change direction.
Factory AI's research on context compression evaluation identified specific failure modes that emerge when information is lost during compression. Agents forget which files they have modified. They lose track of what approaches they have already tried. They cannot recall the reasoning behind past decisions. They forget the original error messages or technical details that motivated particular solutions. Without tracking artefacts, an agent might re-read files it already examined, make conflicting edits, or lose track of test results. A casual conversation can afford to forget earlier topics. A coding agent that forgets it modified auth.controller.ts will produce inconsistent work.
The Recursive Summarisation Problem
Sourcegraph's Amp coding agent recently retired its compaction feature in favour of a new approach called “handoff.” The change came after the team observed what happens when summarisation becomes recursive, when the system creates summaries of summaries as sessions extend.
The decision echoed findings from elsewhere in the industry. The Codex team had noted that its automated compaction system, which summarised a session and restarted it whenever the model's context window neared its limit, was contributing to a gradual decline in performance over time. As sessions accumulated more compaction events, accuracy fell, and recursive summaries began to distort earlier reasoning.
Handoff works differently. Rather than automatically compressing everything when limits approach, it allows developers to specify a goal for the next task, whereupon the system analyses the current thread and extracts relevant information into a fresh context. This replaces the cycle of compression and re-summarisation with a cleaner break between phases of work, carrying forward only what still matters for the next stage.
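A schematic sketch of the handoff pattern as described here (not Amp's actual implementation) might look like the following, where `call_llm` again stands in for a real completion call and the extraction prompt is an illustrative choice.

```python
# Sketch of a handoff step: given a stated goal for the next phase, extract
# only the material from the current thread that still matters and start a
# fresh thread with it. No recursive summaries of summaries.

def handoff(current_thread: list[dict], goal: str, call_llm) -> list[dict]:
    extraction_prompt = (
        f"The next task is: {goal}\n"
        "From the conversation below, extract only the facts, file paths, "
        "decisions, and open problems needed for that task. Omit everything else.\n\n"
        + "\n".join(f'{m["role"]}: {m["content"]}' for m in current_thread)
    )
    carried_forward = call_llm(extraction_prompt)
    # The new thread starts clean, carrying only what still matters.
    return [
        {"role": "system", "content": f"Context carried over from previous work:\n{carried_forward}"},
        {"role": "user", "content": goal},
    ]
```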
This architectural shift reflects a broader recognition that naive optimisation for compression ratio, minimising tokens per request, often increases total tokens per task. When agents lose critical context, they must re-fetch files, re-read documentation, and re-explore previously rejected approaches. Factory AI's evaluation found that one provider achieved 99.3% compression but scored lower on quality metrics. The lost details required costly re-fetching that exceeded token savings.
The Technical Debt Accelerator
The context management problem intersects with a broader quality crisis in AI-assisted development. GitClear's second-annual AI Copilot Code Quality research analysed 211 million changed lines of code from 2020 to 2024 across a combined dataset of anonymised private repositories and 25 of the largest open-source projects. The findings paint a troubling picture.
GitClear reported an eightfold increase in code blocks containing five or more duplicated lines compared to just two years earlier, pointing to a surge in copy-paste practices. The percentage of code changes classified as “moved” or “refactored,” the signature of code reuse, declined dramatically from 24.1% in 2020 to just 9.5% in 2024. Meanwhile, lines classified as copy-pasted or cloned rose from 8.3% to 12.3% over the same period.
Code churn, which measures code that is added and then quickly modified or deleted, is climbing steadily, projected to hit nearly 7% by 2025. This metric signals instability and rework. Bill Harding, GitClear's CEO and founder, explains the dynamic: “AI has this overwhelming tendency to not understand what the existing conventions are within a repository. And so it is very likely to come up with its own slightly different version of how to solve a problem.”
API evangelist Kin Lane offered a stark assessment: “I don't think I have ever seen so much technical debt being created in such a short period of time during my 35-year career in technology.” This observation captures the scale of the challenge. AI coding assistants excel at adding code quickly but lack the contextual awareness to reuse existing solutions or maintain architectural consistency.
The Google 2025 DORA Report found that a 90% increase in AI adoption was associated with an estimated 9% climb in bug rates, a 91% increase in code review time, and a 154% increase in pull request size. Despite perceived productivity gains, the majority of developers actually spend more time debugging AI-generated code than they did before adopting these tools.
Anthropic's Systematic Approach
In September 2025, Anthropic announced new context management capabilities that represent perhaps the most systematic approach to this problem. The introduction of context editing and memory tools addressed both the immediate challenge of context exhaustion and the longer-term problem of maintaining knowledge across sessions.
Context editing automatically clears stale tool calls and results from within the context window when approaching token limits. As agents execute tasks and accumulate tool results, context editing removes obsolete content while preserving the conversation flow. In a 100-turn web search evaluation, context editing enabled agents to complete workflows that would otherwise fail due to context exhaustion, while reducing token consumption by 84%.
The memory tool enables Claude to store and consult information outside the context window through a file-based system. The agent can create, read, update, and delete files in a dedicated memory directory stored in the user's infrastructure, persisting across conversations. This allows agents to build knowledge bases over time, maintain project state across sessions, and reference previous learnings without keeping everything in active context.
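A generic file-backed memory store along these lines is easy to picture. The sketch below illustrates the concept rather than Anthropic's actual tool schema; the directory layout and method names are assumptions.

```python
# Generic file-backed agent memory along the lines described above. The
# directory layout and method names are illustrative assumptions, not
# Anthropic's actual memory tool schema.

from pathlib import Path

class FileMemory:
    def __init__(self, root: str = "./agent_memory"):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def write(self, name: str, content: str) -> None:
        (self.root / name).write_text(content, encoding="utf-8")

    def read(self, name: str) -> str:
        path = self.root / name
        return path.read_text(encoding="utf-8") if path.exists() else ""

    def delete(self, name: str) -> None:
        (self.root / name).unlink(missing_ok=True)

    def list_entries(self) -> list[str]:
        return sorted(p.name for p in self.root.iterdir() if p.is_file())

# The agent persists knowledge between sessions instead of holding it in context.
memory = FileMemory()
memory.write("project_state.md", "- auth.controller.ts modified\n- tests failing on login flow")
print(memory.list_entries())
```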
Anthropic's internal benchmarks highlight the impact. Using both the memory tool and context editing together delivers a 39% boost in agent performance on complex, multi-step tasks. Even using context editing alone delivers a notable 29% improvement.
The company's engineering guidance emphasises that context must be treated as a finite resource with diminishing marginal returns. Like humans, who have limited working memory capacity, LLMs have an “attention budget” that they draw on when parsing large volumes of context. Every new token introduced depletes this budget by some amount, increasing the need to carefully curate the tokens available to the model.
Extended Thinking and Deliberate Reasoning
Beyond context management, Anthropic has introduced extended thinking capabilities that enable more sophisticated reasoning for complex tasks. Extended thinking gives Claude enhanced reasoning capabilities by allowing it to output its internal reasoning process before delivering a final answer. The budget_tokens parameter determines the maximum number of tokens the model can use for this internal reasoning.
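In practice the parameter rides alongside an ordinary request. The sketch below shows the general shape of such a call using the Anthropic Python SDK; the model identifier and token values are placeholders, and the current API documentation should be treated as authoritative for exact parameter names and limits.

```python
# Rough illustration of enabling extended thinking via the budget_tokens
# parameter. Model name and token values are placeholders; consult the
# current Anthropic documentation for exact parameters and limits.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",          # placeholder model identifier
    max_tokens=16_000,                         # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 10_000},
    messages=[{"role": "user", "content": "Refactor this parser to remove the global state."}],
)

# The response interleaves "thinking" blocks (internal reasoning) with the
# final "text" blocks that carry the answer.
for block in response.content:
    if block.type == "text":
        print(block.text)
```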
This capability enhances performance significantly. Anthropic reports a 54% improvement in complex coding challenges when extended thinking is enabled. In general, accuracy on mathematical and analytical problems improves logarithmically with the number of “thinking tokens” allowed.
For agentic workflows, Claude 4 models support interleaved thinking, which enables the model to reason between tool calls and make more sophisticated decisions after receiving tool results. This allows for more complex agentic interactions where the model can reason about the results of a tool call before deciding what to do next, chain multiple tool calls with reasoning steps in between, and make more nuanced decisions based on intermediate results.
The recommendation for developers is to use specific phrases to trigger additional computation time. “Think” triggers basic extended thinking. “Think hard,” “think harder,” and “ultrathink” map to increasing levels of thinking budget. These modes give the model additional time to evaluate alternatives more thoroughly, reducing the need for iterative correction that would otherwise consume context window space.
The Rise of Sub-Agent Architectures
Beyond compression and editing, a more fundamental architectural pattern has emerged for managing context across extended tasks: the sub-agent or multi-agent architecture. Rather than one agent attempting to maintain state across an entire project, specialised sub-agents handle focused tasks with clean context windows. The main agent coordinates with a high-level plan while sub-agents perform deep technical work. Each sub-agent might explore extensively, using tens of thousands of tokens or more, but returns only a condensed, distilled summary of its work.
Gartner reported a staggering 1,445% surge in multi-agent system enquiries from Q1 2024 to Q2 2025, signalling a shift in how systems are designed. Rather than deploying one large LLM to handle everything, leading organisations are implementing orchestrators that coordinate specialist agents. A researcher agent gathers information. A coder agent implements solutions. An analyst agent validates results. This pattern mirrors how human teams operate, with each agent optimised for specific capabilities rather than being a generalist.
Context engineering becomes critical in these architectures. Multi-agent systems fail when context becomes polluted. If every sub-agent shares the same context, teams pay a massive computational penalty and confuse the model with irrelevant details. The recommended approach treats shared context as an expensive dependency to be minimised. For discrete tasks with clear inputs and outputs, a fresh sub-agent spins up with its own context, receiving only the specific instruction. Full memory and context history are shared only when the sub-agent must understand the entire trajectory of the problem.
Google's Agent Development Kit documentation distinguishes between global context (the ultimate goal, user preferences, and project history) and local context (the specific sub-task at hand). Effective engineering ensures that a specialised agent, such as a code reviewer, receives only a distilled contextual packet relevant to its task, rather than being burdened with irrelevant data from earlier phases.
Sub-agents get their own fresh context, completely separate from the main conversation. Their work does not bloat the primary context. When finished, they return a summary. This isolation is why sub-agents help with long sessions. Claude Code can spawn sub-agents, which allows it to split up tasks. Teams can also create custom sub-agents to have more control, allowing for context management and prompt shortcuts.
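A schematic sketch of this isolation, with each sub-agent holding its own private history and returning only a summary to the orchestrator, might look like the following; the roles, prompts, and `run_agent` stand-in are illustrative assumptions rather than any particular product's design.

```python
# Schematic sub-agent pattern: each sub-agent runs against a fresh, isolated
# history and returns only a condensed summary to the orchestrator.
# `run_agent` stands in for a real agent loop.

from dataclasses import dataclass, field

@dataclass
class SubAgent:
    role: str                      # e.g. "researcher", "coder", "reviewer"
    system_prompt: str
    history: list[dict] = field(default_factory=list)  # private, never shared

    def run(self, task: str, run_agent) -> str:
        # The sub-agent may burn tens of thousands of tokens exploring here,
        # but only the final summary travels back to the orchestrator.
        self.history = [{"role": "system", "content": self.system_prompt},
                        {"role": "user", "content": task}]
        return run_agent(self.history)

def orchestrate(goal: str, run_agent) -> str:
    researcher = SubAgent("researcher", "Gather the facts needed for the task.")
    coder = SubAgent("coder", "Implement the requested change.")
    reviewer = SubAgent("reviewer", "Review the change for defects and conventions.")

    findings = researcher.run(f"Investigate: {goal}", run_agent)
    patch = coder.run(f"Goal: {goal}\nFindings: {findings}", run_agent)
    return reviewer.run(f"Review this patch: {patch}", run_agent)
```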
When Compression Causes Agents to Forget Critical Details
The specific failure modes that emerge when context compression loses information have direct implications for code quality and system reliability. Factory AI's research designed a probe-based evaluation that directly measures functional quality after compression. The approach is straightforward: after compression, ask the agent questions that require remembering specific details from the truncated history. If the compression preserved the right information, the agent answers correctly.
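A toy version of such a probe harness is sketched below. It grades answers by simple substring matching against expected details, whereas Factory AI's actual evaluation uses a graded rubric; the probes shown are hypothetical examples.

```python
# Sketch of a probe-based compression evaluation: compress the history, then
# ask questions whose answers live only in the truncated portion. Probes are
# question/expected-substring pairs; real evaluations use a graded rubric.

def probe_compression(history, compress, answer_with_context, probes):
    compressed = compress(history)
    results = []
    for question, expected in probes:
        answer = answer_with_context(compressed, question)
        results.append({
            "question": question,
            "passed": expected.lower() in answer.lower(),
        })
    passed = sum(r["passed"] for r in results)
    return {"score": passed / len(results), "details": results}

# Hypothetical probes targeting the failure modes discussed above: artefact
# tracking, previously tried approaches, and technical details.
probes = [
    ("Which files have been modified so far?", "auth.controller.ts"),
    ("What approach was already tried and rejected?", "token refresh in middleware"),
    ("What was the original error message?", "401 Unauthorized"),
]
```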
All tested methods struggled particularly with artefact tracking, scoring only 2.19 to 2.45 out of 5.0 on this dimension. When agents forget which files they have modified, they re-read previously examined code and make conflicting edits. Technical detail degradation varied more widely, with Factory's approach scoring 4.04 on accuracy while OpenAI's achieved only 3.43. Agents that lose file paths, error codes, and function names become unable to continue work effectively.
Context drift presents another challenge. Compression approaches that regenerate summaries from scratch lose task state across cycles. Approaches that anchor iterative updates preserve context better by making incremental modifications rather than full regeneration.
The October 2025 Acon framework from Chinese researchers attempts to address these challenges through dynamic condensation of environment observations and interaction histories. Rather than handcrafting prompts for compression, Acon introduces a guideline optimisation pipeline that refines compressor prompts via failure analysis, ensuring that critical environment-specific and task-relevant information is retained. The approach is gradient-free, requiring no parameter updates, making it usable with closed-source or production models.
The Productivity Paradox
These technical challenges intersect with a broader paradox that has emerged in AI-assisted development. Research reveals AI coding assistants increase developer output but not company productivity. This disconnect sits at the heart of the productivity paradox being discussed across the industry.
The researchers at METR conducted what may be the most rigorous study of AI coding tool impact on experienced developers. They recruited 16 experienced developers from large open-source repositories averaging over 22,000 stars and one million lines of code, projects that developers had contributed to for multiple years. Each developer provided lists of real issues, totalling 246 tasks, that would be valuable to the repository: bug fixes, features, and refactors that would normally be part of their regular work.
The finding shocked the industry. When developers were randomly assigned to use AI tools, they took 19% longer to complete tasks than when working without them. Before the study, developers had predicted AI would speed them up by 24%. After experiencing the actual slowdown, they still believed it had helped, estimating a 20% improvement. The objective measurement showed the opposite.
The researchers found that developers accepted less than 44% of AI generations. This relatively low acceptance rate resulted in wasted time, as developers often had to review, test, and modify code, only to reject it in the end. Even when suggestions were accepted, developers reported spending considerable time reviewing and editing the code to meet their high standards.
According to Stack Overflow's 2025 Developer Survey, only 16.3% of developers said AI made them more productive to a great extent. The largest group, 41.4%, said it had little or no effect. Telemetry from over 10,000 developers confirms this pattern: AI adoption consistently skews toward newer hires who use these tools to navigate unfamiliar code, while more experienced engineers remain sceptical.
The pattern becomes clearer when examining developer experience levels. AI can get a developer 70% of the way there, but the final 30% is the hard part. For junior developers, that first 70% feels magical. For senior developers, the last 30% is often slower to finish than writing the code cleanly from the start.
The Human Oversight Imperative
The Ox Security report, titled “Army of Juniors: The AI Code Security Crisis,” identified ten architecture and security anti-patterns commonly found in AI-generated code. According to Veracode's 2025 GenAI Code Security Report, which analysed code produced by over 100 LLMs across 80 real-world coding tasks, AI introduces security vulnerabilities in 45% of cases.
Some programming languages proved especially problematic. Java had the highest failure rate, with LLM-generated code introducing security flaws more than 70% of the time. Python, C#, and JavaScript followed with failure rates between 38 and 45%. LLMs also struggled with specific vulnerability types. 86% of code samples failed to defend against cross-site scripting, and 88% were vulnerable to log injection attacks.
This limitation means that even perfectly managed context cannot substitute for human architectural oversight. The Qodo State of AI Code Quality report found that missing context was the top issue developers face, reported by 65% during refactoring and approximately 60% during test generation and code review. Only 3.8% of developers report experiencing both low hallucination rates and high confidence in shipping AI-generated code without human review.
Nearly one-third of all improvement requests in Qodo's survey were about making AI tools more aware of the codebase, team norms, and project structure. Hallucinations and quality issues often stem from poor contextual awareness. When AI suggestions ignore team patterns, architecture, or naming conventions, developers end up rewriting or rejecting the code, even if it is technically correct.
Architectural Decision-Making Remains Human Territory
AI coding agents are good at producing code that works, but poor at making sound design and architecture decisions on their own. If allowed to proceed without oversight, they will write functionally correct code while accruing technical debt very quickly.
The European Union's AI Act, with high-risk provisions taking effect in August 2026 and penalties reaching 35 million euros or 7% of global revenue, demands documented governance. AI governance committees have become standard in mid-to-large enterprises, with structured intake processes covering security, privacy, legal compliance, and model risk.
The OWASP GenAI Security Project released the Top 10 for Agentic Applications in December 2025, reflecting input from over 100 security researchers, industry practitioners, and technology providers. Agentic systems introduce new failure modes, including tool misuse, prompt injection, and data leakage. OWASP 2025 includes a specific vulnerability criterion addressing the risk when developers download and use components from untrusted sources. This takes on new meaning when AI coding assistants, used by 91% of development teams according to JetBrains' 2025 survey, are recommending packages based on training data that is three to six months old at minimum.
BCG's research on human oversight emphasises that generative AI presents risks, but human review is often undermined by automation bias, escalation roadblocks, and evaluations based on intuition rather than guidelines. Oversight works when organisations integrate it into product design rather than appending it at launch, and pair it with other components like testing and evaluation.
Emerging Patterns for Production Systems
The architectural patterns emerging to address these challenges share several common elements. First, they acknowledge that human oversight is not optional but integral to the development workflow. Second, they implement tiered review processes that route different types of changes to different levels of scrutiny. Third, they maintain explicit documentation that persists outside the agent's context window.
The recommended approach involves creating a context directory containing specialised documents: a Project Brief for core goals and scope, Product Context for user experience workflows and business logic, System Patterns for architecture decisions and component relationships, Tech Context for the technology stack and dependencies, and Progress Tracking for working features and known issues.
This Memory Bank approach addresses the fundamental limitation that AI assistants lose track of architectural decisions, coding patterns, and overall project structure as project complexity increases. By maintaining explicit documentation that gets fed into every AI interaction, teams can maintain consistency even as AI generates new code.
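Feeding that documentation into each interaction can be as simple as concatenating the files into the prompt. The sketch below assumes file names that mirror the documents listed above; the prompt assembly itself is an illustrative choice rather than a prescribed format.

```python
# Sketch of loading a Memory Bank directory into every AI interaction. File
# names mirror the documents described above; the layout is an assumption.

from pathlib import Path

CONTEXT_FILES = [
    "project_brief.md",     # core goals and scope
    "product_context.md",   # user workflows and business logic
    "system_patterns.md",   # architecture decisions and component relationships
    "tech_context.md",      # technology stack and dependencies
    "progress.md",          # working features and known issues
]

def load_memory_bank(directory: str = "./context") -> str:
    sections = []
    for name in CONTEXT_FILES:
        path = Path(directory) / name
        if path.exists():
            sections.append(f"## {name}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(sections)

def build_prompt(task: str) -> str:
    return f"{load_memory_bank()}\n\n## Current task\n{task}"
```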
The human role in this workflow resembles a navigator in pair programming. The navigator directs overall development strategy, makes architectural decisions, and reviews AI-generated code. The AI functions as the driver, generating code implementations and suggesting refactoring opportunities. The critical insight is treating AI as a junior developer beside you: capable of producing drafts, boilerplate, and solid algorithms, but lacking the deep context of your project.
Research from METR shows the length of tasks AI agents can complete doubling every seven months, from one-hour tasks in early 2025 to projected eight-hour workstreams by late 2026. This trajectory intensifies both the context management challenge and the need for architectural oversight. When an eight-hour autonomous workstream fails at hour seven, the system needs graceful degradation, not catastrophic collapse.
The Hierarchy of Memory
Sophisticated context engineering now implements hierarchical memory systems that mirror human cognitive architecture. Working memory holds the last N turns of conversation verbatim. Episodic memory stores summaries of distinct past events or sessions. Semantic memory extracts facts and preferences from conversations and stores them separately for retrieval when needed.
Hierarchical summarisation compresses older conversation segments while preserving essential information. Rather than discarding old context entirely, systems generate progressively more compact summaries as information ages. Recent exchanges remain verbatim while older content gets compressed into summary form. This approach maintains conversational continuity without consuming excessive context.
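A compact sketch of these three tiers, assuming simple in-process stores (production systems typically back the episodic and semantic tiers with files or a vector database), might look like this:

```python
# Compact sketch of the three memory tiers described above, with simple
# in-process stores and a caller-supplied `summarise` function.

from collections import deque

class HierarchicalMemory:
    def __init__(self, working_size: int = 10):
        self.working = deque(maxlen=working_size)  # last N turns, verbatim
        self.episodic: list[str] = []              # summaries of past sessions
        self.semantic: dict[str, str] = {}         # extracted facts and preferences

    def add_turn(self, turn: dict, summarise) -> None:
        if len(self.working) == self.working.maxlen:
            # The oldest verbatim turn ages out: compress it into episodic memory.
            oldest = self.working[0]
            self.episodic.append(summarise(oldest["content"]))
        self.working.append(turn)

    def remember_fact(self, key: str, value: str) -> None:
        self.semantic[key] = value

    def build_context(self) -> str:
        return "\n".join(
            ["Known facts: " + "; ".join(f"{k}={v}" for k, v in self.semantic.items())]
            + ["Earlier sessions: " + " | ".join(self.episodic[-5:])]
            + [m["content"] for m in self.working]
        )
```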
Claude Code demonstrates this approach with its auto-compact feature. When a conversation nears the context limit, the system compresses hundreds of turns into a concise summary, preserving task-critical details while freeing space for new reasoning. Since version 2.0.64, compacting is instant, eliminating the previous waiting time. When auto-compact triggers, Claude Code analyses the conversation to identify key information worth preserving, creates a concise summary of previous interactions, decisions, and code changes, compacts the conversation by replacing old messages with the summary, and continues seamlessly with the preserved context.
The feature is not without rough edges. The working hypothesis among users is that Claude Code now triggers auto-compact much earlier than it used to, potentially at around 64 to 75% of context usage versus the historical 90% threshold, and engineers have built in a “completion buffer” that gives tasks room to finish before compaction, eliminating disruptive mid-operation interruptions.
The emerging best practice involves using sub-agents to verify details or investigate particular questions, especially early in a conversation or task. This preserves context availability without much downside in terms of lost efficiency. Each sub-agent gets its own context window, preventing any single session from approaching limits while allowing deep investigation of specific problems.
Balancing Efficiency and Quality
The trade-offs between computational efficiency and code quality are not simply technical decisions but reflect deeper values about the role of AI in software development. Organisations that optimise primarily for token reduction may find themselves paying the cost in increased debugging time, architectural inconsistency, and security vulnerabilities. Those that invest in comprehensive context preservation may face higher computational costs but achieve more reliable outcomes.
Google's 2024 DORA report found that while AI adoption increased individual output, with 21% more tasks completed and 98% more pull requests merged, organisational delivery metrics remained flat. More concerning, AI adoption correlated with a 7.2% reduction in delivery stability. The 2025 DORA report confirms this pattern persists. Speed without stability is accelerated chaos.
Forecasts predict that on this trajectory, 75% of technology leaders will face moderate to severe technical debt by 2026. The State of Software Delivery 2025 report found that despite perceived productivity gains, the majority of developers actually spend more time debugging AI-generated code. This structural debt arises because LLMs prioritise local functional correctness over global architectural coherence and long-term maintainability.
Professional developers do not vibe code. They control agents carefully through planning and supervision, seeking a productivity boost while still valuing software quality attributes. They plan before implementing, validate all agentic output, and find agents suitable for well-described, straightforward tasks rather than complex ones.
The Discipline That Enables Speed
The paradox of AI-assisted development is that achieving genuine productivity gains requires slowing down in specific ways. Establishing guardrails, maintaining context documentation, implementing architectural review, and measuring beyond velocity all represent investments that reduce immediate output. Yet without these investments, the apparent gains from AI acceleration prove illusory as technical debt accumulates, architectural coherence degrades, and debugging time compounds.
The organisations succeeding with AI coding assistance share common characteristics. They maintain rigorous code review regardless of code origin. They invest in automated testing proportional to development velocity. They track quality metrics alongside throughput metrics. They train developers to evaluate AI suggestions critically rather than accepting them reflexively.
Gartner predicts that 40% of enterprise applications will embed AI agents by the end of 2026, up from less than 5% in 2025. Industry analysts project the agentic AI market will surge from 7.8 billion dollars today to over 52 billion dollars by 2030. This trajectory makes the questions of context management and human oversight not merely technical concerns but strategic imperatives.
The shift happening is fundamentally different from previous developments. Teams moved from autocomplete to conversation in 2024, from conversation to collaboration in 2025. Now they are moving from collaboration to delegation. But delegation without oversight is abdication. The agents that will succeed are those designed with human judgment as an integral component, not an afterthought.
The tools are genuinely powerful. The question is whether teams have the discipline to wield them sustainably, maintaining the context engineering and architectural oversight that transform raw capability into reliable production systems. The future belongs not to the organisations that generate the most AI-assisted code, but to those that understand when to trust the agent, when to question it, and how to ensure that forgetting does not become the defining characteristic of their development process.
References and Sources
JetBrains Research, “The Complexity Trap: Simple Observation Masking Is as Efficient as LLM Summarization for Agent Context Management,” NeurIPS 2025 Deep Learning for Code Workshop (December 2025). https://arxiv.org/abs/2508.21433
JetBrains Research Blog, “Cutting Through the Noise: Smarter Context Management for LLM-Powered Agents” (December 2025). https://blog.jetbrains.com/research/2025/12/efficient-context-management/
Chroma Research, “Context Rot: How Increasing Input Tokens Impacts LLM Performance” (2025). https://research.trychroma.com/context-rot
Anthropic, “Managing context on the Claude Developer Platform” (September 2025). https://www.anthropic.com/news/context-management
Anthropic, “Effective context engineering for AI agents” (2025). https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
Anthropic, “Building with extended thinking” (2025). https://docs.claude.com/en/docs/build-with-claude/extended-thinking
Factory AI, “Evaluating Context Compression for AI Agents” (2025). https://factory.ai/news/evaluating-compression
Amp (Sourcegraph), “Handoff (No More Compaction)” (2025). https://ampcode.com/news/handoff
METR, “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity” (July 2025). https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
Qodo, “State of AI Code Quality Report” (2025). https://www.qodo.ai/reports/state-of-ai-code-quality/
Veracode, “GenAI Code Security Report” (2025). https://www.veracode.com/blog/genai-code-security-report/
Ox Security, “Army of Juniors: The AI Code Security Crisis” (2025). Referenced via InfoQ.
OWASP GenAI Security Project, “Top 10 Risks and Mitigations for Agentic AI Security” (December 2025). https://genai.owasp.org/2025/12/09/owasp-genai-security-project-releases-top-10-risks-and-mitigations-for-agentic-ai-security/
Google DORA, “State of DevOps Report” (2024, 2025). https://dora.dev/research/
GitClear, “AI Copilot Code Quality: 2025 Data Suggests 4x Growth in Code Clones” (2025). https://www.gitclear.com/ai_assistant_code_quality_2025_research
Gartner, Multi-agent system enquiry data (2024-2025). Referenced in multiple industry publications.
BCG, “You Won't Get GenAI Right if Human Oversight is Wrong” (2025). https://www.bcg.com/publications/2025/wont-get-gen-ai-right-if-human-oversight-wrong
JetBrains, “The State of Developer Ecosystem 2025” (2025). https://blog.jetbrains.com/research/2025/10/state-of-developer-ecosystem-2025/
Stack Overflow, “2025 Developer Survey” (2025). https://survey.stackoverflow.co/2025/
Google Developers Blog, “Architecting efficient context-aware multi-agent framework for production” (2025). https://developers.googleblog.com/architecting-efficient-context-aware-multi-agent-framework-for-production/
Faros AI, “Best AI Coding Agents for 2026” (2026). https://www.faros.ai/blog/best-ai-coding-agents-2026
Machine Learning Mastery, “7 Agentic AI Trends to Watch in 2026” (2026). https://machinelearningmastery.com/7-agentic-ai-trends-to-watch-in-2026/
Arxiv, “Acon: Optimizing Context Compression for Long-horizon LLM Agents” (October 2025). https://arxiv.org/html/2510.00615v1
ClaudeLog, “What is Claude Code Auto-Compact” (2025). https://claudelog.com/faqs/what-is-claude-code-auto-compact/

Tim Green, UK-based Systems Theorist and Independent Technology Writer
Tim explores the intersections of artificial intelligence, decentralised cognition, and posthuman ethics. His work, published at smarterarticles.co.uk, challenges dominant narratives of technological progress while proposing interdisciplinary frameworks for collective intelligence and digital stewardship.
His writing has been featured on Ground News and shared by independent researchers across both academic and technological communities.
ORCID: 0009-0002-0156-9795
Email: tim@smarterarticles.co.uk