The Ouroboros Machine: When AI Reviews Its Own Code

Somewhere inside the engineering departments of the world's largest technology companies, a peculiar feedback loop has taken hold. AI systems generate code. Other AI systems review that code. Human developers, increasingly sidelined from the details of what they are shipping, approve the results with a cursory glance, trusting that the machines have checked each other's work. It is a recursive dependency model that, on the surface, appears to represent the pinnacle of software engineering efficiency. Beneath that surface, it is something far more troubling: a system in which genuine comprehension of production software is quietly evaporating.
The numbers underscoring this shift are staggering. According to SonarSource's State of Code 2025 survey, 42% of committed code is now AI-generated or AI-assisted. GitHub Copilot generates an average of 46% of code written by its users, with Java developers reaching 61%. Microsoft has stated that 30% of its code is now written by AI. In March 2025, Y Combinator reported that 25% of startup companies in its Winter 2025 batch had codebases that were 95% AI-generated. By 2026, Gartner forecasts that up to 60% of new software code will be AI-generated. And yet, as a December 2025 analysis by CodeRabbit revealed, AI-generated code produces 1.7 times more defects than human-written code, with logic and correctness errors 75% more prevalent and security vulnerabilities up to 2.74 times higher. The enterprise world has normalised a practice that demonstrably increases the rate at which flawed software reaches production, whilst simultaneously deploying AI-powered tools to catch the very problems that AI introduced.
This is not merely a quality assurance challenge. It is a systemic architectural failure, one that demands urgent examination before organisations cross an invisible threshold from which recovery becomes extraordinarily expensive.
The Verification Gap Nobody Wants to Discuss
The fundamental mismatch between AI code generation and AI code review is not a matter of sophistication. It is a matter of category. AI code generators, whether GitHub Copilot, Cursor, or Claude Code, excel at producing syntactically correct, plausible-looking software. They are trained on billions of lines of existing code and have absorbed the statistical patterns of how functions are structured, how variables are named, and how common problems are solved. What they lack, fundamentally, is understanding. They do not know what the software is supposed to do in the context of a specific business, a specific user base, or a specific regulatory environment.
AI code review tools suffer from a mirror-image limitation. They can identify known vulnerability patterns, flag deviations from coding standards, and spot surface-level issues with impressive speed. What they cannot do reliably is reason about architectural intent, cross-service dependencies, or the subtle business logic that distinguishes a functioning application from a dangerously flawed one. Many tools see only the changes within a single pull request and cannot track downstream consumers, so they systematically miss breaking changes across service boundaries in microservice architectures, cross-service contract violations, and SDK incompatibilities when shared libraries are updated.
Tenzai's December 2025 research laid this bare with uncomfortable precision. The firm tested identical prompts across five of the most prominent AI coding tools: Claude Code, OpenAI Codex, Cursor, Replit, and Devin. Across 15 test applications, they found 69 vulnerabilities, including six rated critical. The pattern was revealing: not a single exploitable SQL injection or cross-site scripting vulnerability was found. The AI tools had learned to avoid those well-documented pitfalls. Instead, the dominant failures were in business logic and authorisation: preventing negative pricing in e-commerce applications, enforcing user ownership checks, and validating that admin-only endpoints actually require admin access. Every tool tested introduced server-side request forgery vulnerabilities because determining which URLs are safe is inherently context-dependent.
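The failure class Tenzai describes is easy to picture in miniature. The sketch below uses hypothetical names throughout (it is not Tenzai's test code) and contrasts a typical generated handler, which works but never asks who is calling or whether the value makes business sense, with the controls a human reviewer would demand:

```python
def set_price_generated(catalogue: dict, item: str, price: float) -> None:
    """Typical AI-generated handler: syntactically fine, no guardrails."""
    catalogue[item] = price  # accepts negative prices; no authorisation check


def set_price_reviewed(catalogue: dict, item: str, price: float,
                       caller_role: str) -> None:
    """The controls a reviewer adds: authorisation plus business-rule checks."""
    if caller_role != "admin":
        raise PermissionError("admin-only endpoint")
    if price < 0:
        raise ValueError("negative prices are not valid")
    catalogue[item] = price
```

The first function would pass most pattern-based scanners: there is no injection, no unsafe deserialisation, nothing that matches a known vulnerability signature. The flaw is an absence, and absences are precisely what statistical reviewers struggle to see.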
What concerned Tenzai most was not what the AI implemented incorrectly; it was what the AI never attempted at all. “All the coding agents, across every test we performed, failed miserably when it came to security controls,” the researchers noted. “It wasn't that they implemented them incorrectly. In almost all cases, they didn't even try.”
This is the verification gap in its starkest form. AI code generators produce software that looks complete but is architecturally hollow in its security posture. AI code reviewers, operating on the same statistical pattern-matching principles, are well-equipped to catch the kinds of errors that AI generators have already learned to avoid, and poorly equipped to catch the kinds of errors that AI generators systematically introduce. The reviewer and the generator share the same blind spots.
The Trust Deficit and the Review Bottleneck
Sonar's January 2026 survey of over 1,100 developers globally quantified a striking paradox at the heart of enterprise AI adoption. Nearly all developers, 96%, expressed some degree of distrust in AI-generated code, yet only 48% consistently verified that code before committing it. The survey found that 38% of respondents said reviewing AI-generated code requires more effort than reviewing human-generated code. Meanwhile, 35% of developers reported accessing AI coding tools via personal accounts rather than work-sanctioned ones, creating a blind spot for security and compliance teams.
The downstream consequences of this trust deficit are measurable. Opsera's AI Coding Impact Benchmark Report, drawn from analysis of more than 250,000 developers across over 60 enterprise organisations, found that whilst AI-driven coding reduces time to pull request by up to 58%, AI-generated pull requests wait 4.6 times longer in review than human-written ones when governance frameworks are absent. The initial speed gains at the beginning of the development cycle are consumed during reviews, repairs, and security checks. Code duplication increased from 10.5% to 13.5% in AI-assisted codebases, and AI-generated code introduced 15 to 18% more security vulnerabilities per line of code compared to human-written code.
The Opsera data also revealed a widening skill gap. Senior engineers realised nearly five times the productivity gains of junior engineers when using AI tools. This finding upends the popular narrative that AI democratises software development. In practice, AI amplifies existing expertise: those who already understand architecture, security, and system design use AI effectively, whilst those who lack that foundation produce more code of lower quality, faster. The 21% of AI licences that remain underutilised across enterprises further suggests that organisations are paying for productivity gains they are not achieving.
Vibe Coding and the Erosion of Intentional Design
The term “vibe coding” was coined by Andrej Karpathy, co-founder of OpenAI and former AI leader at Tesla, in a post on X on 2 February 2025. “There's a new kind of coding I call 'vibe coding,' where you fully give in to the vibes, embrace exponentials, and forget that the code even exists,” Karpathy wrote. He described a workflow in which he spoke instructions to an AI via voice transcription, always hit “Accept All” on suggested changes, and never read the code diffs. It was intended as a playful observation about weekend projects. It became a cultural phenomenon, named Collins English Dictionary's Word of the Year for 2025.
The irony is instructive. Even Karpathy himself has retreated from his own creation. His Nanochat project, launched in October 2025, was entirely hand-coded in approximately 8,000 lines of PyTorch. When asked how much AI assistance he used, Karpathy responded: “It's basically entirely hand-written (with tab autocomplete). I tried to use Claude/Codex agents a few times but they just didn't work well enough at all.” The person who gave vibe coding its name does not trust the technique enough to use it on his own serious project.
The problem with vibe coding is not that it exists. For rapid prototyping, educational experiments, and disposable weekend projects, the approach has genuine utility. The problem is that enterprise software development has adopted the aesthetics of vibe coding without acknowledging its fundamental unsuitability for production systems. Developers describe requirements to AI assistants, accept generated code with minimal review, and push it to production at unprecedented speed. The result is codebases in which similar problems are solved in dissimilar ways, error handling varies wildly between components, and no single engineer possesses a coherent mental model of how the system actually works.
A study of 120 UK technology firms found that teams spent 41% more time debugging AI-generated code in systems exceeding 50,000 lines. Separately, 67% of developers surveyed reported increased debugging efforts as a direct consequence of speed-driven AI code generation. The Veracode 2025 GenAI Code Security Report, which analysed 80 coding tasks across more than 100 large language models, found that LLMs introduced security vulnerabilities in 45% of cases, with security performance showing no improvement over time despite advances in code generation capability. When given a choice between a secure and an insecure method, AI models chose the insecure option nearly half the time. For context-dependent vulnerabilities like cross-site scripting, only 12 to 13% of generated code was secure. Jens Wessling, CTO at Veracode, noted that with vibe coding, developers “do not need to specify security constraints to get the code they want, effectively leaving secure coding decisions to LLMs. Our research reveals GenAI models make the wrong choices nearly half the time, and it's not improving.”
These are not edge cases. They are systematic, predictable failures embedded in the fundamental architecture of how large language models generate code.
The Recursive Dependency Trap
The most dangerous aspect of current enterprise AI adoption is not any individual tool's limitations; it is the recursive structure of the system as a whole. Organisations are deploying AI to generate code, then deploying AI to review that code, then deploying AI to write the tests that validate both the generation and the review. At each layer, the same fundamental limitations propagate, and at each layer, the illusion of verification creates false confidence.
Consider the mechanics. An AI code generator produces a function that handles user authentication. It looks correct. It follows standard patterns. An AI code reviewer scans the function and finds no known vulnerability signatures. The function passes AI-generated unit tests. It is merged into the main branch. Three months later, a security researcher discovers that the authentication logic fails silently under a specific concurrency condition that none of the AI systems had the architectural awareness to anticipate.
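The concurrency condition in that scenario belongs to a well-known class: check-then-act races, where the guard and the state change are separate steps. The deterministic miniature below is a hypothetical illustration (the interleaving is simulated sequentially so the outcome is reproducible), showing how two requests can both pass an "at most one active session" check:

```python
class SessionStore:
    """Allows at most one active session per user -- or so the check claims."""

    def __init__(self):
        self.active = {}  # user -> number of live sessions

    def may_log_in(self, user: str) -> bool:
        return self.active.get(user, 0) < 1              # step 1: check

    def record_login(self, user: str) -> None:
        self.active[user] = self.active.get(user, 0) + 1  # step 2: act


store = SessionStore()
# Two concurrent requests, interleaved so both checks run before either write:
a_ok = store.may_log_in("alice")   # request A checks: True
b_ok = store.may_log_in("alice")   # request B checks: also True
if a_ok:
    store.record_login("alice")
if b_ok:
    store.record_login("alice")
# The "single session" invariant is now silently broken: active count is 2.
```

The fix is to make the check and the act a single atomic operation, via a lock or a database transaction. No unit test that exercises requests one at a time will ever observe the failure, which is why it passes AI-generated test suites.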
This is not hypothetical speculation about some distant future risk. It is the documented reality of how AI-generated code behaves in production today. CodeRabbit's analysis of 470 pull requests found that AI-authored changes produced 10.83 issues per pull request compared to 6.45 for human-only pull requests. Critical issues were 1.4 times more common, and performance inefficiencies such as excessive input/output operations appeared nearly eight times more often in AI-generated code. AI-generated code was 1.88 times more likely to introduce improper password handling, 1.91 times more likely to create insecure object references, and 1.82 times more likely to implement insecure deserialisation. The AI systems reviewing these pull requests were effective at catching surface-level problems but consistently missed the deeper architectural and logic failures.
The recursive dependency model compounds this problem exponentially. When a human developer reviews AI-generated code, they bring contextual understanding, scepticism, and domain expertise that exists outside the statistical patterns the AI has learned. When an AI system reviews AI-generated code, it brings the same statistical pattern-matching approach that produced the code in the first place. The reviewer and the reviewed share a common epistemic foundation, which means they share common blind spots. It is the software engineering equivalent of asking a student to grade their own examination: technically possible, structurally unreliable.
Google's DORA (DevOps Research and Assessment) report, based on a survey of approximately 3,000 respondents, provides the most compelling evidence of this dynamic's real-world consequences. The 2024 report found that for every 25% increase in AI adoption, estimated delivery throughput decreased by 1.5% and delivery stability decreased by 7.2%. Crucially, 75% of respondents reported feeling more productive with AI tools, even as the objective metrics deteriorated. The 2025 follow-up report confirmed the trend: AI's correlation with increased instability persisted, even as the relationship with throughput reversed to become modestly positive. The conclusion from a decade of DORA research is unambiguous: improving the development process does not automatically improve software delivery, at least not without adherence to fundamentals like small batch sizes and robust testing mechanisms.
This perception gap, where developers believe they are working faster whilst objective measures show declining performance, is perhaps the most insidious feature of the recursive dependency model. It means organisations cannot rely on developer sentiment as an early warning system. The very people closest to the code are the least likely to recognise when AI augmentation has tipped into compounding technical debt.
The METR Paradox and the Illusion of Speed
METR's July 2025 randomised controlled trial provides the most rigorous evidence yet that AI-assisted coding's productivity benefits are, in certain critical contexts, illusory. The study recruited 16 experienced developers from large open-source repositories averaging over 22,000 stars and one million lines of code; participants had an average of five years' experience and 1,500 commits in their respective projects.
The results were striking. Developers using AI tools were 19% slower than those working without AI assistance. Before starting tasks, developers predicted that AI would reduce their completion time by 24%. After completing the study, they still believed AI had reduced their time by 20%. The perception of acceleration was completely divorced from objective reality.
Screen-recording data revealed one plausible mechanism: AI-assisted coding sessions showed more idle time, not merely “waiting for the model” time, but periods of complete inactivity. The researchers hypothesised that coding with AI requires less cognitive effort, making it easier to multitask or lose focus. In other words, the AI was not just failing to accelerate the work; it was actively degrading the concentration that experienced developers bring to complex problems.
The METR study carries important caveats. It focused on experienced developers working in repositories they knew intimately, a context where deep familiarity already provides substantial speed advantages. AI tools may offer greater benefit to less experienced developers or those working in unfamiliar codebases. Yet the finding remains profoundly important for enterprise settings, precisely because production-critical code is typically maintained by experienced developers with deep institutional knowledge. If AI tools slow down the very people most responsible for system reliability, the implications for production stability are severe.
Notably, 69% of study participants continued using AI tools after the experiment concluded, despite the measured slowdown. This suggests that the subjective experience of AI-assisted coding (the feeling of reduced cognitive load, the perception of progress) is compelling enough to override objective evidence of diminished performance. For organisations attempting to detect when they have crossed from beneficial augmentation into harmful dependency, this psychological dimension makes the threshold nearly invisible from the inside.
Detecting the Invisible Tipping Point
Organisations desperately need reliable indicators for when AI-assisted development has crossed from productivity enhancement into technical debt accumulation. The challenge is that the most obvious metrics (sprint velocity, lines of code shipped, feature delivery timelines) all move in the “right” direction even as underlying code quality deteriorates. AI makes it trivially easy to ship more code faster. The question is whether that code creates more problems than it solves.
Several empirical signals deserve close monitoring. The first is the ratio of debugging time to generation time. When teams begin spending more time understanding and fixing AI-generated code than they would have spent writing it themselves, the augmentation has become counterproductive. The UK study finding that teams spent 41% more time debugging AI-generated code in large systems suggests many organisations have already crossed this line without recognising it.
The second signal is the declining ability of team members to explain what the system does. If no individual developer can articulate, without consulting the AI, how a critical subsystem works, the organisation has lost genuine understanding of its own production infrastructure. This is not a theoretical risk; it is a measurable competency that can be assessed through architecture reviews and incident response exercises. Sonar's survey found that AI has shifted the centre of gravity in software engineering: the hard part is no longer writing code, but validating it. When 88% of developers report negative impacts from AI, specifically the generation of code that looks correct but is not reliable, the validation challenge becomes existential.
The third signal is rising incident severity alongside falling incident frequency. AI-generated code may produce fewer trivial bugs, the kind that AI review tools catch effectively, whilst simultaneously introducing fewer but more catastrophic failures, the kind that only human architectural understanding can prevent. If mean time to resolution is climbing even as raw defect counts decline, the system is accumulating the kind of deep technical debt that compounds silently until a major failure exposes it.
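Two of these three signals reduce to simple, trackable ratios; the comprehension signal requires human-led audits and resists automation. A sketch of such a monitor, with illustrative rather than empirically derived thresholds:

```python
def tipping_point_flags(debug_hours: float, generation_hours: float,
                        mttr_trend: float, defect_trend: float) -> list:
    """Return which warning signals a team's metrics currently trip.

    mttr_trend / defect_trend are fractional changes over the review
    period, e.g. +0.2 means a 20% rise. Thresholds are illustrative.
    """
    flags = []
    # Signal 1: more time spent fixing generated code than writing it would take
    if debug_hours > generation_hours:
        flags.append("debug-time exceeds generation-time")
    # Signal 3: incidents getting harder to resolve while raw counts fall
    if mttr_trend > 0 and defect_trend < 0:
        flags.append("rising MTTR despite falling defect counts")
    # Signal 2 (team comprehension) cannot be computed from metrics;
    # it needs architecture reviews and incident-response exercises.
    return flags
```

The point is not the arithmetic but the discipline: these quantities must be measured deliberately, because none of them appears on a standard velocity dashboard.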
Gartner's predictions paint a grim picture of where this trajectory leads. The research firm warns that by 2028, prompt-to-app approaches adopted by citizen developers will increase software defects by 2,500%, triggering a software quality and reliability crisis. By 2027, 40% of enterprises using consumption-priced AI coding tools will face unplanned costs exceeding twice their expected budgets. Through 2026, atrophy of critical-thinking skills due to generative AI use is expected to push 50% of global organisations to require “AI-free” skills assessments. Gartner further predicts that 80% of the engineering workforce will need upskilling through 2027, specifically for AI collaboration skills.
The Supply Chain Dimension No One Anticipated
Beyond the direct quality and security risks of AI-generated code lies an entirely novel attack vector that did not exist before AI coding assistants: package hallucinations, or what security researchers have dubbed “slopsquatting.”
A major study presented at the USENIX Security Symposium in 2025 analysed 576,000 code samples from 16 large language models and found that 19.7% of package dependencies, totalling 440,445 instances, were hallucinated. These are references to software packages that simply do not exist. Open-source models hallucinated packages at nearly 22%, compared to 5% for commercial models. Alarmingly, 43% of these hallucinations repeated consistently across multiple queries, making them predictable targets for attackers. In total, the study identified 205,474 unique non-existent package names, each representing a potential vehicle for malicious code distribution.
The attack is elegant in its simplicity. An AI model consistently recommends a non-existent package. An attacker registers that name in the Python Package Index or npm registry, populates it with malicious code, and waits. The next time the AI recommends the package and a developer installs it without checking, the malicious code enters the production environment. Seth Michael Larson, security developer-in-residence at the Python Software Foundation, coined the term “slopsquatting” to describe this phenomenon. The package need not be malicious from the outset; it could initially appear legitimate but later beacon to a command-and-control server for a delayed payload, meaning that simply scanning the package at installation time reveals nothing.
The recursive dependency model makes this risk especially acute. If an AI code reviewer is scanning AI-generated code that references a hallucinated package, the reviewer has no mechanism for determining whether the package is legitimate. It will check for known vulnerability patterns in the dependency but cannot assess whether the dependency should exist in the first place. Only a human developer with domain knowledge, someone who understands what libraries the project actually needs, can make that judgement call.
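That judgement call can at least be surfaced mechanically. One partial mitigation, sketched below under assumed conventions (the allowlist contents and the parsing are illustrative, not any vendor's tooling), is to gate declared dependencies against a human-curated allowlist, so a hallucinated package name fails the build instead of reaching the package installer:

```python
# Packages a human has vetted and approved during review.
APPROVED = {"requests", "flask", "sqlalchemy"}


def audit_requirements(lines: list) -> list:
    """Return requirement names that are not on the approved list."""
    unknown = []
    for line in lines:
        line = line.split("#")[0].strip()   # drop comments and blank lines
        if not line:
            continue
        # Take the bare package name before any marker or version specifier.
        name = line.split(";")[0]
        for sep in ("==", ">=", "<=", "~=", ">", "<"):
            name = name.split(sep)[0]
        name = name.strip().lower()
        if name not in APPROVED:
            unknown.append(name)
    return unknown
```

Run in CI, this turns "should this dependency exist?" from a question nobody asks into a question a human must answer once, explicitly, before the name enters the build.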
What Must Remain Human
The evidence converges on a clear, if uncomfortable, conclusion: certain aspects of software development must remain under direct human control, not because humans are infallible, but because the types of errors humans make are different from, and complementary to, the types of errors AI systems make. A robust engineering organisation needs both perspectives, and current trends are systematically eliminating one of them.
Architectural governance is the first non-negotiable domain. AI systems can generate individual components, but the decisions about how those components relate to each other, how data flows between services, where trust boundaries exist, and how failure in one subsystem affects others, require the kind of holistic system understanding that no current AI possesses. Organisations must maintain human-led architecture review boards with genuine authority to reject AI-generated designs that compromise system integrity.
Security threat modelling is the second. Tenzai's research demonstrated conclusively that AI coding tools fail to implement proactive security controls. They avoid well-known vulnerability patterns but do not reason about the threat model specific to a given application. Human security architects who understand the business context, the regulatory environment, and the adversarial landscape must remain directly involved in security design decisions. Delegating this to AI is not efficiency; it is negligence.
Incident response and system comprehension represent the third critical domain. When production systems fail, the speed and effectiveness of response depends entirely on whether the responding engineers genuinely understand the system they are fixing. If the codebase was generated by AI, reviewed by AI, and tested by AI, and if no human maintains a coherent mental model of how the pieces fit together, incident response degrades from engineering into guesswork. Organisations should conduct regular “comprehension audits” in which engineers are asked to trace the execution path of critical operations without AI assistance.
Finally, the definition of “done” must remain a human judgement. AI systems optimise for the metrics they are given: test pass rates, static analysis scores, code coverage percentages. These are useful signals, but they are not sufficient conditions for production readiness. Whether a system is actually ready to serve real users, with all the nuance that entails regarding regulatory compliance, user experience, operational readiness, and risk tolerance, is a judgement call that requires the kind of contextual reasoning that remains firmly beyond current AI capabilities.
Governance Frameworks for the Recursive Age
Preventing the worst outcomes of recursive AI dependency requires more than good intentions. It requires structural safeguards embedded in organisational processes.
The first safeguard is mandatory human review gates at architecturally significant boundaries. Not every pull request requires deep human scrutiny, but changes to authentication systems, data access layers, service boundaries, and deployment configurations must have human reviewers who understand the system-level implications. These gates should be enforced programmatically, not left to team discretion.
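Programmatic enforcement of such a gate can be a single CI step. The sketch below uses hypothetical protected paths and assumes the CI system can report which files changed and how many human approvals the change has received:

```python
# Paths whose changes always require a human reviewer (illustrative).
PROTECTED_PREFIXES = ("auth/", "db/migrations/", "services/contracts/",
                      "deploy/")


def human_gate(changed_files: list, human_approvals: int) -> bool:
    """True if the change may merge under the review-gate policy."""
    touches_protected = any(
        f.startswith(PROTECTED_PREFIXES) for f in changed_files
    )
    return (not touches_protected) or human_approvals >= 1
```

The same policy can often be expressed declaratively, for example via path-based required-reviewer rules in the hosting platform; the essential property is that the gate is enforced by the pipeline, not by team convention.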
The second is AI transparency requirements. Every piece of AI-generated code should be tagged as such, with metadata indicating which model generated it, what prompt was used, and what review (human or AI) it received. This creates an audit trail that enables targeted review of AI-generated code when new vulnerability classes are discovered, rather than requiring a full codebase audit. Sonar's 2026 AI Code Assurance feature, which labels and monitors projects containing AI-generated code and requires it to pass stricter quality gates, represents an early industry attempt at this kind of structural transparency.
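One lightweight way to carry that metadata is in commit-message trailers. The trailer keys below are an assumed convention for illustration, not an established standard:

```python
def ai_provenance(commit_message: str) -> dict:
    """Extract 'Key: value' trailers whose key starts with 'AI-'."""
    trailers = {}
    for line in commit_message.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            if key.startswith("AI-"):
                trailers[key] = value.strip()
    return trailers


msg = """Add retry logic to payment client

AI-Generated: true
AI-Model: example-model-v1
AI-Review: human+static-analysis
"""
```

Because the metadata travels with the commit, discovering a new vulnerability class in a particular model's output becomes a query over history rather than a full codebase audit.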
The third is regular “AI-free” development exercises. Just as military organisations conduct exercises without electronic communications to ensure they can operate when systems fail, engineering teams should periodically develop and review code without AI assistance. This serves the dual purpose of maintaining human skills and benchmarking the actual (rather than perceived) productivity impact of AI tools.
The fourth safeguard is independent security testing that assumes AI-generated code is present. Traditional penetration testing focuses on known vulnerability classes. Organisations deploying AI-generated code need testing methodologies specifically designed to find the kinds of failures that AI introduces: missing authorisation controls, business logic errors, hallucinated dependencies, and architectural inconsistencies.
The fifth, and perhaps most important, is cultural. Organisations must resist the narrative that human code review is a bottleneck to be automated away. The DORA data shows that faster code generation without corresponding improvements in review and validation leads to declining system stability. Human review is not the bottleneck; it is the safety mechanism. Treating it as overhead to be optimised creates precisely the conditions under which catastrophic failures become inevitable.
An Industry at an Inflection Point
The software industry is conducting an unprecedented experiment. It is simultaneously increasing the volume of code that no individual human fully understands, reducing the human capacity to review that code, and deploying AI systems to fill the resulting verification gap: AI systems that share the fundamental limitations of the code generators they are meant to police.
The METR paradox ensures that the engineers closest to this process believe it is working better than it actually is. The DORA data confirms that system-level performance degrades even as individual productivity metrics improve. Gartner's projections suggest the accumulated technical debt will reach crisis proportions within years, not decades. The AI coding assistant market, which reached $7.37 billion in 2025 and is projected to hit $30.1 billion by 2032, represents enormous commercial momentum pushing in the direction of ever greater AI dependency. The economic incentives to automate code review, reduce headcount, and accelerate release cycles are powerful. The countervailing incentives to maintain human expertise, invest in architectural governance, and slow down enough to understand what is being shipped are, at present, far weaker.
None of this means AI coding tools should be abandoned. The productivity gains for appropriate use cases are real and substantial. What it means is that the current trajectory, in which AI generates ever more code, AI reviews ever more code, and humans understand ever less of what is running in production, leads somewhere profoundly dangerous. Not to a dramatic system collapse, but to a gradual, invisible degradation of software quality and reliability across the entire enterprise technology landscape.
The organisations that will thrive in this environment are not those that adopt AI most aggressively or most cautiously. They are those that maintain genuine human understanding of their critical systems whilst using AI to accelerate the work that humans still direct, review, and comprehend. The recursive dependency loop can be broken, but only by organisations willing to insist that some aspects of software engineering remain irreducibly human, not as a concession to nostalgia, but as a structural requirement for systems that actually work.
The ouroboros, the serpent eating its own tail, is an ancient symbol of self-consuming cycles. The enterprise software industry would do well to recognise the shape of the loop it is currently building, before the tail disappears entirely.
References and Sources
- SonarSource, “State of Code Developer Survey 2025,” SonarSource, 2025. Available: https://www.sonarsource.com/state-of-code-developer-survey-report.pdf
- GitHub Copilot statistics and AI coding assistant market data, as reported by multiple industry sources including Quantumrun Foresight and GetPanto. Available: https://www.quantumrun.com/consulting/github-copilot-statistics/ and https://www.getpanto.ai/blog/ai-coding-assistant-statistics
- Quartz, “AI vibe coding has gone wrong. Time for a vibe check,” qz.com, 2025. Available: https://qz.com/ai-vibe-coding-software-development
- Gartner, “Predicts 2026: AI Potential and Risks Emerge in Software Engineering Technologies,” Gartner Research, 2025. Available: https://www.armorcode.com/report/gartner-predicts-2026-ai-potential-and-risks-emerge-in-software-engineering-technologies
- CodeRabbit, “State of AI vs Human Code Generation Report,” CodeRabbit, December 2025. Available: https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report
- Qodo, “8 Best AI Code Review Tools That Catch Real Bugs in 2026,” Qodo Blog, 2026. Available: https://www.qodo.ai/blog/best-ai-code-review-tools-2026/
- Tenzai, “Bad Vibes: Comparing the Secure Coding Capabilities of Popular Coding Agents,” Tenzai Blog, December 2025. Available: https://blog.tenzai.com/bad-vibes-comparing-the-secure-coding-capabilities-of-popular-coding-agents/
- SonarSource, “Sonar Data Reveals Critical Verification Gap in AI Coding: 96% Don't Fully Trust Output, Yet Only 48% Verify It,” SonarSource Press Release, January 2026. Available: https://www.sonarsource.com/company/press-releases/sonar-data-reveals-critical-verification-gap-in-ai-coding/
- Opsera, “AI Coding Impact 2026 Benchmark Report,” Opsera, 2025. Available: https://opsera.ai/resources/report/ai-coding-impact-2025-benchmark-report/
- A. Karpathy, post on X (formerly Twitter), 2 February 2025. Available: https://x.com/karpathy/status/1886192184808149383
- Collins English Dictionary, “Word of the Year 2025: Vibe Coding,” as reported by Wikipedia. Available: https://en.wikipedia.org/wiki/Vibe_coding
- A. Karpathy, Nanochat repository discussion, GitHub, October 2025. Available: https://github.com/karpathy/nanochat/discussions/1
- Second Talent, “AI-Generated Code Quality Metrics and Statistics for 2026,” SecondTalent.com, 2026. Available: https://www.secondtalent.com/resources/ai-generated-code-quality-metrics-and-statistics-for-2026/
- Veracode, “2025 GenAI Code Security Report,” Veracode, July 2025. Available: https://www.veracode.com/resources/analyst-reports/2025-genai-code-security-report/
- Google Cloud, “Announcing the 2024 DORA Report,” Google Cloud Blog, 2024. Available: https://cloud.google.com/blog/products/devops-sre/announcing-the-2024-dora-report
- Google Cloud, “Announcing the 2025 DORA Report,” Google Cloud Blog, 2025. Available: https://cloud.google.com/blog/products/ai-machine-learning/announcing-the-2025-dora-report
- METR, “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity,” METR, July 2025. Available: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
- Gartner, “Predicts 2026: AI Potential and Risks Emerge in Software Engineering Technologies,” as reported by ArmorCode. Available: https://www.armorcode.com/blog/your-genai-code-debt-is-coming-due-heres-what-gartner-predicts
- Gartner, “Strategic Predictions for 2026: How AI's Underestimated Influence Is Reshaping Business,” Gartner, 2025. Available: https://www.gartner.com/en/articles/strategic-predictions-for-2026
- USENIX, “We Have a Package for You: A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs,” USENIX Security Symposium, 2025. Available: https://www.usenix.org/publications/loginonline/we-have-package-you-comprehensive-analysis-package-hallucinations-code
- BleepingComputer, “AI-hallucinated code dependencies become new supply chain risk,” BleepingComputer, 2025. Available: https://www.bleepingcomputer.com/news/security/ai-hallucinated-code-dependencies-become-new-supply-chain-risk/

Tim Green, UK-based Systems Theorist & Independent Technology Writer
Tim explores the intersections of artificial intelligence, decentralised cognition, and posthuman ethics. His work, published at smarterarticles.co.uk, challenges dominant narratives of technological progress while proposing interdisciplinary frameworks for collective intelligence and digital stewardship.
His writing has been featured on Ground News and shared by independent researchers across both academic and technological communities.
ORCID: 0009-0002-0156-9795
Email: tim@smarterarticles.co.uk