The AI Coding Productivity Illusion: Why Developers Feel Faster But Deliver Slower

Developers are convinced that AI coding assistants make them faster. The data tells a different story entirely. In one of the most striking findings to emerge from software engineering research in 2025, experienced programmers using frontier AI tools actually took 19 per cent longer to complete tasks than those working without assistance. Yet those same developers believed the AI had accelerated their work by 20 per cent.
This perception gap represents more than a curious psychological phenomenon. It reveals a fundamental disconnect between how developers experience AI-assisted coding and what actually happens to productivity, code quality, and long-term maintenance costs. The implications extend far beyond individual programmers, reshaping how organisations measure software development performance and how teams structure their workflows.
The Landmark Study That Challenged Everything
The research that exposed this discrepancy came from METR, an AI safety organisation that conducted a randomised controlled trial with 16 experienced open-source developers. Each participant had an average of five years of prior experience with the mature projects they worked on. The study assigned 246 tasks randomly to either allow or disallow AI tool usage, with developers primarily using Cursor Pro and Claude 3.5/3.7 Sonnet when permitted.
Before completing their assigned issues, developers predicted AI would speed them up by 24 per cent. After experiencing the slowdown firsthand, they still reported believing AI had improved their performance by 20 per cent. The objective measurement showed the opposite: tasks took 19 per cent longer when AI tools were available.
This finding stands in stark contrast to vendor-sponsored research. GitHub, a subsidiary of Microsoft, published studies claiming developers completed tasks 55.8 per cent faster with Copilot. A multi-company study spanning Microsoft, Accenture, and a Fortune 100 enterprise reported a 26 per cent productivity increase. Google's internal randomised controlled trial found developers using AI finished assignments 21 per cent faster.
The contradiction isn't necessarily that some studies are wrong and others correct. Rather, it reflects different contexts, measurement approaches, and crucially, different relationships between researchers and AI tool vendors. The studies showing productivity gains have authors affiliated with companies that produce or invest in AI coding tools. Whilst this doesn't invalidate their findings, it warrants careful consideration when evaluating claims.
Why Developers Feel Faster Whilst Moving Slower
Several cognitive biases compound to create the perception gap. Visible activity bias makes watching code being generated feel productive, even when substantial time disappears into reviewing, debugging, and correcting that output. Cognitive load reduction from less typing creates an illusion of less work, despite the mental effort required to validate AI suggestions.
The novelty effect means new tools feel exciting and effective initially, regardless of objective outcomes. Attribution bias leads developers to credit AI for successes whilst blaming other factors for failures. And sunk cost rationalisation kicks in after organisations invest in AI tools and training, making teams reluctant to admit the investment hasn't paid off.
Stack Overflow's 2025 Developer Survey captures this sentiment shift quantitatively. Whilst 84 per cent of respondents reported using or planning to use AI tools in their development process, positive sentiment dropped to 60 per cent from 70 per cent the previous year. More tellingly, 46 per cent of developers actively distrust AI tool accuracy, compared to only 33 per cent who trust them. When asked directly about productivity impact, just 16.3 per cent said AI made them more productive to a great extent. The largest group, 41.4 per cent, reported little or no effect.
Hidden Quality Costs That Accumulate Over Time
The productivity perception gap becomes more concerning when examining code quality metrics. CodeRabbit's December 2025 “State of AI vs Human Code Generation” report analysed 470 open-source GitHub pull requests and found AI-generated code produced approximately 1.7 times more issues than human-written code.
The severity of defects matters as much as their quantity. AI-authored pull requests contained 1.4 times more critical issues and 1.7 times more major issues on average. Algorithmic errors appeared 2.25 times more frequently in AI-generated changes. Exception-handling gaps doubled. Issues related to incorrect sequencing, missing dependencies, and concurrency misuse showed close to twofold increases across the board.
These aren't merely cosmetic problems. Logic and correctness errors occurred 1.75 times more often. Security findings appeared 1.57 times more frequently. Performance issues showed up 1.42 times as often. Readability problems surfaced more than three times as often in AI-coauthored pull requests.
GitClear's analysis of 211 million changed lines of code between 2020 and 2024 revealed structural shifts in how developers work that presage long-term maintenance challenges. The proportion of new code revised within two weeks of its initial commit nearly doubled from 3.1 per cent in 2020 to 5.7 per cent in 2024. This code churn metric indicates premature or low-quality commits requiring immediate correction.
Perhaps most concerning for long-term codebase health: refactoring declined dramatically. The percentage of changed code lines associated with refactoring dropped from 25 per cent in 2021 to less than 10 per cent in 2024. Duplicate code blocks increased eightfold. For the first time, copy-pasted code exceeded refactored lines, suggesting developers spend more time adding AI-generated snippets than improving existing architecture.
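Teams that want to watch these trends in their own repositories can approximate the churn measurement from local git history. The sketch below is illustrative rather than a reproduction of GitClear's methodology: it uses the PyDriller library, keeps the two-week window described above, and treats an added line as churned when an identical line is later deleted from the same file, a crude text-matching proxy. The script name, the blank-line filter, and the exact-match rule are assumptions made for the example.

```python
# churn_proxy.py - a rough proxy for the "revised within two weeks" churn
# metric described above, computed from local git history with PyDriller
# (pip install pydriller). Line identity is approximated by exact text
# matching per file, which is much cruder than GitClear's methodology, so
# treat the output as a trend indicator rather than a comparable figure.
from collections import defaultdict
from datetime import timedelta
from pydriller import Repository

CHURN_WINDOW = timedelta(days=14)

def churn_rate(repo_path: str) -> float:
    added_at = defaultdict(list)   # (path, line text) -> dates the line was added
    lines_added = 0
    lines_churned = 0

    for commit in Repository(repo_path).traverse_commits():
        for mod in commit.modified_files:
            if mod.new_path is None:        # file was deleted; skip it
                continue
            diff = mod.diff_parsed          # {"added": [(no, text)], "deleted": [(no, text)]}

            # A deleted line whose identical text was added under two weeks ago
            # counts as churned.
            for _, text in diff["deleted"]:
                if not text.strip():
                    continue
                dates = added_at.get((mod.new_path, text.strip()))
                if dates and commit.committer_date - dates[-1] <= CHURN_WINDOW:
                    lines_churned += 1
                    dates.pop()

            for _, text in diff["added"]:
                if not text.strip():
                    continue
                lines_added += 1
                added_at[(mod.new_path, text.strip())].append(commit.committer_date)

    return lines_churned / lines_added if lines_added else 0.0

if __name__ == "__main__":
    print(f"approximate two-week churn rate: {churn_rate('.'):.1%}")
```

Even a rough ratio like this, computed before and after an AI assistant is adopted, makes the churn trend visible without waiting for an annual industry report.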
The Hallucination Problem Compounds Maintenance Burdens
Beyond quality metrics, AI coding assistants introduce entirely novel security vulnerabilities through hallucinated dependencies. Research analysing 576,000 code samples from 16 popular large language models found 19.7 per cent of package dependencies were hallucinated, meaning the AI suggested importing libraries that don't actually exist.
Open-source models performed worse, hallucinating nearly 22 per cent of dependencies compared to 5 per cent for commercial models. Alarmingly, 43 per cent of these hallucinations repeated across multiple queries, making them predictable targets for attackers.
This predictability enabled a new attack vector security researchers have termed “slopsquatting.” Attackers monitor commonly hallucinated package names and register them on public repositories like PyPI and npm. When developers copy AI-generated code without verifying dependencies, they inadvertently install malicious packages. Between late 2023 and early 2025, this attack method moved from theoretical concern to active exploitation.
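One practical mitigation is to verify AI-suggested package names against the registry before anything gets installed. The sketch below queries PyPI's public JSON API to confirm that a package exists and that its first release is not suspiciously recent; the script name and the 90-day age threshold are assumptions for illustration, and a check like this complements rather than replaces proper software composition analysis tooling.

```python
# verify_deps.py - a minimal pre-install check for AI-suggested dependencies.
# It confirms a package actually exists on PyPI and that its first release is
# not suspiciously recent; the 90-day threshold is an arbitrary assumption,
# and this complements rather than replaces software composition analysis.
import json
import sys
import urllib.error
import urllib.request
from datetime import datetime, timezone

MIN_AGE_DAYS = 90

def check_package(name: str) -> str:
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            data = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return "MISSING: not on PyPI (possible hallucination or slopsquat target)"
        return f"ERROR: PyPI returned HTTP {err.code}"

    upload_times = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in data["releases"].values()
        for f in files
    ]
    if not upload_times:
        return "SUSPECT: package exists but has no uploaded releases"

    age_days = (datetime.now(timezone.utc) - min(upload_times)).days
    if age_days < MIN_AGE_DAYS:
        return f"SUSPECT: first release is only {age_days} days old"
    return f"ok: {len(data['releases'])} releases, first upload {age_days} days ago"

if __name__ == "__main__":
    for package in sys.argv[1:]:
        print(f"{package}: {check_package(package)}")
```

An attacker can of course age a squatted package past any fixed threshold, so the age check is a tripwire for the most opportunistic registrations, not a guarantee.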
The maintenance costs of hallucinations extend beyond security incidents. Teams must allocate time to verify every dependency AI suggests, check whether suggested APIs actually exist in the versions specified, and validate that code examples reflect current library interfaces rather than outdated or imagined ones. A quarter of developers estimate that one in five AI-generated suggestions contains factual errors or misleading code. More than three-quarters encounter hallucinations frequently and will not ship AI-generated code without human verification. This verification overhead represents a hidden productivity cost that perception metrics rarely capture.
Companies implementing comprehensive AI governance frameworks report 60 per cent fewer hallucination-related incidents compared to those using AI tools without oversight controls. The investment in governance processes, however, further erodes the time savings AI supposedly provides.
How Speed Without Stability Creates Accelerated Chaos
The 2025 DORA Report from Google provides perhaps the clearest articulation of how AI acceleration affects software delivery at scale. AI adoption among software development professionals reached 90 per cent, with practitioners typically dedicating two hours daily to AI tools. Over 80 per cent reported AI enhanced their productivity, and 59 per cent perceived positive influence on code quality.
Yet the report's analysis of delivery metrics tells a more nuanced story. AI adoption continues to have a negative relationship with software delivery stability. Developers using AI completed 21 per cent more tasks and merged 98 per cent more pull requests, but organisational delivery metrics remained flat. The report concludes that AI acts as an amplifier, strengthening high-performing organisations whilst worsening dysfunction in those that struggle.
The key insight: speed without stability is accelerated chaos. Without robust automated testing, mature version control practices, and fast feedback loops, increased change volume leads directly to instability. Teams treating AI as a shortcut create faster bugs and deeper technical debt.
Sonar's research quantifies what this instability costs. On average, organisations encounter approximately 53,000 maintainability issues per million lines of code. That translates to roughly 72 code smells caught per developer per month, representing a significant but often invisible drain on team efficiency. Up to 40 per cent of a business's entire IT budget goes toward dealing with technical debt fallout, from fixing bugs in poorly written code to maintaining overly complex legacy systems.
The Uplevel Data Labs study of 800 developers reinforced these findings. Their research found no significant productivity gains in objective measurements such as cycle time or pull request throughput. Developers with Copilot access introduced a 41 per cent increase in bugs, suggesting a measurable negative impact on code quality. Those same developers saw no reduction in burnout risk compared to those working without AI assistance.
Redesigning Workflows for Downstream Reality
Recognising the perception-reality gap doesn't mean abandoning AI coding tools. It means restructuring workflows to account for their actual strengths and weaknesses rather than optimising solely for initial generation speed.
Microsoft's internal approach offers one model. Their AI-powered code review assistant scaled to support over 90 per cent of pull requests, impacting more than 600,000 monthly. The system helps engineers catch issues faster, complete reviews sooner, and enforce consistent best practices. Crucially, it augments human review rather than replacing it, with AI handling routine pattern detection whilst developers focus on logic, architecture, and context-dependent decisions.
Research shows teams using AI-powered code review reported 81 per cent improvement in code quality, significantly higher than 55 per cent for fast teams without AI. The difference lies in where AI effort concentrates. Automated review can eliminate 80 per cent of trivial issues before reaching human reviewers, allowing senior developers to invest attention in architectural decisions rather than formatting corrections.
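What that division of labour can look like in practice is sketched below: a deliberately simple triage step that keeps trivial findings out of human review queues and escalates everything else. The Finding type, the category names, and the choice of what counts as trivial are all invented for the example; real review tools expose their own finding schemas and severity models.

```python
# review_triage.py - a hypothetical triage step for automated review findings:
# trivial issues are handled automatically, everything else goes to a human.
# The Finding type and category names are invented for this example; real
# review tools expose their own finding schemas.
from dataclasses import dataclass
from typing import Iterable

TRIVIAL_CATEGORIES = {"formatting", "import-order", "naming", "docstring-style"}

@dataclass
class Finding:
    file: str
    line: int
    category: str
    message: str

def triage(findings: Iterable[Finding]) -> tuple[list[Finding], list[Finding]]:
    """Split findings into (handled automatically, escalated to human review)."""
    auto, human = [], []
    for finding in findings:
        bucket = auto if finding.category in TRIVIAL_CATEGORIES else human
        bucket.append(finding)
    return auto, human

if __name__ == "__main__":
    findings = [
        Finding("api/handlers.py", 42, "formatting", "line exceeds 120 characters"),
        Finding("api/handlers.py", 88, "concurrency", "shared dict mutated without a lock"),
    ]
    auto, human = triage(findings)
    print(f"auto-handled: {len(auto)}, escalated to reviewers: {len(human)}")
```

The value is not in a few lines of routing logic but in the policy it encodes: human attention is reserved for the categories where AI-generated code most often goes wrong.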
Effective workflow redesign incorporates several principles that research supports. First, validation must scale with generation speed. When AI accelerates code production, review and testing capacity must expand proportionally. Otherwise, the security debt compounds as nearly half of AI-generated code fails security tests.
Second, context matters enormously. According to Qodo research, missing context represents the top issue developers face, reported by 65 per cent during refactoring and approximately 60 per cent during test generation and code review. AI performs poorly without sufficient project-specific information, yet developers often accept suggestions without providing adequate context.
Third, rework tracking becomes essential. The 2025 DORA Report introduced rework rate as a fifth core metric precisely because AI shifts where development time gets spent. Teams produce initial code faster but spend more time reviewing, validating, and correcting it. Monitoring cycle time, code review patterns, and rework rates reveals the true productivity picture that perception surveys miss.
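Even without specialist tooling, the public GitHub API provides a usable starting signal. The sketch below fetches recently closed pull requests and reports the median open-to-merge cycle time; the repository name is a placeholder, unauthenticated requests are rate-limited, and a fuller dashboard would page through results and pair this with a rework measure such as the churn proxy shown earlier.

```python
# pr_cycle_time.py - a small sketch that pulls recently closed pull requests
# from the GitHub REST API and reports the median open-to-merge cycle time.
# The repository name below is a placeholder; unauthenticated requests are
# rate-limited, so pass a token for anything beyond a quick look.
import json
import statistics
import urllib.request
from datetime import datetime

def fetch_closed_prs(owner: str, repo: str, token: str | None = None) -> list[dict]:
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls?state=closed&per_page=100"
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=15) as resp:
        return json.load(resp)

def median_cycle_time_hours(prs: list[dict]) -> float:
    durations = []
    for pr in prs:
        if not pr.get("merged_at"):      # closed without merging; not a delivery
            continue
        created = datetime.fromisoformat(pr["created_at"].replace("Z", "+00:00"))
        merged = datetime.fromisoformat(pr["merged_at"].replace("Z", "+00:00"))
        durations.append((merged - created).total_seconds() / 3600)
    return statistics.median(durations) if durations else 0.0

if __name__ == "__main__":
    prs = fetch_closed_prs("your-org", "your-repo")   # placeholder repository
    print(f"median PR cycle time: {median_cycle_time_hours(prs):.1f} hours")
```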
Finally, trust calibration requires ongoing attention. Around 30 per cent of developers still don't trust AI-generated output, according to DORA. This scepticism, rather than indicating resistance to change, may reflect appropriate calibration to actual AI reliability. Organisations benefit from cultivating healthy scepticism rather than promoting uncritical acceptance of AI suggestions.
From Accelerated Output to Sustainable Delivery
The AI coding productivity illusion persists because subjective experience diverges so dramatically from objective measurement. Developers genuinely feel more productive when AI generates code quickly, even as downstream costs accumulate invisibly.
Breaking this illusion requires shifting measurement from initial generation speed toward total lifecycle cost. An AI-assisted feature that takes four hours to generate but requires six hours of debugging, security remediation, and maintenance work represents a net productivity loss, regardless of how fast the first commit appeared.
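A toy calculation makes the point concrete. The AI-assisted figures below mirror the example above; the hand-written baseline is an assumption chosen only to show how a faster first commit can still lose on total cost.

```python
# lifecycle_cost.py - a toy illustration of measuring total lifecycle cost
# rather than time to first commit. The AI-assisted figures mirror the
# example in the text; the hand-written baseline is an assumption.
def lifecycle_hours(generation: float, downstream: float) -> float:
    """Downstream covers review, debugging, security remediation and maintenance."""
    return generation + downstream

ai_assisted = lifecycle_hours(generation=4, downstream=6)    # 10 hours total
hand_written = lifecycle_hours(generation=7, downstream=2)   # 9 hours total (assumed)

print(f"AI-assisted: {ai_assisted}h, hand-written: {hand_written}h")
print("net loss despite the faster first commit" if ai_assisted > hand_written
      else "net gain")
```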
Organisations succeeding with AI coding tools share common characteristics. They maintain rigorous code review regardless of code origin. They invest in automated testing proportional to development velocity. They track quality metrics alongside throughput metrics. They train developers to evaluate AI suggestions critically rather than accepting them uncritically.
The research increasingly converges on a central insight: AI coding assistants are powerful tools that require skilled operators. In the hands of experienced developers who understand both their capabilities and limitations, they can genuinely accelerate delivery. Applied without appropriate scaffolding, they create technical debt faster than any previous development approach.
The 19 per cent slowdown documented by METR represents one possible outcome, not an inevitable one. But achieving better outcomes requires abandoning the comfortable perception that AI automatically makes development faster and embracing the more complex reality that speed and quality require continuous, deliberate balancing.
References and Sources
- METR (2025). “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity.” https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
- Stack Overflow (2025). “2025 Stack Overflow Developer Survey: AI Section.” https://survey.stackoverflow.co/2025/ai
- CodeRabbit (2025). “State of AI vs Human Code Generation Report.” https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report
- GitClear (2025). “AI Copilot Code Quality: 2025 Data Suggests 4x Growth in Code Clones.” https://www.gitclear.com/ai_assistant_code_quality_2025_research
- Google DORA (2025). “State of AI-assisted Software Development 2025.” https://dora.dev/research/2025/dora-report/
- Uplevel Data Labs (2024). “Gen AI for Coding Research Report.” https://resources.uplevelteam.com/gen-ai-for-coding
- Qodo (2025). “State of AI Code Quality Report.” https://www.qodo.ai/reports/state-of-ai-code-quality/
- Sonar (2025). “The State of Code for Developers Report 2025.” https://www.sonarsource.com/the-state-of-code/
- Socket Research (2025). “AI-hallucinated code dependencies become new supply chain risk.” https://www.bleepingcomputer.com/news/security/ai-hallucinated-code-dependencies-become-new-supply-chain-risk/
- Microsoft Engineering (2025). “Enhancing Code Quality at Scale with AI-Powered Code Reviews.” https://devblogs.microsoft.com/engineering-at-microsoft/enhancing-code-quality-at-scale-with-ai-powered-code-reviews/

Tim Green
UK-based Systems Theorist & Independent Technology Writer
Tim explores the intersections of artificial intelligence, decentralised cognition, and posthuman ethics. His work, published at smarterarticles.co.uk, challenges dominant narratives of technological progress while proposing interdisciplinary frameworks for collective intelligence and digital stewardship.
His writing has been featured on Ground News and shared by independent researchers across both academic and technological communities.
ORCID: 0009-0002-0156-9795
Email: tim@smarterarticles.co.uk