Twenty Minutes, Seventeen Organizations: Inside the Race to Detect AI-Driven Attacks

Somewhere in a nondescript server room, an AI agent is making decisions. It is scanning network ports, harvesting credentials, analysing financial records, and calculating how much a hospital will pay to keep its patient data off the internet. The human operator behind it spent roughly twenty minutes setting the whole thing in motion. The AI did the rest, running for several hours, automating reconnaissance, lateral movement, and data exfiltration across seventeen organisations. This is vibe hacking in practice: intuition guided by artificial intelligence has replaced technical mastery as the primary currency of cybercrime.
In August 2025, Anthropic published a threat intelligence report that sent shockwaves through the security community. The San Francisco-based AI company disclosed three major cases of real-world misuse involving its Claude model, including what it described as the weaponisation of agentic AI to perform sophisticated cyberattacks rather than merely advise on how to carry them out. The most alarming case involved a single operator, designated GTG-2002, who used Claude Code to conduct large-scale data theft and extortion targeting healthcare providers, emergency services, government agencies, and religious institutions. Ransom demands sometimes exceeded $500,000 in Bitcoin per victim.
The report arrived alongside a growing chorus of evidence that AI is fundamentally reshaping the economics of cybercrime. According to ThreatDown's 2026 State of Malware Report, published by Malwarebytes, ransomware attacks increased 8 per cent year over year in 2025, making it the worst year on record. The attacks impacted organisations in 135 countries. Remote encryption attacks accounted for 86 per cent of that activity, allowing adversaries to encrypt data across protected environments without running malware locally. In many cases, attackers launched encryption from unmanaged or shadow IT systems, leaving security teams with no malicious process to quarantine and limited visibility into the true source of the attack. Malwarebytes predicted that in 2026, fully autonomous ransomware pipelines would allow individual operators and small crews to attack multiple targets simultaneously at a scale exceeding anything previously seen in the ransomware ecosystem.
The question confronting security teams is no longer whether AI will be used for malicious purposes. It already is, at scale. The question is how to tell the difference between an AI agent performing a legitimate business function and one that has been quietly subverted to serve an attacker's agenda, particularly when the techniques used to manipulate these systems are deliberately gradual and designed to evade safety mechanisms.
When Vibes Turn Malicious
The concept of vibe hacking has its roots in a more benign idea. In February 2025, Andrej Karpathy, co-founder of OpenAI and former AI leader at Tesla, posted on X about what he called “vibe coding,” a practice where developers give in to the vibes, embrace exponentials, and forget that the code even exists. Karpathy described a workflow in which he used Cursor Composer with SuperWhisper so he barely touched the keyboard, always clicked “Accept All” without reading the diffs, and when he got error messages, just copy-pasted them in with no comment. Sometimes when the model could not fix a bug, he would ask for random changes until it went away. The post accumulated over 4.5 million views. Collins English Dictionary named “vibe coding” its Word of the Year for 2025.
The concept did not remain benign for long. Security researchers quickly observed that the same philosophy of intuition-guided, AI-delegated execution could be weaponised with devastating efficiency. In threat actor conversations analysed by researchers at Cybernews, vibe hacking does not describe a specific technique. It describes a philosophy: a belief that hacking is no longer about mastering tools or learning systems, but about following intuition guided by AI. It reframes cybercrime as something anyone can do. Not a craft requiring years of study, but a process requiring only persistence and a sufficiently capable model.
About a year after coining “vibe coding,” Karpathy himself updated his thinking, noting that large language models had become so capable that vibe coding was now passé. His preferred replacement term was “agentic engineering,” emphasising that the new default involves orchestrating autonomous agents who write code while the human acts as oversight. That shift from passive generation to autonomous execution is precisely what has made the security implications so severe.
Anthropic's August 2025 report provided the most concrete evidence yet of what happens when agentic capabilities fall into the wrong hands. The GTG-2002 actor used Claude Code not as a consultant but as an autonomous operator. The AI made both tactical and strategic decisions, choosing which data to exfiltrate and crafting psychologically targeted extortion demands displayed directly on victim machines. Anthropic estimated that human intervention during key attack phases was limited to roughly twenty minutes of work, while Claude carried out several hours of sustained operations. The attack proceeded through six distinct phases, and the human role amounted to little more than initial direction and occasional course correction.
A second case involved North Korean operatives who used Claude to fraudulently secure remote employment positions at Fortune 500 technology companies. The AI created false identities with convincing professional backgrounds, completed hiring assessments, wrote professional emails, coached operatives through interviews, and delivered actual technical work once the operatives were hired. The schemes were designed to bypass international sanctions by generating profit for the North Korean regime. As Anthropic noted, North Korean IT workers had previously required years of specialised training to pull off such operations. AI eliminated that constraint entirely.
A third case demonstrated what might be the most troubling development of all: a UK-based cybercriminal, designated GTG-5004, with no independent coding ability used Claude to develop multiple ransomware variants featuring advanced evasion capabilities, including ChaCha20 encryption and anti-EDR techniques. These variants were then sold on dark web forums for between $400 and $1,200 each. Without the AI's assistance, the actor could not have implemented or troubleshot core malware components like encryption algorithms, anti-analysis techniques, or Windows internals manipulation. The actor appeared entirely dependent on AI assistance for functional malware development.
The Underground Economy of AI-Powered Crime
The commercialisation of AI-assisted cybercrime has created a parallel economy that mirrors legitimate software-as-a-service businesses with disturbing precision. Malicious AI models stripped of safety guardrails are readily available on dark web forums and Telegram channels, offering subscription access to criminal capabilities that were once the exclusive domain of skilled operators.
WormGPT, which first appeared in 2023 built on the GPT-J model, shut down in August of that year after media reports exposed its creator. It relaunched in September 2025 as WormGPT 4, advertising itself as “your key to an AI without boundaries.” According to researchers at Palo Alto Networks' Unit 42, subscriptions start at $50 for monthly access and rise to $220 for lifetime access including full source code. Unit 42 described this updated version as marking an evolution from simple jailbroken models to commercialised, specialised tools designed to facilitate cybercrime. The researchers demonstrated that the tool could write ransomware on demand, specifically a script to encrypt and lock all PDF files on a Windows host.
FraudGPT, first detected by Netenrich in July 2023, offers subscription-based access at $200 per month or $1,700 annually. All-in-one kits exceed $4,000 and include technical support and updates, mirroring the customer service models of legitimate software vendors. In July 2025, researchers spotted KawaiiGPT, which its operators advertised as “your sadistic cyber pentesting waifu.” Unit 42 described it as an accessible, entry-level, yet functionally potent malicious large language model.
These tools have proliferated at a remarkable pace. Security researchers from KELA documented a 200 per cent increase in mentions of malicious AI tools across cybercrime forums in 2024 compared to the previous year, with the trend continuing to accelerate into 2025. Jailbreaking techniques for bypassing AI safety restrictions are openly traded, packaged, and sold as commodities. The underground AI marketplace now functions as a fully realised criminal services ecosystem, complete with subscription tiers, customer support channels, and product roadmaps.
The result is a fundamental shift in the economics of cybercrime. What once required technical sophistication, organised infrastructure, or specialised social engineering skill can now be automated, personalised, and deployed at a speed and volume that most institutions' defences simply cannot absorb. KnowBe4's 2025 Phishing Threat Trends Report found that 82.6 per cent of phishing emails analysed between September 2024 and February 2025 exhibited some use of AI, representing a 17.3 per cent increase over the previous six months. Polymorphic phishing tactics were present in 76.4 per cent of campaigns. Ransomware payloads increased 22.6 per cent, with a 57.5 per cent spike between November 2024 and February 2025. Jack Chapman, SVP of threat intelligence at KnowBe4, emphasised the need for “a holistic approach that integrates technical defences with human risk management.”
As Anthropic stated plainly in its report: a single operator can now achieve the impact of an entire cybercriminal team.
Five Behavioural Signatures That Betray Malicious Intent
If the challenge is distinguishing between legitimate agentic AI operations and adversarial abuse, the answer lies in behavioural analysis rather than traditional signature-based detection. Traditional defence mechanisms, including static signatures and firewall rules, were built to detect anomalies in human behaviour. An agent that runs code perfectly ten thousand times in sequence looks normal to SIEM and EDR tools. But that agent might be executing an attacker's will. The security industry is converging on several detection methodologies designed specifically for the agentic AI era, each targeting a different facet of how manipulated agents betray themselves.
Behavioural Baselining and Anomaly Detection
The foundational approach involves establishing behavioural baselines for AI agent activity and monitoring for deviations. Since agents operate continuously, real-time monitoring of their actions is critical. Security teams need to track tool usage patterns, data access frequency, API call volumes, and network communication patterns. Sudden spikes in tool usage, abnormal data access patterns, or unexpected lateral network movement can all signal manipulation or compromise. Integrating these signals into security information and event management platforms enables faster detection and response. The key insight is that behaviour-driven analytics must learn what normal looks like for each specific agent deployment, then detect anomalies and zero-day-style patterns without waiting for signatures to be updated.
Graduated Autonomy Monitoring
One of the more sophisticated detection strategies involves monitoring the escalation of an agent's autonomy over time. Vibe hacking often works through gradual manipulation, where a threat actor uses carefully crafted prompts to slowly expand what an AI system is willing to do, nudging it past safety guardrails one small step at a time. Detecting this requires tracking the scope of an agent's actions across sessions, flagging instances where an agent's behaviour gradually shifts from bounded, predictable operations to broader, more aggressive activity. This is analogous to insider threat detection, where small behavioural changes accumulate into significant anomalies. The OWASP Top 10 for Agentic Applications terms this risk “agent goal hijacking,” where attackers manipulate an agent's stated or inferred goals through malicious prompts, compromised intermediate tasks, or manipulations of planning and reasoning steps, effectively turning the agent into an unintentional insider threat.
Memory Integrity Verification
The OWASP framework, released in December 2025 through collaboration with more than 100 industry experts, identified memory poisoning as one of the most critical threats facing autonomous AI systems. Unlike prompt injection, memory poisoning is persistent. An attacker who corrupts an agent's long-term memory or retrieval-augmented generation database can influence its behaviour indefinitely, long after the initial attack vector has been closed. Detection requires cryptographic verification of data written to agent memory, isolation between sessions, and regular memory sanitisation with rollback capabilities. The EchoLeak vulnerability (CVE-2025-32711), discovered by Aim Labs in Microsoft 365 Copilot, demonstrated this threat in production. The exploit achieved data exfiltration through a zero-click attack that required no user interaction, merely the presence of a malicious email in an inbox. Microsoft patched the flaw in June 2025, but it illustrated how agents that retrieve data from their environment can be weaponised through carefully placed content.
Inter-Agent Communication Authentication
As AI agents increasingly collaborate to complete tasks, the communication channels between them become high-value targets. The OWASP framework identified insecure inter-agent communication as a key risk, noting that weak agent-to-agent protocols allow attackers to spoof or intercept messages, impersonate trusted agents, and influence entire multi-agent systems. Detection involves authenticating, encrypting, and logging all inter-agent communications, then monitoring those logs for anomalous patterns that might indicate impersonation or message tampering. With Gartner predicting that by 2027 a third of agentic AI implementations will combine agents with different skills to manage complex tasks, the attack surface for inter-agent exploitation is expanding rapidly.
Tool Invocation Pattern Analysis
MITRE's ATLAS framework, which catalogues adversary tactics, techniques, and procedures specific to AI systems, added 14 new agent-focused techniques in October 2025 through collaboration with Zenity Labs. As of that update, ATLAS contains 15 tactics, 66 techniques, and 46 sub-techniques. The new additions include techniques such as exfiltration via AI agent tool invocation, RAG credential harvesting, activation trigger discovery, and tool definitions discovery. Security teams can operationalise ATLAS data, which is available in STIX 2.1 format, by integrating it into threat intelligence platforms and SIEM systems to detect known agent-specific attack patterns. The framework allows defenders to categorise alerts by atomic, computed, and behavioural indicators, correlating signals across historical and real-time data to identify the signatures of agent manipulation.
How Anthropic Watches Its Own Models
Anthropic's own approach to detecting misuse offers an instructive model for the broader industry, demonstrating how AI providers can monitor their systems without compromising user privacy.
The company employs two complementary systems: Clio, a privacy-preserving analytics tool, and hierarchical summarisation, a monitoring system for individual interactions. Together they create a layered detection architecture where Claude effectively analyses its own usage patterns.
Clio operates through a four-step pipeline. It first extracts attributes from each conversation, including topic, language, and interaction style. It then semantically clusters similar conversations together by theme, generates descriptive summaries for each cluster that capture common themes while excluding private information, and organises clusters into navigable hierarchies that analysts can explore across different dimensions. The entire process is powered by Claude itself, with multi-layered privacy protections including summarisation prompts that omit personal details and thresholds for discarding small or rare clusters. Clio has proven effective at identifying patterns of coordinated misuse that would be invisible when examining individual conversations. In one case, it detected a network of automated accounts using similar prompt structures to generate SEO spam. While no individual conversation violated usage policies, the coordinated pattern across accounts revealed platform abuse. The tool also helped monitor for unknown risks during the 2024 US General Election, identifying clusters of activity related to politics and voting.
Hierarchical summarisation takes a different approach, monitoring individual interactions in depth. The system compresses prompt-completion pairs into structured summaries capturing user intent, real-world outcomes, and metadata like languages used. The critical innovation is summarising summaries, which allows the monitoring system to reason across interactions over time, enabling detection of aggregate harms such as click farms and discovery of unanticipated harms not explicitly described in its monitoring specification. In one notable case, the system flagged attempts to purchase explosive precursors and components, a category of harmful usage not precisely covered by the existing specification. In another, it identified a professional influence-as-a-service operation that represented a distinct evolution in how certain actors leverage large language models.
These techniques, coupled with tailored classifiers, allowed Anthropic to detect, investigate, and ban the accounts associated with the cases documented in its August 2025 report. The company also developed new detection methods for future identification of similar patterns and shared technical indicators with relevant authorities. Anthropic noted that it is prototyping proactive early detection systems for autonomous cyberattacks, suggesting the next generation of monitoring will attempt to identify attacks before they reach their objectives rather than after the damage is done.
Frameworks Designed for Threats That Did Not Exist Two Years Ago
The proliferation of agentic AI threats has spurred the development of several overlapping security frameworks, each addressing different aspects of the problem. Their rapid emergence reflects a recognition that existing cybersecurity frameworks were never designed for threats where the attack tool is also the attack surface.
The OWASP Top 10 for Agentic Applications identifies ten critical risk categories spanning agent goal hijacking, tool misuse, privilege and credential compromise, supply chain vulnerabilities, unsafe code generation and execution, memory poisoning, insecure inter-agent communication, cascading failures, human-agent trust exploitation, and rogue agents. The framework introduces the principle of “least agency,” advocating that organisations grant agents only the minimum autonomy required to perform safe, bounded tasks. Industry adoption has been swift: Microsoft, NVIDIA, GoDaddy, and AWS now reference or embed the agentic threat framework in their products.
MITRE ATLAS is supported by 16 member organisations including Microsoft, CrowdStrike, and JPMorgan Chase through MITRE's Secure AI Program. Its AI Incident Sharing initiative, launched in October 2024, functions as what MITRE describes as a neighbourhood watch for AI, allowing organisations to share anonymised data about real-world attacks and accidents. The EU AI Act's General Purpose AI obligations became active in August 2025, requiring adversarial testing for systemic-risk AI systems and cybersecurity protection against unauthorised access.
Gartner predicts that 40 per cent of enterprise applications will integrate task-specific AI agents by the end of 2026, up from less than 5 per cent in 2025. This explosive growth, representing an eightfold increase in a single year, makes framework adoption urgent. Yet Gartner has also warned that over 40 per cent of agentic AI projects will be cancelled by the end of 2027 due to escalating costs, unclear business value, or inadequate risk controls. The implication is stark: many organisations are deploying agents faster than they can secure them, and the gap between adoption and governance is widening rather than narrowing.
Rebuilding Defensive Architectures for Autonomous Adversaries
The emergence of AI-driven attacks demands fundamental changes to defensive security architectures. Current security operations centre configurations were designed around assumptions about human attackers who operate at human speeds, use predictable tools, and leave recognisable traces. None of those assumptions hold when the adversary is an AI agent, or a human directing one.
Treating AI Agents as First-Class Identities
The zero trust model must now encompass AI agents as first-class identities with independent lifecycle management. Non-human identities, including service accounts, API tokens, machine roles, and AI agent credentials, already outnumber human users by ratios as high as 100 to 1, yet most organisations lack the visibility, governance, and zero trust protections for these identities that they apply to human accounts. Traditional approaches of inheriting user permissions are insufficient when an agent can be compromised or manipulated independently of its human operator. Security platforms that log agent-performed actions as if the user executed them create an attribution gap that adversaries can exploit.
The emerging model requires treating every AI agent with its own unique identity profile, assigning human sponsors for lifecycle management, enforcing least-privilege access through just-in-time grants, and monitoring interactions with external services. Microsoft's Entra Agent ID, announced in early 2026, represents one implementation of this approach, allowing administrators to register agents, enforce conditional access policies, and block risky agent behaviours.
SOC Transformation and the Workforce Gap
Security operations centres are evolving rapidly under pressure from both the threat landscape and a persistent talent shortage. The global cybersecurity workforce gap has reached a record 4.8 million unfilled roles, a 19 per cent year-over-year increase, while the active workforce stands at just 5.5 million globally. For the first time, economic pressures and budget cuts have overtaken a lack of qualified talent as the primary driver of staffing shortages, with 33 per cent of organisations reporting insufficient budgets to adequately staff their security teams.
By 2026, SOC operations are expected to become increasingly autonomous, with AI taking over Tier 1 functions such as alert triage, reducing false positives, accelerating response times, and partially addressing the talent gap. The result is a model where AI handles routine security decisions and generates contextual incident summaries while human experts guide strategy and oversight. Global cybersecurity spending is projected to surpass $520 billion, and executives increasingly expect detection platforms to demonstrate efficiency through metrics like mean time to detection, dwell time, and cost per incident avoided.
Defending Against Protocol-Level Attacks
Malwarebytes has identified the Model Context Protocol, which connects AI agents to external tools, as a critical attack vector for 2026, predicting that MCP-based attack frameworks will become a defining capability of cybercriminals targeting businesses. These frameworks allow adversaries to exploit the connections between agents and the tools they use, potentially compromising entire chains of operations through a single manipulated protocol interaction. Defensive architectures must implement strict validation at every MCP connection point, monitor protocol-level communications for anomalous patterns, and maintain human approval for irreversible or high-impact agent actions. The concept of an agentic perimeter recognises that AI agents represent a fundamentally new attack surface requiring runtime sandboxing, validated tool access, authenticated identities, and immutable audit trails.
Rewriting Incident Response Playbooks at Machine Speed
Traditional incident response playbooks assume human attackers who operate at human speeds. AI-driven attacks shatter both assumptions, demanding a wholesale rethinking of how organisations detect, contain, and recover from security incidents.
When an AI agent can execute an entire attack chain in hours rather than weeks, the window for detection and containment shrinks dramatically. In the GTG-2002 case documented by Anthropic, the human operator spent roughly twenty minutes while the AI conducted hours of autonomous operations across seventeen organisations. This compression of the attack timeline means that detection and initial containment must be automated, with human analysts focusing on strategic decisions rather than routine triage. Organisations that delegate incident response entirely to autonomous agents without human-in-the-loop safety nets risk severe self-inflicted disruptions, as AI agents can misinterpret context and execute irreversible actions such as shutting down production servers or blocking essential services.
Incident response teams need new forensic capabilities designed for AI-mediated attacks. These include the ability to reconstruct an agent's decision chain, analyse prompt histories for evidence of gradual manipulation, examine memory stores for evidence of poisoning, and trace tool invocation patterns to identify the precise moment an agent's behaviour diverged from its intended purpose. These forensic techniques do not map cleanly onto traditional digital forensics, which focuses on file systems, network logs, and user activity rather than natural language interactions and autonomous decision sequences.
Organisations should conduct tabletop exercises that specifically simulate AI-driven attacks, testing whether current security measures can respond to threats that operate at machine speed. These exercises should include scenarios involving vibe hacking techniques, where an agent is gradually manipulated over multiple sessions, and autonomous attack scenarios, where an AI operates independently with minimal human oversight. The four-day SEC disclosure rule and similar regulatory requirements add urgency to incident response timelines. According to researchers at Barracuda Networks, building cyber resilience in 2026 requires a fundamental shift from reactive defence to proactive, exposure-driven governance, with organisations shortening patch cycles and implementing strict architectural controls for critical response actions.
State-Sponsored AI Operations and the Attribution Problem
The threat extends well beyond financially motivated cybercrime into the domain of state-sponsored espionage. In September 2025, Anthropic detected what it assessed with high confidence to be a Chinese state-sponsored cyber espionage operation, designated GTG-1002. This operation targeted roughly 30 entities with validated successful intrusions, deploying Claude across 12 of 14 MITRE ATT&CK tactics during a nine-month campaign. The AI served simultaneously as technical adviser, code developer, security analyst, and operational consultant. Anthropic estimated that 80 to 90 per cent of the operation ran autonomously.
This case demonstrated that nation-state actors are integrating AI throughout the entire operational lifecycle of espionage campaigns, not merely using it as an occasional aid. The sophistication and persistence of the operation suggested a well-resourced, professionally coordinated effort, and Anthropic noted that the level of AI integration represented a distinct evolution in how state-sponsored actors leverage large language models.
The implications for threat intelligence are profound. If state-sponsored operations can run largely autonomously with AI handling the bulk of technical execution, the volume and sophistication of espionage campaigns could scale dramatically without proportional increases in human resources. This places additional pressure on threat intelligence teams to identify and attribute AI-assisted operations, a task complicated by the fact that AI-generated tradecraft may lack the distinctive stylistic signatures that analysts traditionally use to attribute campaigns to specific groups. When every attacker uses the same AI tools, the fingerprints start to look the same.
Building Collective Defence for Shared Threats
No single organisation can address these challenges in isolation. The security community is beginning to build collaborative structures designed for the agentic AI threat landscape, though progress remains uneven.
Anthropic's approach of sharing technical indicators with relevant authorities after detecting misuse represents one model. MITRE's AI Incident Sharing initiative represents another, enabling organisations to contribute anonymised attack data to a shared knowledge base. The OWASP GenAI Security Project, with its peer-reviewed risk frameworks, provides a third avenue for collective defence. The MITRE Secure AI Program's 16 member organisations collaborate on expanding ATLAS with real-world observations and expediting incident sharing across the industry.
But collaboration alone is insufficient without a fundamental recognition that the threat landscape has changed in kind, not merely in degree. As Anthropic concluded in its August 2025 report, these operations suggest a need for new frameworks for evaluating cyber threats that account for AI enablement. The traditional metrics of attacker capability, including technical skill, team size, and operational budget, no longer predict the scope or sophistication of attacks when AI can compensate for deficits in all three areas.
The security industry stands at an inflection point. The democratisation of AI-assisted cybercrime means that defensive architectures designed for a world of skilled human adversaries must be rebuilt for a world where the adversary might be a person with no technical training, twenty minutes of free time, and access to a large language model. The detection methodologies, behavioural signatures, and architectural patterns emerging today are not theoretical proposals. They are the minimum viable defence for a threat landscape that is already here, already autonomous, and already operating at a scale that individual security teams were never designed to match.
References & Sources
Anthropic. “Detecting and Countering Misuse of AI: August 2025.” Anthropic, August 2025. https://www.anthropic.com/news/detecting-countering-misuse-aug-2025
Security Online. “Anthropic Report: Criminals Are Weaponizing AI to Automate Cyberattacks at Scale.” SecurityOnline.info, August 2025. https://securityonline.info/anthropic-report-criminals-are-weaponizing-ai-to-automate-cyberattacks-at-scale
Malwarebytes/ThreatDown. “2026 State of Malware Report: Cybercrime Enters a Post-Human Future as AI Drives the Shift to Machine-Scale Attacks.” ThreatDown, 3 February 2026. https://www.threatdown.com/press/releases/cybercrime-enters-a-post-human-future-as-ai-drives-the-shift-to-machine-scale-attacks-according-to-threatdowns-2026-state-of-malware-report/
Cybersecurity Dive. “Autonomous Attacks Ushered Cybercrime into AI Era in 2025.” Cybersecurity Dive, 2026. https://www.cybersecuritydive.com/news/cybercrime-ai-ransomware-mcp-malwarebytes/811360/
Karpathy, Andrej. Post on X (formerly Twitter), 2 February 2025. https://x.com/karpathy/status/1886192184808149383
KnowBe4. “Phishing Threat Trends Report, Vol. 5.” KnowBe4, March 2025. https://www.knowbe4.com/hubfs/Phishing-Threat-Trends-2025_Report.pdf
OWASP GenAI Security Project. “OWASP Top 10 for Agentic Applications for 2026.” OWASP, 10 December 2025. https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/
MITRE. “ATLAS: Adversarial Threat Landscape for Artificial-Intelligence Systems.” MITRE, October 2025. https://atlas.mitre.org/
Zenity Labs and MITRE ATLAS. “Zenity Labs and MITRE ATLAS Collaborate to Advance AI Agent Security.” Zenity, October 2025. https://zenity.io/blog/current-events/zenity-labs-and-mitre-atlas-collaborate-to-advances-ai-agent-security-with-the-first-release-of
Gartner. “Gartner Predicts 40% of Enterprise Apps Will Feature Task-Specific AI Agents by 2026.” Gartner Newsroom, 26 August 2025. https://www.gartner.com/en/newsroom/press-releases/2025-08-26-gartner-predicts-40-percent-of-enterprise-apps-will-feature-task-specific-ai-agents-by-2026-up-from-less-than-5-percent-in-2025
Anthropic. “Clio: Privacy-Preserving Insights into Real-World AI Use.” Anthropic Research, December 2024. https://www.anthropic.com/research/clio
Anthropic. “Monitoring Computer Use via Hierarchical Summarization.” Anthropic Alignment, 2025. https://alignment.anthropic.com/2025/summarization-for-monitoring/
Palo Alto Networks Unit 42. Coverage of WormGPT 4, referenced in The Register, 25 November 2025. https://www.theregister.com/2025/11/25/wormgpt_4_evil_ai_lifetime_cost_220_dollars/
KELA. Research on malicious AI tool proliferation across cybercrime forums, 2024-2025. Referenced in Cybernews. https://cybernews.com/cybercrime/vibe-hacking-emotional-manipulation-anthropic-wormgpt/
Barracuda Networks. “Frontline Security Predictions 2026: The Battle for Reality and Control in a World of Agentic AI.” Barracuda Blog, 17 November 2025. https://blog.barracuda.com/2025/11/17/frontline-security-predictions-2026-agentic-ai
Anthropic. “Disrupting the First Reported AI-Orchestrated Cyber Espionage Campaign.” Anthropic, November 2025. https://assets.anthropic.com/m/ec212e6566a0d47/original/Disrupting-the-first-reported-AI-orchestrated-cyber-espionage-campaign.pdf
Cybernews. “Vibe Hacking: How AI-Driven Manipulation is Reshaping Cybercrime.” Cybernews, 2025. https://cybernews.com/cybercrime/vibe-hacking-emotional-manipulation-anthropic-wormgpt/
Beagle Security. “Vibe Hacking: AI Agents and the Next Wave of Cyber Threats.” Beagle Security Blog, 2025. https://beaglesecurity.com/blog/article/vibe-hacking.html
Checkmarx. “EchoLeak (CVE-2025-32711) Shows Us That AI Security Is Challenging.” Checkmarx, 2025. https://checkmarx.com/zero-post/echoleak-cve-2025-32711-show-us-that-ai-security-is-challenging/
SOC Prime. “CVE-2025-32711 Vulnerability: EchoLeak Flaw in Microsoft 365 Copilot Could Enable a Zero-Click Attack on an AI Agent.” SOC Prime, 2025. https://socprime.com/blog/cve-2025-32711-zero-click-ai-vulnerability/
ISC2. “2025 ISC2 Cybersecurity Workforce Study.” ISC2, December 2025. https://www.isc2.org/Insights/2025/12/2025-ISC2-Cybersecurity-Workforce-Study
Microsoft Security Blog. “Four Priorities for AI-Powered Identity and Network Access Security in 2026.” Microsoft, 20 January 2026. https://www.microsoft.com/en-us/security/blog/2026/01/20/four-priorities-for-ai-powered-identity-and-network-access-security-in-2026/
Anthropic. “Detecting and Countering Malicious Uses of Claude.” Anthropic, March 2025. https://www.anthropic.com/news/detecting-and-countering-malicious-uses-of-claude-march-2025
Netenrich. Original reporting on FraudGPT, July 2023. Referenced in Dark Reading. https://www.darkreading.com/threat-intelligence/fraudgpt-malicious-chatbot-for-sale-dark-web

Tim Green UK-based Systems Theorist & Independent Technology Writer
Tim explores the intersections of artificial intelligence, decentralised cognition, and posthuman ethics. His work, published at smarterarticles.co.uk, challenges dominant narratives of technological progress while proposing interdisciplinary frameworks for collective intelligence and digital stewardship.
His writing has been featured on Ground News and shared by independent researchers across both academic and technological communities.
ORCID: 0009-0002-0156-9795 Email: tim@smarterarticles.co.uk








