Your Bed Needs the Internet: How Cloud Dependence Endangers Everything

At 2:49 AM Eastern Time on 20 October 2025, a DNS race condition inside Amazon Web Services' US-EAST-1 region triggered a cascade that would, over the next fifteen hours, ripple across seventy-five AWS services and knock more than 3,500 companies offline in over sixty countries. Snapchat vanished. Fortnite went dark. Banking applications froze mid-transaction. And in bedrooms across the United States, owners of Eight Sleep's Pod 5 Ultra smart beds discovered that their $5,049 mattresses had locked themselves into upright positions or cranked their heating coils to uncomfortable temperatures, with absolutely no way to override the settings. The app that controlled their beds needed a server farm in Northern Virginia to function. Without it, people were quite literally trapped in furniture that had forgotten how to be furniture.
It was absurd. It was also a warning.
We have built a civilisation that routes an extraordinary share of its daily operations through a vanishingly small number of cloud data centres. Your payment terminal, your city's traffic management system, your hospital's patient records, your child's baby monitor, and yes, your bed, all phone home to the same handful of server clusters operated by Amazon, Microsoft, and Google. When those clusters stumble, the consequences are no longer limited to a slow-loading webpage. They cascade through supply chains, public services, financial markets, and the physical objects in your home. The question is no longer whether a single cloud failure could trigger a societal crisis. The question is how close we have already come.
The Anatomy of a Cascade
To understand the fragility, you first need to understand the architecture. AWS, Microsoft Azure, and Google Cloud collectively control more than 62 per cent of the global cloud market. In Europe and the United Kingdom, AWS and Microsoft alone command roughly 70 per cent. An estimated 94 per cent of enterprise services worldwide depend on at least one of these three providers. This is not a distributed system in any meaningful sense. It is a funnel, and nearly everything flows through it.
The October 2025 AWS outage demonstrated this with uncomfortable clarity. The failure started in DynamoDB's internal management system, where a subtle DNS race condition caused resolution failures. Within hours, the damage had spread to EC2 compute instances, Lambda serverless functions, S3 storage, and RDS databases. But the truly unsettling revelation was structural: US-EAST-1 serves as the control plane for AWS infrastructure globally. Organisations that had carefully architected their applications to run in European or Asian regions discovered that their supposedly distributed infrastructure still depended on a control plane sitting in Northern Virginia. A single region, a single DNS glitch, and the scaffolding beneath a significant portion of the global internet buckled.
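To see how a race of that kind can leave a record broken, consider a deliberately simplified sketch. The names and structure below are hypothetical, not AWS's actual automation; the point is only that two automated workers writing the same DNS record without comparing plan generations can leave the record empty, which is broadly the failure mode described in post-incident analyses.

```python
import threading
import time

# Hypothetical illustration of a DNS-automation race, loosely modelled on the
# published description of the October 2025 incident: two automated workers
# apply plans for the same record with no check on which plan is newer.
# The domain and addresses are illustrative, not AWS's actual systems.

dns_records = {"dynamodb.us-east-1.example": ["10.0.0.1", "10.0.0.2"]}

def apply_plan(endpoints: list[str], delay: float) -> None:
    # Each worker simply writes its plan after some processing latency.
    # Nothing compares plan generations, so the slower worker wins --
    # even if its plan is older and empties the record.
    time.sleep(delay)
    dns_records["dynamodb.us-east-1.example"] = endpoints

fresh = threading.Thread(target=apply_plan, args=(["10.0.0.3", "10.0.0.4"], 0.1))
stale = threading.Thread(target=apply_plan, args=([], 0.3))  # old "retire everything" plan

fresh.start(); stale.start()
fresh.join(); stale.join()

print(dns_records)
# {'dynamodb.us-east-1.example': []} -- clients can no longer resolve the endpoint
```

The fix is conceptually simple, a generation counter or lock so that stale plans are rejected, but the example shows how little it takes for automation acting on out-of-date state to break name resolution for everything downstream.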
The financial toll of such events is staggering and accelerating. The average cost of downtime across industries rose to $8,600 per minute in 2025, up from $5,600 in 2022. Large enterprises averaged $23,750 per minute of disruption. Across the Global 2000, IT outages collectively drain an estimated four hundred billion dollars annually. The October AWS outage alone generated more than four million outage reports within its first two hours, a measure not just of technical impact but of how many distinct services, businesses, and individuals had routed their operations through a single provider's infrastructure.
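To make those per-minute figures concrete, a rough back-of-the-envelope calculation, using the averages above and the roughly fifteen-hour window of the October event, looks like this. The result is indicative only, not a measured loss for any specific company.

```python
# Back-of-the-envelope downtime cost, using the per-minute figures quoted above.
# The 15-hour duration matches the October 2025 window described earlier.

AVERAGE_COST_PER_MINUTE = 8_600        # cross-industry average, 2025
ENTERPRISE_COST_PER_MINUTE = 23_750    # large-enterprise average, 2025

outage_minutes = 15 * 60               # roughly fifteen hours

print(f"Average firm:     ${AVERAGE_COST_PER_MINUTE * outage_minutes:,.0f}")
print(f"Large enterprise: ${ENTERPRISE_COST_PER_MINUTE * outage_minutes:,.0f}")
# Average firm:     $7,740,000
# Large enterprise: $21,375,000
```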
But the monetary figures, enormous as they are, obscure the deeper problem. Money can be recovered. Trust in infrastructure is harder to rebuild, particularly when people discover that the systems governing their physical safety have a dependency chain that terminates at a single server rack in a single building in a single state.
When Your Bed Becomes a Liability
The Eight Sleep incident during the October 2025 outage became something of an internet parable, the kind of story that makes you laugh until you think about it for more than thirty seconds. Eight Sleep's Pod 5 Ultra uses water-cooled coils managed through a cloud-connected application. The system tracks biometric data, adjusts temperatures throughout the night, and positions the adjustable base to reduce snoring. It is, by any measure, an impressive piece of engineering. It is also, as October 2025 revealed, entirely dependent on servers it does not own and cannot control.
When AWS went down, users lost access to the temperature controls entirely. Reports flooded social media: beds locked at dangerously high temperatures, adjustable bases stuck in elevated positions, alarms silenced. One user described the experience as sleeping in a sauna. Matteo Franceschetti, the company's chief executive, apologised publicly and promised to “outage-proof” the technology. Within twenty-four hours, Eight Sleep shipped an emergency “outage mode” using Bluetooth connectivity, allowing basic local control without an internet connection.
The speed of the fix only deepened the question: why had local control not been the default from the start? The answer, of course, is economic. Cloud connectivity enables continuous data collection, biometric tracking, firmware updates, and subscription revenue models. Local processing is less profitable. It does not generate the steady stream of user data that feeds product development and investor presentations. The result is a consumer landscape where the most intimate objects in your home require permission from a distant server to perform their basic functions. Your bed needs the internet to be a bed. Your lights need the cloud to switch on.
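What a local-first design looks like is not mysterious. The sketch below is illustrative only, with hypothetical class and method names rather than Eight Sleep's actual API: the controller tries the cloud endpoint first and, when it cannot be reached, falls back to a direct local channel such as Bluetooth or the home LAN.

```python
# Minimal sketch of a local-first control pattern: prefer the cloud API when it
# answers, but keep a direct local channel (Bluetooth or LAN) as the authority
# of last resort. CloudClient, LocalClient, and set_temperature are hypothetical,
# not any vendor's actual API.

class CloudClient:
    def set_temperature(self, celsius: float) -> bool:
        raise TimeoutError("cloud endpoint unreachable")  # simulate an outage

class LocalClient:
    def set_temperature(self, celsius: float) -> bool:
        print(f"Local channel: temperature set to {celsius}°C")
        return True

def set_temperature(celsius: float, cloud: CloudClient, local: LocalClient) -> bool:
    try:
        return cloud.set_temperature(celsius)
    except (TimeoutError, ConnectionError):
        # The device should never need a distant server to obey its owner:
        # fall back to the local channel and reconcile with the cloud later.
        return local.set_temperature(celsius)

set_temperature(21.0, CloudClient(), LocalClient())
```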
Eight Sleep was not an isolated case. The Sengled smart lighting outage from 18 to 22 June 2025 left thousands of households without control of their bulbs for four solid days. Not dimming. Not colour adjustment. Simply on or off, and even that was unreliable. German networking company Devolo announced the complete shutdown of its Home Control smart home platform's cloud servers on 31 December 2025, rendering its entire product ecosystem effectively non-functional from that date forward. Users could no longer change configurations, add devices, or access the system through any interface. Integration with Amazon Alexa and Google Assistant ceased entirely. Devices that people had purchased, installed, and relied upon became inert plastic overnight, not because they were broken but because a company decided to stop running a server.
These are not edge cases. They are the predictable outcome of an industry that has systematically traded user autonomy for recurring revenue. Gartner projects that by the end of 2026, 34 per cent of device operations will be processed locally rather than in the cloud, up from 15 per cent the year prior. The Matter protocol and local AI hubs represent early attempts to build smart home infrastructure that does not require a round trip to Oregon to turn on a kitchen light. That trajectory acknowledges the problem, but hardly solves it. Two-thirds of your smart home still needs to call home. And for the devices that fail when the cloud fails, the consequence is not just inconvenience. It is a fundamental breach of the implicit contract between consumer and product: the thing you bought should do the thing it was sold to do.
The Day the Internet Lost Its Middle
If the AWS outage of October 2025 demonstrated the risks of cloud concentration at the infrastructure layer, the Cloudflare outage of 18 November 2025 exposed a different but equally troubling dependency. Cloudflare handles roughly 20 per cent of global web traffic, serving as the content delivery network and security layer for millions of websites and applications. At 11:20 UTC, a change to the permissions on one of Cloudflare's database systems caused a query to return duplicate entries, which were written into a configuration file used by its Bot Management system. The file roughly doubled in size, exceeding the limits of the software that consumed it, and was propagated across the entire network.
The result was immediate and vast. X (formerly Twitter), ChatGPT, OpenAI's suite of tools, Spotify, Discord, Zoom, Canva, Uber, and League of Legends all went down or experienced severe degradation. Public transit systems, e-commerce platforms, and banking interfaces were disrupted. The outage lasted approximately five to six hours before full recovery at 17:06 UTC. Crucially, this was not a cyberattack. It was a configuration change, the kind of routine database maintenance that happens thousands of times per day across the industry. A single permissions error in a single database cascaded into the disruption of services used by hundreds of millions of people.
What made the Cloudflare outage particularly revealing was the nature of the services it knocked offline. ChatGPT and Claude AI both experienced disruptions, meaning the AI assistants that an increasing number of professionals, students, and businesses rely upon for daily work simply stopped responding. The episode illustrated a dependency chain that most users had never considered: your AI assistant depends on a cloud provider, which depends on a content delivery network, which depends on a database permission being set correctly. Remove any link in that chain and the entire service collapses. The layers of abstraction that make modern technology convenient also make it opaque, and opacity breeds fragility.
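The generic lesson from the November failure is that a machine-generated configuration artefact deserves the same scrutiny before deployment as hand-written code. The sketch below is illustrative, with hypothetical limits and field names rather than Cloudflare's actual pipeline: the generated file is checked against expected bounds before it is allowed to propagate.

```python
# Illustrative guard for a generated configuration file before fleet-wide
# propagation. The limits and field names are hypothetical; the point is that
# a machine-generated artefact should be validated against expected bounds
# (size, entry count, duplicates) before it is treated as deployable.

MAX_ENTRIES = 200          # expected upper bound for this file
MAX_BYTES = 1_000_000      # hard size ceiling

def validate_feature_file(entries: list[dict], raw_bytes: int) -> None:
    if raw_bytes > MAX_BYTES:
        raise ValueError(f"file is {raw_bytes} bytes, over the {MAX_BYTES} ceiling")
    if len(entries) > MAX_ENTRIES:
        raise ValueError(f"{len(entries)} entries, expected at most {MAX_ENTRIES}")
    names = [e["feature"] for e in entries]
    if len(names) != len(set(names)):
        raise ValueError("duplicate feature entries detected")

# A duplicated query result roughly doubles the file; the guard refuses to ship it.
entries = [{"feature": f"f{i}"} for i in range(150)] * 2
try:
    validate_feature_file(entries, raw_bytes=400_000)
except ValueError as err:
    print(f"Propagation blocked: {err}")
```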
Payments in Freefall
If a malfunctioning bed is an inconvenience and a broken AI chatbot is a productivity hit, a frozen payment system is something closer to a crisis. During the CrowdStrike incident of 19 July 2024, a faulty content update to the company's Falcon Sensor software caused 8.5 million Windows computers to crash simultaneously. The damage reached across airlines, hospitals, banks, and payment processors. Visa and Mastercard transaction systems experienced disruptions in some regions. The financial losses exceeded ten billion dollars globally, with Fortune 500 companies alone absorbing more than five billion dollars in direct costs. Insurers estimated payouts of approximately 1.5 billion dollars.
The CrowdStrike failure was not a cloud outage in the traditional sense. It was a security software update that exposed a different kind of single point of failure: the monoculture of endpoint protection. CrowdStrike held roughly 18 per cent of the global market, which meant that a bug in one company's update pipeline could, and did, bring commercial aviation and hospital systems to a halt on the same morning. The Library of Congress documented the impacts to public safety systems, noting that the outage affected emergency services infrastructure across multiple jurisdictions.
Payment infrastructure sits at the intersection of all these vulnerabilities. According to a 2024 survey, 76 per cent of global respondents run applications on AWS, and the service powers more than 90 per cent of Fortune 100 companies. When AWS experienced its October 2025 outage, UK banks were among those knocked offline, joining a long list of financial institutions that discovered their resilience planning had not accounted for the failure of a service they had treated as permanently available. The dependency is not hypothetical. It is operational, structural, and deeply embedded in the architecture of modern commerce.
Nordic countries and Estonia have begun exploring offline card-payment backup systems, a recognition that payment resilience must be designed deliberately rather than assumed. The Ponemon Institute's 2024 Cost of Data Center Outages report found that the average cost of a data centre outage is approximately $9,000 per minute, with financial services among the most severely affected sectors. The EU's Digital Operational Resilience Act (DORA), which came into effect in January 2025, now requires banks, insurers, and investment firms to prove that their digital operations are resilient, auditable, and accessible to regulators. DORA specifically addresses third-party ICT risk, requiring financial entities to identify and manage dependencies on critical technology providers. It is a start, but regulation follows disaster more often than it prevents it.
Hospitals Without Memory
The healthcare sector presents perhaps the most concerning frontier of cloud and AI dependency. In 2024, 71 per cent of non-federal acute care hospitals in the United States reported using predictive AI integrated into their electronic health records, up from approximately 66 per cent in 2023. Among hospitals affiliated with multi-hospital systems, adoption reached 86 per cent. These systems handle billing, scheduling, risk stratification, and increasingly, clinical decision support. They are not optional add-ons. They have become load-bearing elements of hospital operations.
The CrowdStrike outage of July 2024 provided a stark preview of what happens when those systems fail. A study published through the National Center for Biotechnology Information documented that 759 US hospitals experienced network disruptions during the incident. Of the nearly 1,100 internet-based services examined across those hospitals, 239 (21.8 per cent) were characterised as directly supporting patient care. These were not administrative inconveniences. They were disruptions to systems that clinicians relied upon to access patient records, manage medications, and coordinate care.
The paradox of healthcare AI adoption is that it simultaneously improves operational efficiency and increases systemic vulnerability. Hospitals with mature predictive AI deployments have realised measurable improvements in billing accuracy, scheduling efficiency, and outpatient risk stratification. But those gains come with a dependency: if the cloud infrastructure supporting those AI systems fails, the hospital does not revert to a slightly less efficient version of itself. It reverts to paper processes that many current staff have never been trained to use. The institutional memory of how to operate without digital infrastructure is eroding precisely as the digital infrastructure becomes less reliable.
Rural and independent hospitals face a different version of this problem. With only 37 per cent adoption of predictive AI compared to 86 per cent at system-affiliated facilities, they are less exposed to AI-specific failures but also less equipped to absorb any technology-driven disruption. The digital divide in healthcare creates a fragmented landscape where a single infrastructure failure affects institutions unevenly, complicating coordinated emergency responses.
The Grid That Forgot How to Balance
On 28 April 2025, at 12:33 Central European Summer Time, the power systems of continental Spain and Portugal experienced a total blackout. In just five seconds, Spain lost approximately fifteen gigawatts of generation, equivalent to 60 per cent of its national electricity demand. The remaining generation was insufficient to meet load, and the grid entered a cascading failure that left 31 gigawatts of demand disconnected. Power was interrupted for about ten hours across most of the Iberian Peninsula, and considerably longer in some areas.
The European Network of Transmission System Operators for Electricity (ENTSO-E) published its final expert panel report in March 2026, identifying a combination of interacting factors: oscillations, gaps in voltage and reactive power control, rapid output reductions, generator disconnections, and uneven stabilisation capabilities. The interaction between market design (schedule changes driving fast power ramps), grid code implementation (fixed power factor mode for renewables), protection coordination (settings diverging from requirements), and system architecture (insufficient reactive power reserves and manual switching of critical assets) created cascading failures that unfolded faster than human operators could respond. The Iberian system simply lacked the inertia needed to absorb the initial generation-loss shocks, and automatic protection mechanisms cascaded into a total system collapse.
The Iberian blackout was not caused by artificial intelligence. But it was a product of the same underlying dynamic that makes AI-dependent infrastructure so fragile: systems designed for efficiency and automation that lacked the inertia, redundancy, and human-override capability to absorb unexpected shocks. As more grid management functions migrate to AI-driven optimisation platforms hosted in cloud environments, the attack surface and failure surface both expand. A power-grid optimiser that destabilises supply, whether through a software bug, a corrupted model, or a compromised update, could reproduce the Iberian scenario in a system that has even less manual fallback capability than the one that failed in April 2025.
In August 2025, a corrupted update in a widely used open-source AI optimisation library triggered cascading failures lasting over twelve hours. Healthcare, finance, aviation, and emergency response systems were among those affected. The incident demonstrated that AI infrastructure does not need to be sophisticated to be dangerous. It merely needs to be ubiquitous. And ubiquity, in a world where a handful of open-source libraries and cloud providers underpin the majority of AI deployments, is exactly what we have achieved.
Concentration as Systemic Risk
The United Kingdom offers a particularly instructive case study in cloud concentration risk. The UK's cloud market is dominated by AWS and Microsoft, and the UK government's “One Government Value Agreement” with AWS has driven a tenfold increase in spending on the platform since the agreement was first signed in 2020, from approximately 100 million pounds to over one billion pounds. The National Preparedness Commission has argued that cloud dominance in the UK demands immediate Competition and Markets Authority action, framing concentration not as a market efficiency issue but as a national security concern.
The concern is not merely theoretical. During the October 2025 AWS outage, government and major banking services in the UK reported intermittent issues, demonstrating that when public infrastructure depends on a handful of cloud providers and regions, outages can compromise access to essential services. Microsoft's suspected suspension in May 2025 of the International Criminal Court Chief Prosecutor's email services further illustrated how dependence on a single US-based provider can create political vulnerabilities that extend well beyond technical uptime. When a foreign government's diplomatic communications or a court's prosecutorial operations can be disrupted by a single company's decision, the sovereignty implications are impossible to ignore.
Across Europe, American technology companies control more than 70 per cent of cloud infrastructure. The EU has responded with a series of regulatory instruments that collectively represent the most ambitious attempt yet to address digital dependency. The Data Act, which entered into force in January 2024, contains provisions to prevent vendor lock-in, requiring cloud providers to remove unjustified technical or contractual barriers to switching and mandating the phasing out of switching fees by 2026 to 2027. The Network and Information Security Directive 2 (NIS2), effective from October 2024, addresses cybersecurity and operational resilience for essential services, imposing requirements around risk management, supply chain security, and breach reporting. The European Commission is preparing to propose a Cloud and AI Development Act, expected in the first quarter of 2026, which may set standards for cloud computing services and promote investment in European data centres.
In the United Kingdom, the Cyber Security and Resilience Bill, introduced to the House of Commons on 12 November 2025, updates previous cybersecurity legislation and is expected to become law in 2026. These regulatory efforts acknowledge the problem. Whether they solve it depends entirely on enforcement and on whether governments are willing to accept the short-term costs of diversifying away from established providers. The track record is not encouraging. Convenience and cost savings have consistently won against resilience in procurement decisions, and the structural incentives that created cloud concentration remain firmly in place.
The Illusion of Redundancy
One of the more uncomfortable lessons of recent outages is how thoroughly they have dismantled the promise of multi-region resilience. Cloud providers market regional redundancy as a safeguard: distribute your workloads across multiple regions, and no single failure can bring you down. The October 2025 AWS outage revealed this as, at best, a half-truth. Because US-EAST-1 serves as the global control plane, organisations running workloads in Frankfurt or Singapore still found themselves dependent on infrastructure in Northern Virginia. The redundancy was architectural fiction, a comforting story that collapsed the moment the underlying assumption was tested.
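Auditing for this kind of hidden coupling means asking two separate questions of every service: where does the workload run, and where does its control plane live? A minimal sketch, using hypothetical service names and regions, might look like this.

```python
# Sketch of a dependency audit that separates "where the workload runs" from
# "where its control plane lives". The service list is hypothetical; the point
# is that multi-region deployment is not redundancy if every entry in the
# second column still resolves to the same place.

services = [
    {"name": "checkout-api",  "runs_in": "eu-central-1",   "control_plane": "us-east-1"},
    {"name": "sessions-db",   "runs_in": "ap-southeast-1", "control_plane": "us-east-1"},
    {"name": "static-assets", "runs_in": "eu-west-2",      "control_plane": "eu-west-2"},
]

def single_points_of_failure(services: list[dict]) -> dict[str, list[str]]:
    by_control_plane: dict[str, list[str]] = {}
    for svc in services:
        by_control_plane.setdefault(svc["control_plane"], []).append(svc["name"])
    # Any control-plane region that most of the estate funnels through is a
    # shared failure domain, however widely the workloads themselves are spread.
    return {region: names for region, names in by_control_plane.items() if len(names) > 1}

print(single_points_of_failure(services))
# {'us-east-1': ['checkout-api', 'sessions-db']}
```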
This pattern recurs with dispiriting regularity. Google Cloud's Identity and Access Management outage on 12 June 2025 lasted only one hour and thirteen minutes, but because IAM is the gateway through which every Google Cloud product authenticates, the single failure immediately disabled a wide range of services worldwide. Azure's Front Door outage in October 2025 similarly cascaded: the failure of a global content delivery and application delivery service knocked out Microsoft 365, the Azure Portal, and services for enterprise customers including Alaska Airlines for more than eight hours. In late July 2025, Azure's East US region experienced allocation failures that Microsoft reported as resolved by 5 August, though some users continued to report problems days afterwards.
Azure outages were the longest on average in the 2024-2025 period, lasting a mean of 14.6 hours per incident. Google Cloud disruptions averaged 5.8 hours. AWS incidents averaged 1.5 hours but, as the October 2025 event demonstrated, their impact was amplified by the sheer volume of services depending on a single region. Between August 2024 and August 2025, the three major providers together experienced more than 100 service outages. Global network outages increased by 33 per cent from January to May 2025, rising from 1,382 to 1,843 incidents. Critical cloud outages increased by approximately 18 per cent in 2024. The trend line is not ambiguous. The systems are getting more complex, more interdependent, and more prone to cascading failure.
Smart city infrastructure amplifies these risks further. Traffic management, water systems, power distribution, and emergency services increasingly depend on cloud-hosted platforms for coordination and optimisation. When those platforms are locked into a single provider's ecosystem, the city inherits all of that provider's failure modes. Individual subsystems such as traffic control, power distribution, and water management should, in principle, operate independently while coordinating through shared data layers. In practice, vendor-led platforms have created long-term lock-in, and fragmented governance has slowed the development of cross-domain interoperability. The networked nature of smart city technologies, with a growing number of interconnected sensors, cameras, and control nodes, also expands the attack surface. A failure in one subsystem can propagate to others through shared dependencies that were invisible during normal operations.
Building for the Failure You Cannot Predict
The instinct, after each major outage, is to call for better engineering. More robust DNS configurations. Improved testing protocols. Redundant control planes. These are necessary but insufficient responses. The deeper challenge is structural. We have allowed a civilisation-scale dependency to concentrate in a handful of private companies whose primary obligation is to shareholders, not to the public infrastructure that has come to rely on them.
The remedies being discussed span a wide range of ambition and feasibility. At the consumer level, the push towards local processing and edge computing represents a partial answer. At the enterprise level, genuine multi-cloud strategies (not merely multi-region deployments within a single provider) could distribute risk more effectively, though they carry significant cost and complexity overhead. At the regulatory level, the EU's Digital Operational Resilience Act and the UK's forthcoming Cyber Security and Resilience legislation represent early attempts to mandate resilience rather than merely recommend it. Experts have recommended mandating AI system redundancy for critical infrastructure, funding global audits of open-source AI dependencies, and establishing an international AI incident response coalition.
But the most fundamental shift required is conceptual. We need to stop treating cloud infrastructure as a utility that simply works and start treating it as what it is: a concentrated, commercially operated system with known failure modes and structural single points of failure. Traditional utilities have regulators, redundancy requirements, and public service obligations. Cloud providers, despite hosting infrastructure that is arguably more critical to daily life than the electricity grid was fifty years ago, operate under a fraction of that oversight. The gap between the criticality of these services and the regulatory framework governing them is one of the defining mismatches of our era.
The Iberian blackout of April 2025 unfolded faster than human operators could respond. The AWS outage of October 2025 revealed that global control planes create global vulnerabilities. The CrowdStrike failure of July 2024 showed that software monocultures can cascade across every sector simultaneously. The Cloudflare outage of November 2025 demonstrated that a single configuration error in a content delivery network can simultaneously disable AI assistants, social media platforms, ride-hailing services, and public transit systems. Each incident was different in its specifics but identical in its lesson: systems optimised for efficiency at the expense of redundancy will eventually fail, and when they do, the failure will propagate along every dependency chain that was invisible during normal operations.
The question posed at the outset, whether we have built a civilisation so dependent on cloud and AI infrastructure that one failure could cascade into a societal crisis, has an answer that is both reassuring and deeply troubling. We have not yet experienced a full-scale societal collapse from a cloud or AI infrastructure failure. But we have experienced dress rehearsals, and each one has been larger, longer, and more consequential than the last. The Eight Sleep bed that overheated in October 2025 is a punchline. The payment system that froze, the hospital that lost access to patient records, the power grid that collapsed in five seconds: those are not punchlines. They are data points on a curve that bends towards a reckoning we have not yet decided to prevent.
The infrastructure we depend on is only as resilient as its weakest dependency. Right now, that dependency is a DNS record in Northern Virginia.
References and Sources
- ThousandEyes, “AWS Outage Analysis: October 20, 2025,” ThousandEyes Blog, October 2025.
- AWS, “Post-Event Summaries,” AWS Premium Support, October 2025.
- UPI, “AWS outage caused smart beds to overheat, get stuck upright,” UPI Odd News, 22 October 2025.
- TechBuzz AI, “Eight Sleep adds offline mode after AWS outage left smart beds stuck,” October 2025.
- CyberInsurance News, “Cloud Outages in 2024 Increased by 18%, Google Cloud Downtime Up 57%,” Parametrix Report, 2024.
- CNN Business, “CrowdStrike outage: We finally know what caused it, and how much it cost,” 24 July 2024.
- Congress.gov, “CrowdStrike IT Outage: Impacts to Public Safety Systems and Considerations for Congress,” Library of Congress, 2024.
- ENTSO-E, “Expert Panel Final Report on 28 April 2025 Blackout in Spain and Portugal,” 20 March 2026.
- Power Magazine, “Anatomy of a Blackout: Findings from the Spain-Portugal Grid Collapse Final Report,” 2026.
- FinTech Magazine, “AWS Outage: A Major Risk For The Financial Sector?” October 2025.
- National Preparedness Commission, “The Concentration Crisis: Why Cloud Dominance in the UK Demands Immediate CMA Action,” 2025.
- The Register, “Europe gets serious about cutting US digital umbilical cord,” 22 December 2025.
- Morrison Foerster, “Digital Regulation in EU and UK: The Enduring 2025 Themes,” January 2026.
- House of Commons Library, “Cyber Security and Resilience (Network and Information Systems) Bill 2024-26,” 2025.
- DemandSage, “54 Internet Outage Statistics (2026): Global Downtime, Impacts,” 2026.
- The Conversation / Iowa State Research, “Why cloud service outages ripple across the internet and the economy,” March 2026.
- Serenity Smart Homes NJ, “What the Sengled Outage Taught Us About Cloud Reliance,” 22 June 2025.
- We Speak IoT, “Devolo Ends Its Home Control Smart Home System: Cloud Services to Shut Down by End of 2025,” 2025.
- Ancher AI, “Global AI Outage of 2025: Causes and Lessons,” 2025.
- New Civil Engineer, “Integrating AI into critical national infrastructure presents severe risks, experts warn,” 1 October 2025.
- Belitsoft, “Outage Affected Multiple Google Cloud Platform Products,” June 2025.
- Cloudflare Blog, “Cloudflare outage on November 18, 2025,” November 2025.
- PMC / NCBI, “Patient Care Technology Disruptions Associated With the CrowdStrike Outage,” National Center for Biotechnology Information, 2025.
- NCBI Bookshelf, “Hospital Trends in the Use, Evaluation, and Governance of Predictive AI, 2023-2024,” ASTP Health IT Data Brief, 2024.
- McKinsey, “AI-native public infrastructure for smart cities,” McKinsey Technology, 2025.

Tim Green, UK-based Systems Theorist & Independent Technology Writer
Tim explores the intersections of artificial intelligence, decentralised cognition, and posthuman ethics. His work, published at smarterarticles.co.uk, challenges dominant narratives of technological progress while proposing interdisciplinary frameworks for collective intelligence and digital stewardship.
His writing has been featured on Ground News and shared by independent researchers across both academic and technological communities.
ORCID: 0009-0002-0156-9795 | Email: tim@smarterarticles.co.uk