The algorithm knows you better than you know yourself. It knows you prefer aisle seats on morning flights. It knows you'll pay extra for hotels with rooftop bars. It knows that when you travel to coastal cities, you always book seafood restaurants for your first night. And increasingly, it knows where you're going before you've consciously decided.

Welcome to the age of AI-driven travel personalisation, where artificial intelligence doesn't just respond to your preferences but anticipates them, curates them, and in some uncomfortable ways, shapes them. As generative AI transforms how we plan and experience travel, we're witnessing an unprecedented convergence of convenience and surveillance that raises fundamental questions about privacy, autonomy, and the serendipitous discoveries that once defined the joy of travel.

The Rise of the AI Travel Companion

The transformation has been swift. According to research from Oliver Wyman, 41% of nearly 2,100 consumers from the United States and Canada reported using generative AI tools for travel inspiration or itinerary planning in March 2024, up from 34% in August 2023. Looking forward, 58% of respondents said they are likely to use the technology again for future trips, with that number jumping to 82% among recent generative AI users.

What makes this shift remarkable isn't just the adoption rate but the depth of personalisation these systems now offer. Google's experimental AI-powered itinerary generator creates bespoke travel plans based on user prompts, offering tailored suggestions for flights, hotels, attractions, and dining. Platforms like Mindtrip, Layla.ai, and Wonderplan have emerged as dedicated AI travel assistants, each promising to understand not just what you want but who you are as a traveller.

These platforms represent a qualitative leap from earlier recommendation engines. Traditional systems relied primarily on collaborative filtering or content-based filtering. Modern AI travel assistants employ large language models capable of understanding nuanced requests like “I want somewhere culturally rich but not touristy, with good vegetarian food and within four hours of London by train.” The system doesn't just match keywords; it comprehends context, interprets preferences, and generates novel recommendations.
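To make the contrast concrete, here is a toy sketch of the older collaborative-filtering approach, where recommendations emerge from numerical similarity between users' past ratings rather than any comprehension of a request. The travellers, destinations, and ratings are invented for illustration; no real platform's engine is this simple.

```python
import numpy as np

# Rows are travellers, columns are destinations, values are past
# ratings (0 = unrated). Purely illustrative data.
ratings = np.array([
    [5, 0, 3, 0],   # traveller A
    [4, 0, 3, 1],   # traveller B
    [0, 5, 0, 4],   # traveller C
], dtype=float)

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return u @ v / denom if denom else 0.0

def recommend(user_idx, ratings):
    # Score unrated items by the similarity-weighted ratings of other users.
    sims = np.array([cosine(ratings[user_idx], ratings[j])
                     for j in range(len(ratings))])
    sims[user_idx] = 0.0
    scores = sims @ ratings
    scores[ratings[user_idx] > 0] = -np.inf   # exclude already-rated items
    return int(np.argmax(scores))

print(recommend(0, ratings))  # best unseen destination index for traveller A
```

A system like this can only ever say “people whose numbers look like yours liked this”; it has no way to parse “culturally rich but not touristy”.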

The business case is compelling. McKinsey research indicates that companies excelling in personalisation achieve 40% more revenue than their competitors, whilst personalised offers can increase customer satisfaction by approximately 20%. Perhaps most tellingly, 76% of customers report frustration when they don't receive personalised interactions. The message to travel companies is clear: personalise or perish.

Major industry players have responded aggressively. Expedia has integrated more than 350 AI models throughout its marketplace, leveraging what the company calls its most valuable asset: 70 petabytes of traveller information stored on AWS cloud. “Data is our heartbeat,” the company stated, and that heartbeat now pulses through every recommendation, every price adjustment, every nudge towards booking.

Booking Holdings has implemented AI to refine dynamic pricing models, whilst Airbnb employs machine learning to analyse past bookings, browsing behaviour, and individual preferences to retarget customers with personalised marketing campaigns. In a significant development, OpenAI launched third-party integrations within ChatGPT allowing users to research and book trips directly through the chatbot using real-time data from Expedia and Booking.com.

The revolution extends beyond booking platforms. According to McKinsey's 2024 survey of more than 5,000 travellers across China, Germany, the UAE, the UK, and the United States, 43% of travellers used AI to book accommodations, search for leisure activities, and look for local transportation. The technology has moved from novelty to necessity, with travel organisations potentially boosting revenue growth by 15-20% if they fully leverage digital and AI analytics opportunities.

McKinsey found that 66% of travellers surveyed said they are more interested in travel now than before the COVID-19 pandemic, with millennials and Gen Z travellers particularly enthusiastic about AI-assisted planning. These younger cohorts are travelling more and spending a higher share of their income on travel than their older counterparts, making them prime targets for AI personalisation strategies.

Yet beneath this veneer of convenience lies a more complex reality. The same algorithms that promise perfect holidays are built on foundations of extensive data extraction, behavioural prediction, and what some scholars have termed “surveillance capitalism” applied to tourism.

The Data Extraction Machine

To deliver personalisation, AI systems require data. Vast quantities of it. And the travel industry has become particularly adept at collection.

Every interaction leaves a trail. When you search for flights, the system logs your departure flexibility, price sensitivity, and willingness to book. When you browse hotels, it tracks how long you linger on each listing, which photographs you zoom in on, which amenities matter enough to filter for. When you book a restaurant, it notes your cuisine preferences, party size, and typical spending range. When you move through your destination, GPS data maps your routes, dwell times, and unplanned diversions.

Tourism companies are now linking multiple data sources to “complete the customer picture”, which may include family situation, food preferences, travel habits, frequently visited destinations, airline and hotel preferences, loyalty programme participation, and seating choices. According to research on smart tourism systems, this encompasses tourists' demographic information, geographic locations, transaction information, biometric information, and both online and real-life behavioural information.

A single traveller's profile might combine booking history from online travel agencies, click-stream data showing browsing patterns, credit card transaction data revealing spending habits, loyalty programme information, social media activity, mobile app usage patterns, location data from smartphone GPS, biometric data from airport security, and even weather preferences inferred from booking patterns across different climates.
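A hedged sketch of what such a combined profile might look like as a data structure. Every field name here is hypothetical, chosen only to mirror the categories listed above:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TravellerProfile:
    """Hypothetical unified profile of the kind described above."""
    traveller_id: str
    booking_history: List[str] = field(default_factory=list)    # OTA records
    clickstream: List[str] = field(default_factory=list)        # browsing events
    card_spend_bands: List[str] = field(default_factory=list)   # inferred from transactions
    loyalty_programmes: List[str] = field(default_factory=list)
    last_known_location: Optional[tuple] = None                 # (lat, lon) from GPS
    biometric_template_id: Optional[str] = None                 # airport face-scan reference
    inferred_climate_preference: Optional[str] = None           # derived, never declared

profile = TravellerProfile(traveller_id="t-1042")
profile.clickstream.append("zoomed:hotel-784/photo-3")
```

Note that the most sensitive fields are the derived ones: nobody ever typed “climate preference” into a form, yet it sits in the record alongside the data they knowingly supplied.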

This holistic profiling enables unprecedented predictive capabilities. Systems can forecast not just where you're likely to travel next but when, how much you'll spend, which ancillary services you'll purchase, and how likely you are to abandon your booking at various price points. In the language of surveillance capitalism, these become “behavioural futures” that can be sold to advertisers, insurers, and other third parties seeking to profit from predicted actions.

The regulatory landscape attempts to constrain this extraction. The General Data Protection Regulation (GDPR), which entered into full enforcement in 2018, applies to any travel or transportation services provider collecting or processing data about an EU citizen. This includes travel management companies, hotels, airlines, ground transportation services, booking tools, global distribution systems, and companies booking travel for employees.

Under GDPR, as soon as AI involves the use of personal data, the regulation is triggered and applies to such AI processing. The EU framework does not distinguish between private and publicly available data, offering more protection than some other jurisdictions. Implementing privacy by design has become essential, requiring processing as little personal data as possible, keeping it secure, and processing it only where there is a genuine need.

Yet compliance often functions more as a cost of doing business than a genuine limitation. The travel industry has experienced significant data breaches that reveal the vulnerability of collected information. In 2024, Marriott agreed to pay a $52 million settlement in the United States related to the massive Marriott-Starwood breach that affected up to 383 million guest records. The same year, Omni Hotels & Resorts suffered a major cyberattack on 29 March that forced multiple IT systems offline, disrupting reservations, payment processing, and digital room key access.

The MGM Resorts breach in 2023 demonstrated the operational impact beyond data theft, leaving guests stranded in lobbies when digital keys stopped working. When these systems fail, they fail comprehensively.

According to the 2025 Verizon Data Breach Investigations Report, cybercriminals targeting the hospitality sector most often rely on system intrusions, social engineering, and basic web application attacks, with ransomware featuring in 44% of breaches. The average cost of a hospitality data breach has climbed to $4.03 million in 2025, though this figure captures only direct costs and doesn't account for reputational damage or long-term erosion of customer trust.

These breaches aren't merely technical failures. They represent the materialisation of a fundamental privacy risk inherent in the AI personalisation model: the more data systems collect to improve recommendations, the more valuable and vulnerable that data becomes.

The situation is particularly acute for location data. More than 1,000 apps, including Yelp, Foursquare, Google Maps, Uber, and travel-specific platforms, use location tracking services. When users enable location tracking on their phones or in apps, they allow dozens of data-gathering companies to collect detailed geolocation data, which these companies then sell to advertisers.

One of the most common privacy violations is collecting or tracking a user's location without clearly asking for permission. Many users don't realise the implications of granting “always-on” access or may accidentally agree to permissions without full context. Apps often integrate third-party software development kits for analytics or advertising, and if these third parties access location data, users may unknowingly have their information sold or repurposed, especially in regions where privacy laws are less stringent.

The problem extends beyond commercial exploitation. Many apps use data beyond the initial intended use case, and oftentimes location data ends up with data brokers who aggregate and resell it without meaningful user awareness or consent. Information from GPS and geolocation tags, in combination with other personal information, can be utilised by criminals to identify an individual's present or future location, thus facilitating burglary and theft, stalking, kidnapping, and domestic violence. For public figures, journalists, activists, or anyone with reason to conceal their movements, location tracking represents a genuine security threat.

The introduction of biometric data collection at airports adds another dimension to privacy concerns. As of July 2022, U.S. Customs and Border Protection has deployed facial recognition technology at 32 airports for departing travellers and at all airports for arriving international travellers. The Transportation Security Administration has implemented the technology at 16 airports, including major hubs in Atlanta, Boston, Dallas, Denver, Detroit, Los Angeles, and Miami.

Whilst CBP retains U.S. citizen photos for no more than 12 hours after identity verification, photos of non-citizens can be retained far longer, enabling ongoing surveillance of non-citizen travellers. Privacy advocates worry about function creep: biometric data collected for identity verification could be repurposed for broader surveillance.

Facial recognition technology can be less accurate for people with darker skin tones, women, and older adults, raising equity concerns about who is most likely to be wrongly flagged. Notable flaws include biases that often impact people of colour, women, LGBTQ people, and individuals with physical disabilities. These accuracy disparities mean that marginalised groups bear disproportionate burdens of false positives, additional screening, and the indignity of systems that literally cannot see them correctly.

Perhaps most troublingly, biometric data is irreplaceable. If biometric information such as fingerprints or facial recognition details are compromised, they cannot be reset like a password. Stolen biometric data can be used for identity theft, fraud, or other criminal activities. A private airline could sell biometric information to data brokers, who can then sell it to companies or governments.

SITA estimates that 70% of airlines expect to have biometric ID management in place by 2026, whilst 90% of airports are investing in major programmes or research and development in the area. The trajectory is clear: biometric data collection is becoming infrastructure, not innovation. What begins as optional convenience becomes mandatory procedure.

The Autonomy Paradox

The privacy implications are concerning enough, but AI personalisation raises equally profound questions about autonomy and decision-making. When algorithms shape what options we see, what destinations appear attractive, and what experiences seem worth pursuing, who is really making our travel choices?

Research on AI ethics and consumer protection identifies dark patterns as business practices employing elements of digital choice architecture that subvert or impair consumer autonomy, decision-making, or choice. The combination of AI, personal data, and dark patterns results in an increased ability to manipulate consumers.

AI can escalate dark patterns by leveraging its capabilities to learn from patterns and behaviours, personalising appeals specific to user sensitivities to make manipulative tactics seem less invasive. Dark pattern techniques undermine consumer autonomy, leading to financial losses, privacy violations, and reduced trust in digital platforms.

The widespread use of personalised algorithmic decision-making has raised ethical concerns about its impact on user autonomy. Digital platforms can use personalised algorithms to manipulate user choices for economic gain by exploiting cognitive biases, nudging users towards actions that align more with platform owners' interests than users' long-term well-being.

Consider dynamic pricing, a ubiquitous practice in travel booking. Airlines and hotels adjust prices based on demand, but AI-enhanced systems now factor in individual user data: your browsing history, your previous booking patterns, even the device you're using. If the algorithm determines you're price-insensitive or likely to book regardless of cost, you may see higher prices than another user searching for the same flight or room.

This practice, sometimes called “personalised pricing” or more critically “price discrimination”, raises questions about fairness and informed consent. Users rarely know they're seeing prices tailored to extract maximum revenue from their specific profile. The opacity of algorithmic pricing means travellers cannot easily determine whether they're receiving genuine deals or being exploited based on predicted willingness to pay.
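As an illustration of the mechanics being described, the sketch below shows how a quoted price could be nudged upward from profile signals. The signals, weights, and thresholds are entirely invented for the example and are not drawn from any real platform:

```python
def personalised_price(base_price: float, profile: dict) -> float:
    """Illustrative only: hypothetical signals and weights, not any
    platform's actual pricing model."""
    multiplier = 1.0
    if profile.get("abandon_rate", 1.0) < 0.2:    # rarely abandons bookings
        multiplier += 0.08
    if profile.get("device") == "premium_phone":  # crude affluence proxy
        multiplier += 0.04
    if profile.get("searches_last_24h", 0) > 5:   # urgency inferred from repeat searches
        multiplier += 0.06
    return round(base_price * multiplier, 2)

print(personalised_price(200.0, {"abandon_rate": 0.1,
                                 "device": "premium_phone",
                                 "searches_last_24h": 7}))  # 236.0
```

The point of the toy is its asymmetry: every input comes from surveillance of the buyer, and none of it is visible to the buyer at the moment of quoting.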

The asymmetry of information is stark. The platform knows your entire booking history, your browsing behaviour, your price sensitivity thresholds, your typical response to scarcity messages, and your likelihood of abandoning a booking at various price points. You know none of this about the platform's strategy. This informational imbalance fundamentally distorts what economists call “perfect competition” and transforms booking into a game where only one player can see the board.

According to research, 65% of people see targeted promotions as a top reason to make a purchase, suggesting these tactics effectively influence behaviour. Scarcity messaging offers a particularly revealing example. “Three people are looking at this property” or “Price increased £20 since you last viewed” creates urgency that may or may not reflect reality. When these messages are personalised based on your susceptibility to urgency tactics, they cross from information provision into manipulation.

The possibility of behavioural manipulation calls for policies that ensure human autonomy and self-determination in any interaction between humans and AI systems. Yet regulatory frameworks struggle to keep pace with technological sophistication.

The European Union has attempted to address these concerns through the AI Act, which was published in the Official Journal on 12 July 2024 and entered into force on 1 August 2024. The Act introduces a risk-based regulatory framework for AI, mandating obligations for developers and providers according to the level of risk associated with each AI system.

Whilst the tourism industry is not explicitly called out as high-risk, the use of AI systems for tasks such as personalised travel recommendations based on behaviour analysis, sentiment analysis in social media, or facial recognition for security will likely be classified as high-risk. For use of prohibited AI systems, fines may be up to 7% of worldwide annual turnover, whilst noncompliance with requirements for high-risk AI systems will be subject to fines of up to 3% of turnover.

However, use of smart travel assistants, personalised incentives for loyalty scheme members, and solutions to mitigate disruptions will all be classified as low or limited risk under the EU AI Act. Companies using AI in these ways will have to adhere to transparency standards, but face less stringent regulation.

Transparency itself has become a watchword in discussions of AI ethics. The call is for transparent, explainable AI where users can comprehend how decisions affecting their travel are made. Tourists should know how their data is being collected and used, and AI systems should be designed to mitigate bias and make fair decisions.

Yet transparency alone may not suffice. Even when privacy policies disclose data practices, they're typically lengthy, technical documents that few users read or fully understand. According to an Apex report, two-thirds of consumers worry about their data being misused, yet 62% say they might share more personal data if there's a discernible advantage, like tailored offers.

But is this exchange truly voluntary when the alternative is a degraded user experience or being excluded from the most convenient booking platforms? When 71% of consumers expect personalised experiences and 76% feel frustrated without them, according to McKinsey research, has personalisation become less a choice and more a condition of participation in modern travel?

The question of voluntariness deserves scrutiny. Consent frameworks assume roughly equal bargaining power and genuine alternatives. But when a handful of platforms dominate travel booking, when personalisation becomes the default and opting out requires technical sophistication most users lack, when privacy-protective alternatives don't exist or charge premium prices, can we meaningfully say users “choose” surveillance?

The Death of Serendipity

Beyond privacy and autonomy lies perhaps the most culturally significant impact of AI personalisation: the potential death of serendipity, the loss of unexpected discovery that has historically been central to the transformative power of travel.

Recommender systems often suffer from feedback loop phenomena, leading to the filter bubble effect that reinforces homogeneous content and reduces user satisfaction. Over-relying on AI for destination recommendations can create a situation where suggestions become too focused on past preferences, limiting exposure to new and unexpected experiences.

The algorithm optimises for predicted satisfaction based on historical data. If you've previously enjoyed beach holidays, it will recommend more beach holidays. If you favour Italian cuisine, it will surface Italian restaurants. This creates a self-reinforcing cycle where your preferences become narrower and more defined with each interaction.

But travel has traditionally been valuable precisely because it disrupts our patterns. The wrong turn that leads to a hidden plaza. The restaurant recommended by a stranger that becomes a highlight of your trip. The museum you only visited because it was raining and you needed shelter. These moments of serendipity cannot be algorithmically predicted because they emerge from chance, context, and openness to the unplanned.

Research on algorithmic serendipity explores whether AI-driven systems can introduce unexpected yet relevant content, breaking predictable patterns to encourage exploration and discovery. Large language models have shown potential in serendipity prediction due to their extensive world knowledge and reasoning capabilities.

A framework called SERAL was developed to address this challenge, and online experiments demonstrate improvements in exposure, clicks, and transactions of serendipitous items. It has been fully deployed in the “Guess What You Like” section of the Taobao App homepage. Context-aware algorithms factor in location, preferences, and even social dynamics to craft itineraries that are both personalised and serendipitous.

Yet there's something paradoxical about algorithmic serendipity. True serendipity isn't engineered or predicted; it's the absence of prediction. When an algorithm determines that you would enjoy something unexpected and then serves you that unexpected thing, it's no longer unexpected. It's been calculated, predicted, and delivered. The serendipity has been optimised out in the very act of trying to optimise it in.

Companies need to find a balance between targeted optimisation and explorative openness to the unexpected. Algorithms that only deliver personalised content can prevent new ideas from emerging, and companies must ensure that AI also offers alternative perspectives.

The filter bubble effect has broader cultural implications. If millions of travellers are all being guided by algorithms trained on similar data sets, we may see a homogenisation of travel experiences. The same “hidden gems” recommended to everyone. The same Instagram-worthy locations appearing in everyone's feeds. The same optimised itineraries walking the same optimised routes.

Consider what happens when an algorithm identifies an underappreciated restaurant or viewpoint and begins recommending it widely. Within months, it's overwhelmed with visitors, loses the character that made it special, and ultimately becomes exactly the sort of tourist trap the algorithm was meant to help users avoid. Algorithmic discovery at scale creates its own destruction.

This represents not just an individual loss but a collective one: the gradual narrowing of what's experienced, what's valued, and ultimately what's preserved and maintained in tourist destinations. If certain sites and experiences are never surfaced by algorithms, they may cease to be economically viable, leading to a feedback loop where algorithmic recommendation shapes not just what we see but what survives to be seen.

Local businesses that don't optimise for algorithmic visibility, that don't accumulate reviews on the platforms that feed AI recommendations, simply vanish from the digital map. They may continue to serve local communities, but to the algorithmically-guided traveller, they effectively don't exist. This creates evolutionary pressure for businesses to optimise for algorithm-friendliness rather than quality, authenticity, or innovation.

Towards a More Balanced Future

The trajectory of AI personalisation in travel is not predetermined. Technical, regulatory, and cultural interventions could shape a future that preserves the benefits whilst mitigating the harms.

Privacy-enhancing technologies (PETs) offer one promising avenue. PETs include technologies like differential privacy, homomorphic encryption, federated learning, and zero-knowledge proofs, designed to protect personal data whilst enabling valuable data use. Federated learning, in particular, allows parties to share insights from analysis on individual data sets without sharing data itself. This decentralised approach to machine learning trains AI models with data accessed on the user's device, potentially offering personalisation without centralised surveillance.
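A minimal sketch of the federated averaging idea, assuming a toy linear model: each simulated device computes an update on data that never leaves it, and the server aggregates only the resulting weights. Real deployments add secure aggregation and differential privacy on top of this skeleton.

```python
import numpy as np

def local_update(weights, local_data, lr=0.1):
    # Each device computes a gradient on its own data; raw data never leaves.
    X, y = local_data
    grad = X.T @ (X @ weights - y) / len(y)   # least-squares gradient
    return weights - lr * grad

def federated_round(global_weights, devices):
    # The server averages only the updated weights (FedAvg), never the data.
    updates = [local_update(global_weights.copy(), d) for d in devices]
    return np.mean(updates, axis=0)

rng = np.random.default_rng(0)
devices = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(5)]
w = np.zeros(3)
for _ in range(50):
    w = federated_round(w, devices)
print(w)   # a shared model trained without centralising anyone's records
```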

Whilst adoption in the travel industry remains limited, PETs have been successfully implemented in healthcare, finance, insurance, telecommunications, and law enforcement. Technologies like encryption and federated learning ensure that sensitive information remains protected even during international exchanges.

The promise of federated learning for travel is significant. Your travel preferences, booking patterns, and behavioural data could remain on your device, encrypted and under your control. AI models could be trained on aggregate patterns without any individual's data ever being centralised or exposed. Personalisation would emerge from local processing rather than surveillance. The technology exists. What's lacking is commercial incentive to implement it and regulatory pressure to require it.

Data minimisation represents another practical approach: collecting only the minimum amount of data necessary from users. When tour operators limit the data collected from customers, they reduce risk and potential exposure points. Beyond securing data, businesses must be transparent with customers about its use.
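In code, data minimisation can be as blunt as a whitelist applied at the point of ingestion. A sketch, assuming a booking flow where only four fields are genuinely necessary; the field names are illustrative:

```python
# Keep only fields needed for the stated purpose (fulfilling a booking)
# and drop everything else at the door.
REQUIRED_FOR_BOOKING = {"name", "email", "travel_dates", "destination"}

def minimise(raw_record: dict) -> dict:
    return {k: v for k, v in raw_record.items() if k in REQUIRED_FOR_BOOKING}

raw = {"name": "A. Traveller", "email": "a@example.com",
       "travel_dates": "2025-06-01/2025-06-08", "destination": "Lisbon",
       "device_id": "f3c9...", "gps_trail": ["..."], "browsing_history": ["..."]}
print(minimise(raw))  # device_id, gps_trail and browsing_history are never stored
```

What cannot be stored cannot be breached, subpoenaed, or repurposed.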

Some companies are beginning to recognise the value proposition of privacy. According to the Apex report, whilst 66% of consumers worry about data misuse, 62% might share more personal data if there's a discernible advantage. This suggests an opportunity for travel companies to differentiate themselves through stronger privacy protections, offering travellers the choice between convenience with surveillance or slightly less personalisation with greater privacy.

Regulatory pressure is intensifying. The EU AI Act's risk-based framework requires companies to conduct risk assessments and conformity assessments before using high-risk systems and to ensure there is a “human in the loop”. This mandates that consequential decisions cannot be fully automated but must involve human oversight and the possibility of human intervention.

The European Data Protection Board has issued guidance on facial recognition at airports, finding that the only storage solutions compatible with privacy requirements are those where biometric data is stored in the hands of the individual or in a central database with the encryption key solely in their possession. This points towards user-controlled data architectures that return agency to travellers.
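A minimal sketch of that architecture using the Python cryptography library's Fernet primitive: the key is generated and held by the traveller, so the operator stores only ciphertext it cannot unilaterally read. The payload and scenario are placeholders, not a description of any deployed airport system.

```python
from cryptography.fernet import Fernet

# The key is generated on, and never leaves, the traveller's device;
# the operator stores only ciphertext it cannot decrypt on its own.
user_key = Fernet.generate_key()                  # held by the individual
cipher = Fernet(user_key)

biometric_template = b"face-embedding-bytes..."   # placeholder payload
stored_by_operator = cipher.encrypt(biometric_template)

# At the gate, the traveller presents the key so matching can proceed.
assert cipher.decrypt(stored_by_operator) == biometric_template
```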

Some advocates argue for a right to “analogue alternatives”, ensuring that those who opt out of AI-driven systems aren't excluded from services or charged premium prices for privacy. Just as passengers can opt out of facial recognition at airport security and instead go through standard identity verification, travellers should be able to access non-personalised booking experiences without penalty.

Addressing the filter bubble requires both technical and interface design interventions. Recommendation systems could include “exploration modes” that deliberately surface options outside a user's typical preferences. They could make filter bubble effects visible, showing users how their browsing history influences recommendations and offering easy ways to reset or diversify their algorithmic profile.
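One simple way to implement such an exploration mode is an epsilon-greedy rule borrowed from bandit algorithms: most of the time serve the best predicted match, and occasionally serve something deliberately outside the learned profile. A toy sketch, with invented catalogue entries:

```python
import random

def recommend_with_exploration(ranked_by_preference, all_options, epsilon=0.2):
    """With probability epsilon, deliberately step outside the user's
    learned preference profile (toy 'exploration mode')."""
    if random.random() < epsilon:
        outside = [o for o in all_options if o not in ranked_by_preference[:10]]
        return random.choice(outside) if outside else ranked_by_preference[0]
    return ranked_by_preference[0]   # exploit: best predicted match

favourites = ["beach:algarve", "beach:crete", "beach:malaga"]
catalogue = favourites + ["museum:vienna", "hike:dolomites", "food:lyon"]
print(recommend_with_exploration(favourites, catalogue))
```

The epsilon parameter makes the trade-off explicit and tunable: it is a dial the user, not just the platform, could be given.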

More fundamentally, travel platforms could reconsider optimisation metrics. Rather than purely optimising for predicted satisfaction or booking conversion, systems could incorporate diversity, novelty, and serendipity as explicit goals. This requires accepting that the “best” recommendation isn't always the one most likely to match past preferences.

Platforms could implement “algorithmic sabbaticals”, periodically resetting recommendation profiles to inject fresh perspectives. They could create “surprise me” features that deliberately ignore your history and suggest something completely different. They could show users the roads not taken, making visible the destinations and experiences filtered out by personalisation algorithms.

Cultural shifts matter as well. Travellers can resist algorithmic curation by deliberately seeking out resources that don't rely on personalisation: physical guidebooks, local advice, random exploration. They can regularly audit and reset their digital profiles, use privacy-focused browsers and VPNs, and opt out of location tracking when it's not essential.

Travel industry professionals can advocate for ethical AI practices within their organisations, pushing back against dark patterns and manipulative design. They can educate travellers about data practices and offer genuine choices about privacy. They can prioritise long-term trust over short-term optimisation.

Adoption figures remain uneven: one survey found that more than 50% of travel agencies used generative AI in 2024 to help customers with the booking process, whilst other estimates suggest that fewer than 15% of travel agencies and tour operators have integrated AI tools across their wider operations. Either way, there is significant room for growth and evolution in how these technologies are deployed, and this adoption phase represents an opportunity to shape norms and practices before they become entrenched.

The Choice Before Us

We stand at an inflection point in travel technology. The AI personalisation systems being built today will shape travel experiences for decades to come. The data architecture, privacy practices, and algorithmic approaches being implemented now will be difficult to undo once they become infrastructure.

The fundamental tension is between optimisation and openness, between the algorithm that knows exactly what you want and the possibility that you don't yet know what you want yourself. Between the curated experience that maximises predicted satisfaction and the unstructured exploration that creates space for transformation.

This isn't a Luddite rejection of technology. AI personalisation offers genuine benefits: reduced decision fatigue, discovery of options matching niche preferences, accessibility improvements for travellers with disabilities or language barriers, and efficiency gains that make travel more affordable and accessible.

For travellers with mobility limitations, AI systems that automatically filter for wheelchair-accessible hotels and attractions provide genuine liberation. For those with dietary restrictions or allergies, personalisation that surfaces safe dining options offers peace of mind. For language learners, systems that match proficiency levels to destination difficulty facilitate growth. These are not trivial conveniences but meaningful enhancements to the travel experience.

But these benefits need not come at the cost of privacy, autonomy, and serendipity. Technical alternatives exist. Regulatory frameworks are emerging. Consumer awareness is growing.

What's required is intentionality: a collective decision about what kind of travel future we want to build. Do we want a world where every journey is optimised, predicted, and curated, where the algorithm decides what experiences are worth having? Or do we want to preserve space for privacy, for genuine choice, for unexpected discovery?

The sixty-six percent of travellers who reported being more interested in travel now than before the pandemic, according to McKinsey's 2024 survey, represent an enormous economic force. If these travellers demand better privacy protections, genuine transparency, and algorithmic systems designed for exploration rather than exploitation, the industry will respond.

Consumer power remains underutilised in this equation. Individual travellers often feel powerless against platform policies and opaque algorithms, but collectively they represent the revenue stream that sustains the entire industry. Coordinated demand for privacy-protective alternatives, willingness to pay premium prices for surveillance-free services, and vocal resistance to manipulative practices could shift commercial incentives.

Travel has always occupied a unique place in human culture. It's been seen as transformative, educational, consciousness-expanding. The grand tour, the gap year, the pilgrimage, the journey of self-discovery: these archetypes emphasise travel's potential to change us, to expose us to difference, to challenge our assumptions.

Algorithmic personalisation, taken to its logical extreme, threatens this transformative potential. If we only see what algorithms predict we'll like based on what we've liked before, we remain imprisoned in our past preferences. We encounter not difference but refinement of sameness. The algorithm becomes not a window to new experiences but a mirror reflecting our existing biases back to us with increasing precision.

The algorithm may know where you'll go next. But perhaps the more important question is: do you want it to? And if not, what are you willing to do about it?

The answer lies not in rejection but in intentional adoption. Use AI tools, but understand their limitations. Accept personalisation, but demand transparency about its mechanisms. Enjoy curated recommendations, but deliberately seek out the uncurated. Let algorithms reduce friction and surface options, but make the consequential choices yourself.

Travel technology should serve human flourishing, not corporate surveillance. It should expand possibility rather than narrow it. It should enable discovery rather than dictate it. Achieving this requires vigilance from travellers, responsibility from companies, and effective regulation from governments. The age of AI travel personalisation has arrived. The question is whether we'll shape it to human values or allow it to shape us.


Sources and References

European Data Protection Board. (2024). “Facial recognition at airports: individuals should have maximum control over biometric data.” https://www.edpb.europa.eu/

Fortune. (2024, January 25). “Travel companies are using AI to better customize trip itineraries.” Fortune Magazine.

McKinsey & Company. (2024). “The promise of travel in the age of AI.” McKinsey & Company.

McKinsey & Company. (2024). “Remapping travel with agentic AI.” McKinsey & Company.

McKinsey & Company. (2024). “The State of Travel and Hospitality 2024.” Survey of more than 5,000 travellers across China, Germany, UAE, UK, and United States.

Nature. (2024). “Inevitable challenges of autonomy: ethical concerns in personalized algorithmic decision-making.” Humanities and Social Sciences Communications.

Oliver Wyman. (2024, May). “This Is How Generative AI Is Making Travel Planning Easier.” Oliver Wyman.

Transportation Security Administration. (2024). “TSA PreCheck® Touchless ID: Evaluating Facial Identification Technology.” U.S. Department of Homeland Security.

Travel And Tour World. (2024). “Europe's AI act sets global benchmark for travel and tourism.” Travel And Tour World.

Travel And Tour World. (2024). “How Data Breaches Are Shaping the Future of Travel Security.” Travel And Tour World.

U.S. Government Accountability Office. (2022). “Facial Recognition Technology: CBP Traveler Identity Verification and Efforts to Address Privacy Issues.” Report GAO-22-106154.

Verizon. (2025). “2025 Data Breach Investigations Report.” Verizon Business.


Tim Green

UK-based Systems Theorist & Independent Technology Writer

Tim explores the intersections of artificial intelligence, decentralised cognition, and posthuman ethics. His work, published at smarterarticles.co.uk, challenges dominant narratives of technological progress while proposing interdisciplinary frameworks for collective intelligence and digital stewardship.

His writing has been featured on Ground News and shared by independent researchers across both academic and technological communities.

ORCID: 0009-0002-0156-9795 Email: tim@smarterarticles.co.uk

#HumanInTheLoop #AITravelPersonalisation #DataPrivacy #AlgorithmicManipulation

When you delete a conversation with ChatGPT, you might reasonably assume that it disappears. Click the rubbish bin icon, confirm your choice, and within 30 days, according to OpenAI's policy, those messages vanish from the company's servers. Except that in 2025, a court order threw this assumption into chaos. OpenAI was forced to retain all ChatGPT logs, including those users believed were permanently deleted. The revelation highlighted an uncomfortable truth: even when we think our data is gone, it might persist in ways we barely understand.

This isn't merely about corporate data retention policies or legal manoeuvres. It's about something more fundamental to how large language models work. These systems don't just process information; they absorb it, encoding fragments of training data into billions of neural network parameters. And once absorbed, that information becomes extraordinarily difficult to extract, even when regulations like the General Data Protection Regulation (GDPR) demand it.

The European Data Protection Board wrestled with this problem throughout 2024, culminating in Opinion 28/2024, a comprehensive attempt to reconcile AI development with data protection law. The board acknowledged what technologists already knew: LLMs present a privacy paradox. They promise personalised, intelligent assistance whilst simultaneously undermining two foundational privacy principles: informed consent and data minimisation.

The Architecture of Remembering

To understand why LLMs create such thorny ethical problems, you need to grasp how they retain information. Unlike traditional databases that store discrete records in retrievable formats, language models encode knowledge as numerical weights distributed across their neural architecture. During training, these models ingest vast datasets scraped from the internet, books, academic papers, and increasingly, user interactions. The learning process adjusts billions of parameters to predict the next word in a sequence, and in doing so, the model inevitably memorises portions of its training data.

In 2021, a team of researchers led by Nicholas Carlini at Google demonstrated just how significant this memorisation could be. Their paper “Extracting Training Data from Large Language Models,” presented at the USENIX Security Symposium, showed that adversaries could recover individual training examples from GPT-2 by carefully querying the model. The researchers extracted hundreds of verbatim text sequences, including personally identifiable information: names, phone numbers, email addresses, IRC conversations, code snippets, and even 128-bit UUIDs. Critically, they found that larger models were more vulnerable than smaller ones, suggesting that as LLMs scale, so does their capacity to remember.
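The published attack generated large numbers of samples from the model and ranked them by how “familiar” the model found them relative to a generic compressibility baseline. The sketch below shows only that ranking step, with the language model replaced by a placeholder function; the actual attack queried GPT-2's perplexity, which this stub does not reproduce.

```python
import zlib

def model_log_perplexity(text: str) -> float:
    # Placeholder standing in for a real LM's perplexity (the actual
    # attack queried GPT-2); lower means "more familiar" to the model.
    return 5.0 - 0.01 * len(text)   # NOT a real model

def zlib_entropy(text: str) -> float:
    # Compressed length as a crude proxy for generic predictability.
    return len(zlib.compress(text.encode()))

def rank_candidates(samples):
    # Carlini et al. flag generations whose model perplexity is low
    # relative to their zlib compressibility: likely memorised content.
    scored = [(model_log_perplexity(s) / zlib_entropy(s), s) for s in samples]
    return sorted(scored)   # lowest ratio first = most suspicious

samples = ["the quick brown fox " * 5, "John Smith, 07700 900123, 12 High St"]
for score, s in rank_candidates(samples):
    print(round(score, 5), s[:40])
```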

This isn't a bug; it's an intrinsic feature of how neural networks learn. The European Data Protection Board's April 2025 report on AI Privacy Risks and Mitigations for Large Language Models explained that during training, LLMs analyse vast datasets, and if fine-tuned with company-specific or user-generated data, there's a risk of that information being memorised and resurfacing unpredictably. The process creates what researchers call “eidetic memorisation,” where models reproduce training examples with near-perfect fidelity.

But memorisation represents only one dimension of the privacy risk. Recent research has demonstrated that LLMs can also infer sensitive attributes from text without explicitly memorising anything. A study first posted in October 2023 as arXiv preprint 2310.07298, “Beyond Memorization: Violating Privacy Via Inference with Large Language Models,” presented the first comprehensive analysis of pretrained LLMs' capabilities to infer personal attributes from text. The researchers discovered that these models could deduce location, income, and sex with up to 85% top-one accuracy and 95% top-three accuracy. The model doesn't need to have seen your specific data; it leverages statistical patterns learned from millions of training examples to make educated guesses about individuals.

This inferential capability creates a paradox. Even if we could perfectly prevent memorisation, LLMs would still pose privacy risks through their ability to reconstruct probable personal information from contextual clues. It's akin to the difference between remembering your exact address versus deducing your neighbourhood from your accent, the shops you mention, and the weather you describe.

Informed consent rests on a simple premise: individuals should understand what data is being collected, how it will be used, and what risks it entails before agreeing to participate. In data protection law, GDPR Article 6 permits the processing of personal data only on one of a small set of lawful bases, the most familiar of which is the data subject's active and informed (opt-in) consent.

But how do you obtain informed consent for a system whose data practices are fundamentally opaque? When you interact with ChatGPT, Claude, or any other conversational AI, can you genuinely understand where your words might end up? The answer, according to legal scholars and technologists alike, is: probably not.

The Italian Data Protection Authority became one of the first regulators to scrutinise this issue seriously. Throughout 2024, Italian authorities increasingly examined the extent of user consent when publicly available data is re-purposed for commercial LLMs. The challenge stems from a disconnect between traditional consent frameworks and the reality of modern AI development. When a company scrapes the internet to build a training dataset, it typically doesn't secure individual consent from every person whose words appear in forum posts, blog comments, or social media updates. Instead, developers often invoke “legitimate interest” as a legal basis under GDPR Article 6(1)(f).

The European Data Protection Board's Opinion 28/2024 highlighted divergent national stances on whether broad web scraping for AI training constitutes a legitimate interest. The board urged a case-by-case assessment, but the guidance offered little comfort to individuals concerned about their data. The fundamental problem is that once information enters an LLM's training pipeline, the individual loses meaningful control over it.

Consider the practical mechanics. Even if a company maintains records of its training data sources, which many proprietary systems don't disclose, tracing specific information back to identifiable individuals proves nearly impossible. As a 2024 paper published in the Tsinghua China Law Review noted, in LLMs it is hard to know what personal data is used in training and how to attribute these data to particular individuals. Data subjects can only learn about their personal data by either inspecting the original training datasets, which companies rarely make available, or by prompting the models. But prompting cannot guarantee the outputs contain the full list of information stored in the model weights.

This opacity undermines the core principle of informed consent. How can you consent to something you cannot inspect or verify? The European Data Protection Board acknowledged this problem in Opinion 28/2024, noting that processing personal data to avoid risks of potential biases and errors can be included when this is clearly and specifically identified within the purpose, and the personal data is necessary for that purpose. But the board also emphasised that this necessity must be demonstrable: the processing must genuinely serve the stated purpose and no less intrusive alternative should exist.

Anthropic's approach to consent illustrates the industry's evolving strategy. In 2024, the company announced it would extend data retention to five years for users who allow their data to be used for model training. Users who opt out maintain the standard 30-day retention period. This creates a two-tier system: those who contribute to AI improvement in exchange for extended data storage, and those who prioritise privacy at the cost of potentially less personalised experiences.

OpenAI took a different approach with its Memory feature, rolled out broadly in 2024. The system allows ChatGPT to remember details across conversations, creating a persistent context that improves over time. OpenAI acknowledged that memory brings additional privacy and safety considerations, implementing steering mechanisms to prevent ChatGPT from proactively remembering sensitive information like health details unless explicitly requested. Users can view, delete, or entirely disable the Memory feature, but a 2024 European audit found that 63% of ChatGPT user data contained personally identifiable information, with only 22% of users aware of the settings to disable data retention features.

The consent problem deepens when you consider the temporal dimension. LLMs are trained on datasets compiled at specific points in time, often years before the model's public release. Information you posted online in 2018 might appear in a model trained in 2022 and deployed in 2024. Did you consent to that use when you clicked “publish” on your blog six years ago? Legal frameworks struggle to address this temporal gap.

Data Minimisation in an Age of Maximalism

If informed consent presents challenges for LLMs, data minimisation appears nearly incompatible with their fundamental architecture. GDPR Article 5(1)(c) requires that personal data be “adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed.” Recital 39 clarifies that “personal data should be processed only if the purpose of the processing could not reasonably be fulfilled by other means.”

The UK Information Commissioner's Office guidance on AI and data protection emphasises that organisations must identify the minimum amount of personal data needed to fulfil a purpose and process only that information, no more. Yet the very nature of machine learning relies on ingesting massive amounts of data to train and test algorithms. The European Data Protection Board noted in Opinion 28/2024 that the assessment of necessity entails two elements: whether the processing activity will allow the pursuit of the purpose, and whether there is no less intrusive way of pursuing this purpose.

This creates a fundamental tension. LLM developers argue, with some justification, that model quality correlates strongly with training data volume and diversity. Google's research on differential privacy for language models noted that when you increase the number of training tokens, the LLM's memorisation capacity increases, but so does its general capability. The largest, most capable models like GPT-4, Claude, and Gemini owe their impressive performance partly to training on datasets comprising hundreds of billions or even trillions of tokens.

From a data minimisation perspective, this approach appears maximalist. Do you really need every Reddit comment from the past decade to build an effective language model? Could synthetic data, carefully curated datasets, or anonymised information serve the same purpose? The answer depends heavily on your definition of “necessary” and your tolerance for reduced performance.

Research presented at the 2025 ACM Conference on Fairness, Accountability, and Transparency tackled this question directly. The paper “The Data Minimization Principle in Machine Learning” (arXiv:2405.19471) introduced an optimisation framework for data minimisation based on legal definitions. The researchers demonstrated that techniques such as pseudonymisation and feature selection by importance could help limit the type and volume of processed personal data. The key insight was to document which data points actually contribute to model performance and discard the rest.

But this assumes you can identify relevant versus irrelevant data before training, which LLMs' unsupervised learning approach makes nearly impossible. You don't know which fragments of text will prove crucial until after the model has learned from them. It's like asking an architect to design a building using the minimum necessary materials before understanding the structure's requirements.

Cross-session data retention exacerbates the minimisation challenge. Modern conversational AI systems increasingly maintain context across interactions. If previous conversation states, memory buffers, or hidden user context aren't carefully managed or sanitised, sensitive information can reappear in later responses, bypassing initial privacy safeguards. This architectural choice, whilst improving user experience, directly contradicts data minimisation's core principle: collect and retain only what's immediately necessary.

Furthermore, recent research on privacy attacks against LLMs suggests that even anonymised training data might be vulnerable. A 2024 paper on membership inference attacks against fine-tuned large language models demonstrated that the SPV-MIA method raises the AUC of membership inference attacks from 0.7 to 0.9. These attacks determine whether a specific data point was part of the training dataset by querying the model and analysing confidence scores. If an attacker can infer dataset membership, they can potentially reverse-engineer personal information even from supposedly anonymised training data.
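SPV-MIA itself relies on self-calibrated probabilistic variation, but the intuition is easiest to see in the simpler loss-threshold baseline that such attacks refine: records on which the model is suspiciously confident are guessed to be training members. A toy sketch with a stubbed-out model; the records and threshold are invented:

```python
def loss_based_mia(model_loss, candidate_texts, threshold):
    """Toy membership-inference check: records on which the model's loss
    is unusually low are guessed to have been in the training set."""
    return {t: model_loss(t) < threshold for t in candidate_texts}

# Stub loss standing in for querying a real fine-tuned LLM.
train_set = {"alice's medical note", "bob's support ticket"}
def model_loss(text):
    return 0.4 if text in train_set else 2.1   # memorised records score lower

guesses = loss_based_mia(model_loss,
                         ["alice's medical note", "random sentence"], 1.0)
print(guesses)   # alice's note flagged as a member; the random sentence is not
```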

The Right to Erasure Meets Immutable Models

Perhaps no single GDPR provision highlights the LLM consent and minimisation challenge more starkly than Article 17, the “right to erasure” or “right to be forgotten.” The regulation grants individuals the right to obtain erasure of personal data concerning them without undue delay, which legal commentators generally interpret as approximately one month.

For traditional databases, compliance is straightforward: locate the relevant records and delete them. Search engines developed mature technical solutions for removing links to content. But LLMs present an entirely different challenge. A comprehensive survey published in 2024 as arXiv preprint 2307.03941, “Right to be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions,” catalogued the obstacles.

The core technical problem stems from model architecture. Once trained, model parameters encapsulate information learned during training, making it difficult to remove specific data points without retraining the entire model. Engineers acknowledge that the only way to completely remove an individual's data is to retrain the model from scratch, an impractical solution. Training a large language model may take months and consume millions of pounds worth of computational resources, far exceeding the “undue delay” permitted by GDPR.

Alternative approaches exist but carry significant limitations. Machine unlearning techniques attempt to make models “forget” specific data points without full retraining. The most prominent framework, SISA (Sharded, Isolated, Sliced, and Aggregated) training, was introduced by Bourtoule and colleagues in 2019. SISA partitions training data into isolated shards and trains an ensemble of constituent models, saving intermediate checkpoints after processing each data slice. When unlearning a data point, only the affected constituent model needs reverting to a prior state and partial retraining on a small fraction of data.

This mechanism provides exact unlearning guarantees whilst offering significant efficiency gains over full retraining. Research showed that sharding alone speeds up the retraining process by 3.13 times on the Purchase dataset and 1.66 times on the Street View House Numbers dataset, with additional acceleration through slicing.
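A schematic of the sharding half of SISA (the slicing and checkpointing machinery is omitted), with model training reduced to a stub. The point is structural: erasing one record triggers retraining of one shard's constituent model, not the whole ensemble.

```python
from collections import defaultdict

def train(records):
    # Stand-in for real model training on a shard's records.
    return {"trained_on": frozenset(records)}

def build_sisa(records, n_shards=4):
    shards = defaultdict(list)
    for i, r in enumerate(records):
        shards[i % n_shards].append(r)
    models = {s: train(rs) for s, rs in shards.items()}
    return shards, models

def unlearn(record, shards, models):
    for s, rs in shards.items():
        if record in rs:
            rs.remove(record)
            models[s] = train(rs)   # retrain ONE shard, not the ensemble
            return

shards, models = build_sisa([f"user-{i}" for i in range(12)])
unlearn("user-5", shards, models)
assert all("user-5" not in m["trained_on"] for m in models.values())
```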

But SISA and similar approaches require forethought. You must design your training pipeline with unlearning in mind from the beginning, which most existing LLMs did not do. Retrofitting SISA to already-trained models proves impossible. Alternative techniques like model editing, guardrails, and unlearning layers show promise in research settings but remain largely unproven at the scale of commercial LLMs.

The challenge extends beyond technical feasibility. Even if efficient unlearning were possible, identifying what to unlearn poses its own problem. Training datasets are sometimes not disclosed, especially proprietary ones, and prompting trained models cannot guarantee the outputs contain the full list of information stored in the model weights.

Then there's the hallucination problem. LLMs frequently generate plausible-sounding information that doesn't exist in their training data, synthesised from statistical patterns. Removing hallucinated information becomes paradoxically challenging since it was never in the training dataset to begin with. How do you forget something the model invented?

The legal-technical gap continues to widen. Although the European Data Protection Board ruled that AI developers can be considered data controllers under GDPR, the regulation lacks clear guidelines for enforcing erasure within AI systems. Companies can argue, with some technical justification, that constraints make compliance impossible. This creates a regulatory stalemate: the law demands erasure, but the technology cannot deliver it without fundamental architectural changes.

Differential Privacy

Faced with these challenges, researchers and companies have increasingly turned to differential privacy as a potential remedy. The technique, formalised in 2006, allows systems to train machine learning models whilst rigorously guaranteeing that the learned model respects the privacy of its training data by injecting carefully calibrated noise into the training process.

The core insight of differential privacy is that by adding controlled randomness, you can ensure that an observer cannot determine whether any specific individual's data was included in the training set. The privacy guarantee is mathematical and formal: the probability of any particular output changes only minimally whether or not a given person's data is present.
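Formally, a randomised mechanism M satisfies (ε, δ)-differential privacy when, for all neighbouring datasets D and D′ (differing in one person's data) and all output sets S:

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta
```

Smaller ε means the output distributions with and without any one individual are harder to tell apart, and hence stronger privacy.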

For language models, the standard approach employs DP-SGD (Differentially Private Stochastic Gradient Descent). During training, the algorithm clips gradients to bound each example's influence and adds Gaussian noise to the aggregated gradients before updating model parameters. Google Research demonstrated this approach with VaultGemma, which the company described as the world's most capable differentially private LLM. VaultGemma 1B shows no detectable memorisation of its training data, successfully demonstrating DP training's efficacy.
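A schematic of the two operations that define a DP-SGD step, clipping each example's gradient to bound its influence and noising the aggregate. The gradients, clipping norm, and noise multiplier here are toy values, not a tuned training recipe:

```python
import numpy as np

def dp_sgd_step(weights, per_example_grads, clip_norm=1.0,
                noise_multiplier=1.1, lr=0.05):
    """One schematic DP-SGD update: clip each example's gradient, then
    add Gaussian noise to the aggregate before applying it."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    total = np.sum(clipped, axis=0)
    noise = np.random.normal(0.0, noise_multiplier * clip_norm,
                             size=total.shape)
    noisy_mean = (total + noise) / len(per_example_grads)
    return weights - lr * noisy_mean

w = np.zeros(3)
grads = [np.array([0.5, -2.0, 1.0]), np.array([3.0, 0.1, -0.4])]
w = dp_sgd_step(w, grads)
print(w)
```

Because the noise scale is tied to the clipping norm rather than the model, every parameter update carries the cost of privacy, which is why performance degrades as models grow.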

But differential privacy introduces a fundamental trade-off between privacy and utility. The noise required to guarantee privacy degrades model performance. Google researchers found that when you apply standard differential privacy optimisation techniques like DP-SGD to train large language models, the performance ends up much worse than non-private models because the noise added for privacy protection tends to scale with the model size.

Recent advances have mitigated this trade-off somewhat. Research published in 2024 (arXiv:2407.07737) on “Fine-Tuning Large Language Models with User-Level Differential Privacy” introduced improved techniques. User-level DP, a stronger form of privacy, guarantees that an attacker using a model cannot learn whether the user's data is included in the training dataset. The researchers found that their ULS approach performs significantly better in settings where either strong privacy guarantees are required or the compute budget is large.

Google also developed methods for generating differentially private synthetic data, creating entirely artificial data that has the key characteristics of the original whilst offering strong privacy protection. This approach shows promise for scenarios where organisations need to share datasets for research or development without exposing individual privacy.

Yet differential privacy, despite its mathematical elegance, doesn't solve the consent and minimisation problems. It addresses the symptom (privacy leakage) rather than the cause (excessive data collection and retention). A differentially private LLM still trains on massive datasets, still potentially incorporates data without explicit consent, and still resists targeted erasure. The privacy guarantee applies to the aggregate statistical properties, not to individual autonomy and control.

Moreover, differential privacy makes implicit assumptions about data structure that do not hold for the majority of natural language data. A 2022 ACM paper, “What Does it Mean for a Language Model to Preserve Privacy?” highlighted this limitation. Text contains rich, interconnected personal information that doesn't fit neatly into the independent data points that differential privacy theory assumes.

Regulatory Responses and Industry Adaptation

Regulators worldwide have begun grappling with these challenges, though approaches vary significantly. The European Union's AI Act, which entered into force in August 2024 with phased implementation, represents the most comprehensive legislative attempt to govern AI systems, including language models.

Under the AI Act, transparency is defined as AI systems being developed and used in a way that allows appropriate traceability and explainability, whilst making humans aware that they communicate or interact with an AI system. For general-purpose AI models, which include large language models, specific transparency and copyright-related rules became effective in August 2025.

Providers of general-purpose AI models must draw up and keep up-to-date technical documentation containing a description of the model development process, including details around training and testing. The European Commission published a template to help providers summarise the data used to train their models. Additionally, companies must inform users when they are interacting with an AI system, unless it's obvious, and AI systems that create synthetic content must mark their outputs as artificially generated.

But transparency, whilst valuable, doesn't directly address consent and minimisation. Knowing that an AI system was trained on your data doesn't give you the power to prevent that training or to demand erasure after the fact. A 2024 paper presented at the Pan-Hellenic Conference on Computing and Informatics acknowledged that transparency raises immense challenges for LLM developers due to the intrinsic black-box nature of these models.

The GDPR and AI Act create overlapping but not identical regulatory frameworks. Organisations developing LLMs in the EU must comply with both, navigating data protection principles alongside AI-specific transparency and risk management requirements. The European Data Protection Board's Opinion 28/2024 attempted to clarify how these requirements apply to AI models, but many questions remain unresolved.

Industry responses have varied. OpenAI's enterprise privacy programme offers Zero Data Retention (ZDR) options for API users with qualifying use cases. Under ZDR, inputs and outputs are removed from OpenAI's systems immediately after processing, providing a clearer minimisation pathway for business customers. However, the court-ordered data retention affecting consumer ChatGPT users demonstrates the fragility of these commitments when legal obligations conflict.

Anthropic's tiered retention model, offering 30-day retention for users who opt out of data sharing versus five-year retention for those who opt in, represents an attempt to align business needs with user preferences. But this creates its own ethical tension: users who most value privacy receive less personalised service, whilst those willing to sacrifice privacy for functionality subsidise model improvement for everyone.

The challenge extends to enforcement. Data protection authorities lack the technical tools and expertise to verify compliance claims. How can a regulator confirm that an LLM has truly forgotten specific training examples? Auditing these systems requires capabilities that few governmental bodies possess. This enforcement gap allows a degree of regulatory theatre, where companies make compliance claims that are difficult to substantively verify.

The Broader Implications

The technical and regulatory challenges surrounding LLM consent and data minimisation reflect deeper questions about the trajectory of AI development. We're building increasingly powerful systems whose capabilities emerge from the ingestion and processing of vast information stores. This architectural approach creates fundamental tensions with privacy frameworks designed for an era of discrete, identifiable data records.

Research into privacy attacks continues to reveal new vulnerabilities. Work on model inversion attacks demonstrates that adversaries could reverse-engineer private images used during training by updating input images and observing changes in output probabilities. A comprehensive survey from November 2024 (arXiv:2411.10023), “Model Inversion Attacks: A Survey of Approaches and Countermeasures,” catalogued the evolving landscape of these threats.

Studies also show that privacy risks are not evenly distributed. Research has found that minority groups often experience higher privacy leakage, attributed to models tending to memorise more about smaller subgroups. This raises equity concerns: the populations already most vulnerable to surveillance and data exploitation face amplified risks from AI systems.

The consent and minimisation problems also intersect with broader questions of AI alignment and control. If we cannot effectively specify what data an LLM should and should not retain, how can we ensure these systems behave in accordance with human values more generally? The inability to implement precise data governance suggests deeper challenges in achieving fine-grained control over AI behaviour.

Some researchers argue that we need fundamentally different approaches to AI development. Rather than training ever-larger models on ever-more-expansive datasets, perhaps we should prioritise architectures that support granular data management, selective forgetting, and robust attribution. This might mean accepting performance trade-offs in exchange for better privacy properties, a proposition that faces resistance in a competitive landscape where capability often trumps caution.

The economic incentives cut against privacy-preserving approaches. Companies that accumulate the largest datasets and build the most capable models gain competitive advantages, creating pressure to maximise data collection rather than minimise it. User consent becomes a friction point to be streamlined rather than a meaningful check on corporate power.

Yet the costs of this maximalist approach are becoming apparent. Privacy harms from data breaches, unauthorised inference, and loss of individual autonomy accumulate. Trust in AI systems erodes as users realise the extent to which their information persists beyond their control. Regulatory backlash intensifies, threatening innovation with blunt instruments when nuanced governance mechanisms remain underdeveloped.

If the current trajectory proves unsustainable, what alternatives exist? Several technical and governance approaches show promise, though none offers a complete solution.

Enhanced transparency represents a minimal baseline. Organisations should provide clear, accessible documentation of what data they collect, how long they retain it, what models they train, and what risks users face. The European Commission's documentation templates for AI Act compliance move in this direction, but truly informed consent requires going further. Users need practical tools to inspect what information about them might be embedded in models, even if perfect visibility remains impossible.

Consent mechanisms need fundamental rethinking. The binary choice between “agree to everything” and “don't use the service” fails to respect autonomy. Granular consent frameworks, allowing users to specify which types of data processing they accept and which they reject, could provide more meaningful control. Some researchers propose “consent as a service” platforms that help individuals manage their data permissions across multiple AI systems.

On the minimisation front, organisations could adopt privacy-by-design principles more rigorously. This means architecting systems from the ground up to collect only necessary data, implementing retention limits, and ensuring genuine deletability. SISA-style approaches to training, whilst requiring upfront investment, enable more credible compliance with erasure requests. Synthetic data generation, differential privacy, and federated learning all merit broader deployment despite their current limitations.
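To illustrate why SISA-style training supports more credible erasure, here is a minimal sketch. The class and method names are invented for the example, and scikit-learn classifiers stand in for the per-shard models; real SISA also checkpoints training slices within each shard. The key property is that deleting a record retrains one shard, not the whole ensemble:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class SISAEnsemble:
    """Minimal SISA-style sketch: data is split across shards, one
    model per shard; erasing a record retrains only its shard."""
    def __init__(self, n_shards=4):
        self.shards = [[] for _ in range(n_shards)]
        self.models = [None] * n_shards

    def fit(self, X, y):
        for i, (x, label) in enumerate(zip(X, y)):
            self.shards[i % len(self.shards)].append((x, label))
        for s in range(len(self.shards)):
            self._train_shard(s)

    def _train_shard(self, s):
        xs, ys = zip(*self.shards[s])  # toy data: both classes per shard
        self.models[s] = LogisticRegression().fit(np.array(xs), np.array(ys))

    def forget(self, shard, index):
        del self.shards[shard][index]  # honour an erasure request...
        self._train_shard(shard)       # ...by retraining one shard only

    def predict(self, x):
        votes = [m.predict(x.reshape(1, -1))[0] for m in self.models]
        return max(set(votes), key=votes.count)

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0]])
y = np.array([0, 0, 1, 1, 0, 0, 1, 1])
ens = SISAEnsemble(n_shards=2)
ens.fit(X, y)
ens.forget(shard=0, index=1)           # only shard 0 is retrained
print(ens.predict(np.array([2.5])))
```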

Regulatory frameworks require refinement. The GDPR's principles remain sound, but their application to AI systems needs clearer guidance. The European Data Protection Board's ongoing work to clarify AI-specific requirements helps, but questions around legitimate interest, necessity assessments, and technical feasibility standards need more definitive answers. International coordination could prevent a race to the bottom where companies jurisdiction-shop for the most permissive regulations.

Enforcement mechanisms must evolve. Data protection authorities need enhanced technical capacity to audit AI systems, verify compliance claims, and detect violations. This might require specialised AI audit teams, standardised testing protocols, and stronger whistleblower protections. Meaningful penalties for non-compliance, consistently applied, would shift incentive structures.

Fundamentally, though, addressing the LLM consent and minimisation challenge requires confronting uncomfortable questions about AI development priorities. Do we truly need models trained on the entirety of human written expression? Can we achieve valuable AI capabilities through more targeted, consensual data practices? What performance trade-offs should we accept in exchange for stronger privacy protections?

These questions have no purely technical answers. They involve value judgements about individual rights, collective benefits, commercial interests, and the kind of society we want to build. The fact that large language models retain inaccessible traces of prior user interactions does undermine informed consent and the ethical principle of data minimisation as currently understood. But whether this represents an acceptable cost, a surmountable challenge, or a fundamental flaw depends on what we prioritise.

The Path Forward

Standing at this crossroads, the AI community faces a choice. One path continues the current trajectory: building ever-larger models on ever-more-comprehensive datasets, managing privacy through patchwork technical measures and reactive compliance, accepting that consent and minimisation are aspirational rather than achievable. This path delivers capability but erodes trust.

The alternative path requires fundamental rethinking. It means prioritising privacy-preserving architectures even when they limit performance. It means developing AI systems that genuinely forget when asked. It means treating consent as a meaningful constraint rather than a legal formality. It means accepting that some data, even if technically accessible, should remain off-limits.

The choice isn't between privacy and progress. It's between different visions of progress: one that measures success purely in model capability and commercial value, versus one that balances capability with accountability, control, and respect for individual autonomy.

Large language models have demonstrated remarkable potential to augment human intelligence, creativity, and productivity. But their current architecture fundamentally conflicts with privacy principles that society has deemed important enough to enshrine in law. Resolving this conflict will require technical innovation, regulatory clarity, and above all, honest acknowledgement of the trade-offs we face.

The inaccessible traces that LLMs retain aren't merely a technical quirk to be optimised away. They're a consequence of foundational design decisions that prioritise certain values over others. Informed consent and data minimisation might seem antiquated in an age of billion-parameter models, but they encode important insights about power, autonomy, and the conditions necessary for trust.

Whether we can build genuinely consent-respecting, privacy-minimising AI systems that still deliver transformative capabilities remains an open question. But the answer will determine not just the future of language models, but the future of our relationship with artificial intelligence more broadly. The machines remember everything. The question is whether we'll remember why that matters.


Sources and References

Academic Papers and Research

  1. Carlini, N., Tramèr, F., Wallace, E., Jagielski, M., Herbert-Voss, A., Lee, K., Roberts, A., Brown, T., Song, D., Erlingsson, Ú., Oprea, A., and Raffel, C. (2021). “Extracting Training Data from Large Language Models.” 30th USENIX Security Symposium. https://www.usenix.org/conference/usenixsecurity21/presentation/carlini-extracting

  2. Bourtoule, L., et al. (2021). “Machine Unlearning.” Proceedings of the 2021 IEEE Symposium on Security and Privacy. (Referenced for the SISA framework)

  3. “Beyond Memorization: Violating Privacy Via Inference with Large Language Models” (2024). arXiv:2310.07298.

  4. “The Data Minimization Principle in Machine Learning” (2025). arXiv:2405.19471. Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency.

  5. “Right to be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions” (2024). arXiv:2307.03941.

  6. “Fine-Tuning Large Language Models with User-Level Differential Privacy” (2024). arXiv:2407.07737.

  7. “Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration” (2024). arXiv:2311.06062.

  8. “Model Inversion Attacks: A Survey of Approaches and Countermeasures” (2024). arXiv:2411.10023.

  9. “On protecting the data privacy of Large Language Models (LLMs) and LLM agents: A literature review” (2025). ScienceDirect.

  10. “What Does it Mean for a Language Model to Preserve Privacy?” (2022). ACM FAccT Conference.

  11. “Enhancing Transparency in Large Language Models to Meet EU AI Act Requirements” (2024). Proceedings of the 28th Pan-Hellenic Conference on Progress in Computing and Informatics.

Regulatory Documents and Official Guidance

  1. European Data Protection Board. “Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models.” December 2024. https://www.edpb.europa.eu/system/files/2024-12/edpb_opinion_202428_ai-models_en.pdf

  2. European Data Protection Board. “AI Privacy Risks & Mitigations – Large Language Models (LLMs).” April 2025. https://www.edpb.europa.eu/system/files/2025-04/ai-privacy-risks-and-mitigations-in-llms.pdf

  3. Regulation (EU) 2016/679 (General Data Protection Regulation).

  4. Regulation (EU) 2024/1689 (EU AI Act).

  5. UK Information Commissioner's Office. “How should we assess security and data minimisation in AI?” https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/guidance-on-ai-and-data-protection/how-should-we-assess-security-and-data-minimisation-in-ai/

  6. Irish Data Protection Commission. “AI, Large Language Models and Data Protection.” 18 July 2024. https://www.dataprotection.ie/en/dpc-guidance/blogs/AI-LLMs-and-Data-Protection

Corporate Documentation and Official Statements

  1. OpenAI. “Memory and new controls for ChatGPT.” https://openai.com/index/memory-and-new-controls-for-chatgpt/

  2. OpenAI. “How we're responding to The New York Times' data demands in order to protect user privacy.” https://openai.com/index/response-to-nyt-data-demands/

  3. OpenAI Help Center. “Chat and File Retention Policies in ChatGPT.” https://help.openai.com/en/articles/8983778-chat-and-file-retention-policies-in-chatgpt

  4. Anthropic Privacy Center. “How long do you store my data?” https://privacy.claude.com/en/articles/10023548-how-long-do-you-store-my-data

  5. Anthropic. “Updates to Consumer Terms and Privacy Policy.” https://www.anthropic.com/news/updates-to-our-consumer-terms

  6. Google Research Blog. “VaultGemma: The world's most capable differentially private LLM.” https://research.google/blog/vaultgemma-the-worlds-most-capable-differentially-private-llm/

  7. Google Research Blog. “Fine-tuning LLMs with user-level differential privacy.” https://research.google/blog/fine-tuning-llms-with-user-level-differential-privacy/


Tim Green

UK-based Systems Theorist & Independent Technology Writer

Tim explores the intersections of artificial intelligence, decentralised cognition, and posthuman ethics. His work, published at smarterarticles.co.uk, challenges dominant narratives of technological progress while proposing interdisciplinary frameworks for collective intelligence and digital stewardship.

His writing has been featured on Ground News and shared by independent researchers across both academic and technological communities.

ORCID: 0009-0002-0156-9795 Email: tim@smarterarticles.co.uk

#HumanInTheLoop #MemoryManagement #DataPrivacy #AICompliance

In December 2024, the European Data Protection Board gathered in Brussels to wrestle with a question that sounds deceptively simple: Can artificial intelligence forget? The board's Opinion 28/2024, released on 18 December, attempted to provide guidance on when AI models could be considered “anonymous” and how personal data rights apply to these systems. Yet beneath the bureaucratic language lay an uncomfortable truth—the very architecture of modern AI makes the promise of data deletion fundamentally incompatible with how these systems actually work.

The stakes couldn't be higher. Large language models like ChatGPT, Claude, and Gemini have been trained on petabytes of human expression scraped from the internet, often without consent. Every tweet, blog post, forum comment, and academic paper became training data for systems that now shape everything from medical diagnoses to hiring decisions. As Seth Neel, Assistant Professor at Harvard Business School and head of the Trustworthy AI Lab, explains, “Machine unlearning is really about computation more than anything else. It's about efficiently removing the influence of that data from the model without having to retrain it from scratch.”

But here's the catch: unlike a traditional database where you can simply delete a row, AI models don't store information in discrete, removable chunks. They encode patterns across billions of parameters, each one influenced by millions of data points. Asking an AI to forget specific information is like asking a chef to remove the salt from a baked cake—theoretically possible if you start over, practically impossible once it's done.

The California Experiment

In September 2024, California became the first state to confront this paradox head-on. Assembly Bill 1008, signed into law by Governor Gavin Newsom on 28 September, expanded the definition of “personal information” under the California Privacy Rights Act to include what lawmakers called “abstract digital formats”—model weights, tokens, and other outputs derived from personal data. The law, which took effect on 1 January 2025, grants Californians the right to request deletion of their data even after it's been absorbed into an AI model's neural pathways.

The legislation sounds revolutionary on paper. For the first time, a major jurisdiction legally recognised that AI models contain personal information in their very structure, not just in their training datasets. But the technical reality remains stubbornly uncooperative. As Ken Ziyu Liu, a PhD student at Stanford who authored “Machine Unlearning in 2024,” notes in his influential blog post from May 2024, “Evaluating unlearning on LLMs had been more of an art than science. The key issue has been the desperate lack of datasets and benchmarks for unlearning evaluation.”

The California Privacy Protection Agency, which voted to support the bill, acknowledged these challenges but argued that technical difficulty shouldn't exempt companies from privacy obligations. Yet critics point out that requiring companies to retrain massive models after each deletion request could cost millions of pounds and consume enormous computational resources—effectively making compliance economically unfeasible for all but the largest tech giants.

The European Paradox

Across the Atlantic, European regulators have been grappling with similar contradictions. The General Data Protection Regulation's Article 17, the famous “right to be forgotten,” predates the current AI boom by several years. When it was written, erasure meant something straightforward: find the data, delete it, confirm it's gone. But AI has scrambled these assumptions entirely.

The EDPB's December 2024 opinion attempted to thread this needle by suggesting that AI models should be assessed for anonymity on a “case by case basis.” If a model makes it “very unlikely” to identify individuals or extract their personal data through queries, it might be considered anonymous and thus exempt from deletion requirements. But this raises more questions than it answers. How unlikely is “very unlikely”? Who makes that determination? And what happens when adversarial attacks can coax models into revealing training data they supposedly don't “remember”?

Reuben Binns, Associate Professor at Oxford University's Department of Computer Science and former Postdoctoral Research Fellow in AI at the UK's Information Commissioner's Office, has spent years studying these tensions between privacy law and technical reality. His research on contextual integrity and data protection reveals a fundamental mismatch between how regulations conceptualise data and how AI systems actually process information.

Meanwhile, the Hamburg Data Protection Authority has taken a controversial stance, maintaining that large language models don't contain personal data at all and therefore aren't subject to deletion rights. This position directly contradicts California's approach and highlights the growing international fragmentation in AI governance.

The Unlearning Illusion

The scientific community has been working overtime to solve what they call the “machine unlearning” problem. In 2024 alone, researchers published dozens of papers proposing various techniques: gradient-based methods, data attribution algorithms, selective retraining protocols. Google DeepMind's Eleni Triantafillou, a senior research scientist who co-organised the first NeurIPS Machine Unlearning Challenge in 2023, has been at the forefront of these efforts.

Yet even the most promising approaches come with significant caveats. Triantafillou's 2024 paper “Are we making progress in unlearning?” reveals a sobering reality: current unlearning methods often fail to completely remove information, can degrade model performance unpredictably, and may leave traces that sophisticated attacks can still exploit. The paper, co-authored with researchers including Peter Kairouz and Fabian Pedregosa from Google DeepMind, suggests that true unlearning might require fundamental architectural changes to how we build AI systems.

The challenge becomes even more complex when dealing with foundation models—the massive, general-purpose systems that underpin most modern AI applications. These models learn abstract representations that can encode information about individuals in ways that are nearly impossible to trace or remove. A model might not explicitly “remember” that John Smith lives in Manchester, but it might have learned patterns from thousands of social media posts that allow it to make accurate inferences about John Smith when prompted correctly.

The Privacy Theatre

OpenAI's approach to data deletion requests reveals the theatrical nature of current “solutions.” The company allows users to request deletion of their personal data and offers an opt-out from training. According to their data processing addendum, API customer data is retained for a maximum of thirty days before automatic deletion. Chat histories can be deleted, and conversations with chat history disabled are removed after thirty days.

But what does this actually accomplish? The data used to train GPT-4 and other models is already baked in. Deleting your account or opting out today doesn't retroactively remove your influence from models trained yesterday. It's like closing the stable door after the horse has not only bolted but has been cloned a million times and distributed globally.

This performative compliance extends across the industry. Companies implement deletion mechanisms that remove data from active databases while knowing full well that the same information persists in model weights, embeddings, and latent representations. They offer privacy dashboards and control panels that provide an illusion of agency while the underlying reality remains unchanged: once your data has been used to train a model, removing its influence is computationally intractable at scale.

The Copyright Collision

The unlearning debate has collided head-on with copyright law in ways that nobody fully anticipated. When The New York Times filed its landmark lawsuit against OpenAI and Microsoft on 27 December 2023, it didn't just seek compensation—it demanded something far more radical: the complete destruction of all ChatGPT datasets containing the newspaper's copyrighted content. This extraordinary demand, if granted by federal judge Sidney Stein, would effectively require OpenAI to “untrain” its models, forcing the company to rebuild from scratch using only authorised content.

The Times' legal team believes their articles represent one of the largest sources of copyrighted text in ChatGPT's training data, with the latest GPT models trained on trillions of words. In March 2025, Judge Stein rejected OpenAI's motion to dismiss, allowing the copyright infringement claims to proceed to trial. The stakes are astronomical—the newspaper seeks “billions of dollars in statutory and actual damages” for what it calls the “unlawful copying and use” of its journalism.

But the lawsuit has exposed an even deeper conflict about data preservation and privacy. The Times has demanded that OpenAI “retain consumer ChatGPT and API customer data indefinitely”—a requirement that OpenAI argues “fundamentally conflicts with the privacy commitments we have made to our users.” This creates an extraordinary paradox: copyright holders demand permanent data retention for litigation purposes, while privacy advocates and regulations require data deletion. The two demands are mutually exclusive, yet both are being pursued through the courts simultaneously.

OpenAI's defence rests on the doctrine of “fair use,” with company lawyer Joseph Gratz arguing that ChatGPT “isn't a document retrieval system. It is a large language model.” The company maintains that regurgitating entire articles “is not what it is designed to do and not what it does.” Yet the Times has demonstrated instances where ChatGPT can reproduce substantial portions of its articles nearly verbatim—evidence that the model has indeed “memorised” copyrighted content.

This legal conflict has exposed a fundamental tension: copyright holders want their content removed from AI systems, while privacy advocates want personal information deleted. Both demands rest on the assumption that selective forgetting is technically feasible. Ken Liu's research at Stanford highlights this convergence: “The field has evolved from training small convolutional nets on face images to training giant language models on pay-walled, copyrighted, toxic, dangerous, and otherwise harmful content, all of which we may want to 'erase' from the ML models.”

But the technical mechanisms for copyright removal and privacy deletion are essentially the same—and equally problematic. You can't selectively lobotomise an AI any more than you can unbake that cake. The models that power ChatGPT, Claude, and other systems don't have a delete key for specific memories. They have patterns, weights, and associations distributed across billions of parameters, each one shaped by the entirety of their training data.

The implications extend far beyond The New York Times. Publishers worldwide are watching this case closely, as are AI companies that have built their business models on scraping the open web. If the Times succeeds in its demand for dataset destruction, it could trigger an avalanche of similar lawsuits that would fundamentally reshape the AI industry. Conversely, if OpenAI prevails with its fair use defence, it could establish a precedent that essentially exempts AI training from copyright restrictions—an outcome that would devastate creative industries already struggling with digital disruption.

The DAIR Perspective

Timnit Gebru, founder of the Distributed Artificial Intelligence Research Institute (DAIR), offers a different lens through which to view the unlearning problem. Since launching DAIR in December 2021 after her controversial departure from Google, Gebru has argued that the issue isn't just technical but structural. The concentration of AI development in a handful of massive corporations means that decisions about data use, model training, and deletion capabilities are made by entities with little accountability to the communities whose data they consume.

“One of the biggest issues in AI right now is exploitation,” Gebru noted in a 2024 interview. She points to content moderators in Nairobi earning as little as $1.50 per hour to clean training data for tech giants, and the millions of internet users whose creative output has been absorbed without consent or compensation. From this perspective, the inability to untrain models isn't a bug—it's a feature of systems designed to maximise data extraction while minimising accountability.

DAIR's research focuses on alternative approaches to AI development that prioritise community consent and local governance. Rather than building monolithic models trained on everything and owned by no one, Gebru advocates for smaller, purpose-specific systems where data provenance and deletion capabilities are built in from the start. It's a radically different vision from the current paradigm of ever-larger models trained on ever-more data.

The Contextual Integrity Problem

Helen Nissenbaum, the Andrew H. and Ann R. Tisch Professor at Cornell Tech and architect of the influential “contextual integrity” framework for privacy, brings yet another dimension to the unlearning debate. Her theory, which defines privacy not as secrecy but as appropriate information flow within specific contexts, suggests that the problem with AI isn't just that it can't forget—it's that it doesn't understand context in the first place.

“We say appropriate data flows serve the integrity of the context,” Nissenbaum explains. When someone shares information on a professional networking site, they have certain expectations about how that information will be used. When the same data gets scraped to train a general-purpose AI that might be used for anything from generating marketing copy to making employment decisions, those contextual boundaries are shattered.

Speaking at the 6th Annual Symposium on Applications of Contextual Integrity in September 2024, Nissenbaum argued that the massive scale of AI systems makes contextual appropriateness impossible to maintain. “Digital systems have been big for a while, but they've become more massive with AI, and even more so with generative AI. People feel an onslaught, and they may express their concern as, 'My privacy is violated.'”

The contextual integrity framework suggests that even perfect unlearning wouldn't solve the deeper problem: AI systems that treat all information as fungible training data, stripped of its social context and meaning. A medical record, a love letter, a professional résumé, and a casual tweet all become undifferentiated tokens in the training process. No amount of post-hoc deletion can restore the contextual boundaries that were violated in the collection and training phase.

The Hugging Face Approach

Margaret Mitchell, Chief Ethics Scientist at Hugging Face since late 2021, has been working on a different approach to the unlearning problem. Rather than trying to remove data from already-trained models, Mitchell's team focuses on governance and documentation practices that make models' limitations and training data transparent from the start.

Mitchell pioneered the concept of “Model Cards”—standardised documentation that accompanies AI models to describe their training data, intended use cases, and known limitations. This approach doesn't solve the unlearning problem, but it does something arguably more important: it makes visible what data went into a model and what biases or privacy risks might result.

“Open-source AI carries as many benefits, and as few harms, as possible,” Mitchell stated in her 2023 TIME 100 AI recognition. At Hugging Face, this philosophy translates into tools and practices that give users more visibility into and control over AI systems, even if perfect unlearning remains elusive. The platform's emphasis on reproducibility and transparency stands in stark contrast to the black-box approach of proprietary systems.

Mitchell's work on data governance at Hugging Face includes developing methods to track data provenance, identify potentially problematic training examples, and give model users tools to understand what information might be encoded in the systems they're using. While this doesn't enable true unlearning, it does enable informed consent and risk assessment—prerequisites for any meaningful privacy protection in the AI age.

The Technical Reality Check

Let's be brutally specific about why unlearning is so difficult. Modern large language models like GPT-4 contain hundreds of billions of parameters. Each parameter is influenced by millions or billions of training examples. The information about any individual training example isn't stored in any single location—it's diffused across the entire network in subtle statistical correlations.

Consider a simplified example: if a model was trained on text mentioning “Sarah Johnson, a doctor in Leeds,” that information doesn't exist as a discrete fact the model can forget. Instead, it slightly adjusts thousands of parameters governing associations between concepts like “Sarah,” “Johnson,” “doctor,” “Leeds,” and countless related terms. These adjustments influence how the model processes entirely unrelated text. Removing Sarah Johnson's influence would require identifying and reversing all these minute adjustments—without breaking the model's ability to understand that doctors exist in Leeds, that people named Sarah Johnson exist, or any of the other valid patterns learned from other sources.

Seth Neel's research at Harvard has produced some of the most rigorous work on this problem. His 2021 paper “Descent-to-Delete: Gradient-Based Methods for Machine Unlearning” demonstrated that even with complete access to a model's architecture and training process, selectively removing information is computationally expensive and often ineffective. His more recent work on “Adaptive Machine Unlearning” shows that the problem becomes exponentially harder as models grow larger and training datasets become more complex.
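The flavour of these gradient-based methods can be conveyed with a toy logistic-regression sketch: drop the deleted records, continue regularised gradient descent from the current weights on what remains, then add noise so the result is hard to distinguish from a full retrain. This is a loose illustration of the idea under simplified assumptions, not the paper's algorithm:

```python
import numpy as np

def descent_to_delete(w, X_keep, y_keep, steps=100, lr=0.5,
                      reg=0.01, noise_scale=0.01, seed=0):
    """Sketch of the descent-to-delete idea for logistic regression:
    fine-tune on the retained data only, starting from the current
    weights, then add noise for statistical indistinguishability."""
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X_keep @ w))           # predictions
        grad = X_keep.T @ (p - y_keep) / len(y_keep) + reg * w
        w = w - lr * grad
    return w + np.random.default_rng(seed).normal(0.0, noise_scale, w.shape)

# Toy usage: "unlearn" the first training example.
X = np.array([[0.2, 1.0], [1.5, -0.3], [2.0, 0.8], [-1.0, 0.5]])
y = np.array([0.0, 1.0, 1.0, 0.0])
w_after = descent_to_delete(np.zeros(2), X[1:], y[1:])
```

Even in this toy setting, the convex structure of logistic regression does the heavy lifting; in a billion-parameter network, no comparable guarantee is available.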

“The initial research explorations were primarily driven by Article 17 of GDPR since 2014,” notes Ken Liu in his comprehensive review of the field. “A decade later in 2024, user privacy is no longer the only motivation for unlearning.” The field has expanded to encompass copyright concerns, safety issues, and the removal of toxic or harmful content. Yet despite this broadened focus and increased research attention, the fundamental technical barriers remain largely unchanged.

The Computational Cost Crisis

Even if perfect unlearning were technically possible, the computational costs would be staggering. Training GPT-4 reportedly cost over $100 million in computational resources. Retraining the model to remove even a small amount of data would require similar resources. Now imagine doing this for every deletion request from millions of users.

The environmental implications are equally troubling. Training large AI models already consumes enormous amounts of energy, contributing significantly to carbon emissions. If companies were required to retrain models regularly to honour deletion requests, the environmental cost could be catastrophic. We'd be burning fossil fuels to forget information—a dystopian irony that highlights the unsustainability of current approaches.

Some researchers have proposed “sharding” approaches where models are trained on separate data partitions that can be individually retrained. But this introduces its own problems: reduced model quality, increased complexity, and the fundamental issue that information still leaks across shards through shared preprocessing, architectural choices, and validation procedures.

The Regulatory Reckoning

As 2025 unfolds, regulators worldwide are being forced to confront the gap between privacy law's promises and AI's technical realities. The European Data Protection Board's December 2024 opinion attempted to provide clarity but mostly highlighted the contradictions. The board suggested that legitimate interest might serve as a legal basis for AI training in some cases—such as cybersecurity or conversational agents—but only with strict necessity and rights balancing.

Yet the opinion also acknowledged that determining whether an AI model contains personal data requires case-by-case assessment by data protection authorities. Given the thousands of AI models being developed and deployed, this approach seems practically unworkable. It's like asking food safety inspectors to individually assess every grain of rice for contamination.

California's AB 1008 takes a different approach, simply declaring that AI models do contain personal information and must be subject to deletion rights. But the law provides little guidance on how companies should actually implement this requirement. The result is likely to be a wave of litigation as courts try to reconcile legal mandates with technical impossibilities.

The Italian Garante's €15 million fine against OpenAI in December 2024, announced just two days after the EDPB opinion, signals that European regulators are losing patience with technical excuses. The fine was accompanied by corrective measures requiring OpenAI to implement age verification and improve transparency about data processing. But notably absent was any requirement for true unlearning capabilities—perhaps a tacit acknowledgment that such requirements would be unenforceable.

The Adversarial Frontier

The unlearning problem becomes even more complex when we consider adversarial attacks. Research has repeatedly shown that even when models appear to have “forgotten” information, sophisticated prompting techniques can often extract it anyway. This isn't surprising—if the information has influenced the model's parameters, traces remain even after attempted deletion.

In 2024, researchers demonstrated that large language models could be prompted to regenerate verbatim text from their training data, even when companies claimed that data had been “forgotten.” These extraction attacks work because the information isn't truly gone—it's just harder to access through normal means. It's like shredding a document but leaving the shreds in a pile; with enough effort, the original can be reconstructed.
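A toy model makes the point concrete. The sketch below trains a word-level trigram model on a single sentence; a two-word prompt then regenerates the rest verbatim, even though the "model" stores only statistical counts, never a retrievable document. Extraction attacks on large language models are vastly more sophisticated, but the underlying dynamic is similar:

```python
from collections import defaultdict

def train_trigram(tokens):
    """Word-level trigram counts: the 'model' stores only statistics,
    never a retrievable copy of the training text."""
    model = defaultdict(lambda: defaultdict(int))
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        model[(a, b)][c] += 1
    return model

def extract(model, prefix, max_len=20):
    """Greedy continuation: repeatedly emit the most likely next word."""
    out = list(prefix)
    while len(out) < max_len:
        nxt = model.get((out[-2], out[-1]))
        if not nxt:
            break
        out.append(max(nxt, key=nxt.get))
    return " ".join(out)

training_text = "sarah johnson is a doctor in leeds".split()
model = train_trigram(training_text)
print(extract(model, ["sarah", "johnson"]))
# -> "sarah johnson is a doctor in leeds": the full sequence is
#    recoverable from a two-word prompt.
```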

This vulnerability has serious implications for privacy and security. If deletion mechanisms can be circumvented through clever prompting, then compliance with privacy laws becomes meaningless. A company might honestly believe it has deleted someone's data, only to have that data extracted by a malicious actor using adversarial techniques.

The Innovation Imperative

Despite these challenges, innovation in unlearning continues at a breakneck pace. The NeurIPS 2023 Machine Unlearning Challenge, co-organised by Eleni Triantafillou and Fabian Pedregosa from Google DeepMind, attracted hundreds of submissions proposing novel approaches. The 2024 follow-up work, “Are we making progress in unlearning?” provides a sobering assessment: while techniques are improving, fundamental barriers remain.

Some of the most promising approaches involve building unlearning capabilities into models from the start, rather than trying to add them retroactively. This might mean architectural changes that isolate different types of information, training procedures that maintain deletion indexes, or hybrid systems that combine parametric models with retrievable databases.

But these solutions require starting over—something the industry seems reluctant to do given the billions already invested in current architectures. It's easier to promise future improvements than to acknowledge that existing systems are fundamentally incompatible with privacy rights.

The Alternative Futures

What if we accepted that true unlearning is impossible and designed systems accordingly? This might mean:

Expiring Models: AI systems that are automatically retrained on fresh data after a set period, with old versions retired. This wouldn't enable targeted deletion but would ensure that old information eventually ages out.

Federated Architectures: Instead of centralised models trained on everyone's data, federated systems where computation happens locally and only aggregated insights are shared. Apple's on-device Siri processing hints at this approach.

Purpose-Limited Systems: Rather than general-purpose models trained on everything, specialised systems trained only on consented, contextually appropriate data. This would mean many more models but much clearer data governance.

Retrieval-Augmented Generation: Systems that separate the knowledge base from the language model, allowing for targeted updates to the retrievable information while keeping the base model static (a minimal sketch follows after this list).

Each approach has trade-offs. Expiring models waste computational resources. Federated systems can be less capable. Purpose-limited systems reduce flexibility. Retrieval augmentation can be manipulated. There's no perfect solution, only different ways of balancing capability against privacy.
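To make the retrieval-augmented option concrete, here is a minimal sketch. The class and function names are invented, and the embedding is a deliberately crude stand-in; the point is that honouring a deletion request becomes an ordinary operation on the document store rather than a change to model weights:

```python
import numpy as np

def toy_embed(text):
    """Deliberately crude stand-in embedding: a normalised character
    histogram. Real systems would use a learned embedding model."""
    v = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            v[ord(ch) - 97] += 1
    n = np.linalg.norm(v)
    return v / n if n else v

class TinyRAG:
    """Facts live in an editable store; the (frozen) language model
    would only ever see retrieved snippets, so deletion is a plain
    data operation rather than a change to model weights."""
    def __init__(self, embed):
        self.embed = embed
        self.docs = []                          # (text, vector) pairs

    def add(self, text):
        self.docs.append((text, self.embed(text)))

    def delete(self, text):
        self.docs = [(t, v) for t, v in self.docs if t != text]

    def retrieve(self, query, k=2):
        qv = self.embed(query)
        ranked = sorted(self.docs, key=lambda d: -float(qv @ d[1]))
        return [t for t, _ in ranked[:k]]

rag = TinyRAG(toy_embed)
rag.add("Sarah Johnson is a doctor in Leeds")
rag.add("Leeds has extensive rail connections")
print(rag.retrieve("doctor in leeds"))
rag.delete("Sarah Johnson is a doctor in Leeds")   # honours an erasure request
print(rag.retrieve("doctor in leeds"))
```

The trade-off noted above still applies: whoever controls the store controls what the model can be made to say.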

The Trust Deficit

Perhaps the deepest challenge isn't technical but social: the erosion of trust between AI companies and the public. When OpenAI claims to delete user data while knowing that information persists in model weights, when Google promises privacy controls that don't actually control anything, when Meta talks about user choice while training on decades of social media posts—the gap between rhetoric and reality becomes a chasm.

This trust deficit has real consequences. EU regulators are considering increasingly stringent requirements. California's legislation is likely just the beginning of state-level action in the US. China is developing its own AI governance framework with potentially strict data localisation requirements. The result could be a fragmented global AI landscape where models can't be deployed across borders.

Margaret Mitchell at Hugging Face argues that rebuilding trust requires radical transparency: “We need to document not just what data went into models, but what data can't come out. We need to be honest about limitations, clear about capabilities, and upfront about trade-offs.”

The Human Cost

Behind every data point in an AI training set is a human being. Someone wrote that blog post, took that photo, composed that email. When we talk about the impossibility of unlearning, we're really talking about the impossibility of giving people control over their digital selves.

Consider the practical implications. A teenager's embarrassing social media posts from years ago, absorbed into training data, might influence AI systems for decades. A writer whose work was scraped without permission watches as AI systems generate derivative content, with no recourse for removal. A patient's medical forum posts, intended to help others with similar conditions, become part of systems used by insurance companies to assess risk.

Timnit Gebru's DAIR Institute has documented numerous cases where AI training has caused direct harm to individuals and communities. “The model fits all doesn't work,” Gebru argues. “It is a fictional argument that feeds a monoculture on tech and a tech monopoly.” Her research shows that the communities most likely to be harmed by AI systems—marginalised groups, Global South populations, minority language speakers—are also least likely to have any say in how their data is used.

The Global Fragmentation Crisis

The impossibility of AI unlearning is creating a regulatory Tower of Babel. Different jurisdictions are adopting fundamentally incompatible approaches to the same problem, threatening to fragment the global AI landscape into isolated regional silos.

In the United States, California's AB 1008 represents just the beginning. Other states are drafting their own AI privacy laws, each with different definitions of what constitutes personal information in an AI context and different requirements for deletion. Texas is considering legislation that would require AI companies to maintain “deletion capabilities” without defining what that means technically. New York's proposed AI accountability act includes provisions for “algorithmic discrimination audits” that would require examining how models treat different demographic groups—impossible without access to the very demographic data that privacy laws say should be deleted.

The European Union, meanwhile, is developing the AI Act alongside GDPR, creating a dual regulatory framework that companies must navigate. The December 2024 EDPB opinion suggests that models might be considered anonymous if they meet certain criteria, but member states are interpreting these criteria differently. France's CNIL has taken a relatively permissive approach, while Germany's data protection authorities demand stricter compliance. The Hamburg DPA's position that LLMs don't contain personal data at all stands in stark opposition to Ireland's DPA, which requested the EDPB opinion precisely because it believes they do.

China is developing its own approach, focused less on individual privacy rights and more on data sovereignty and national security. The Cyberspace Administration of China has proposed regulations requiring that AI models trained on Chinese citizens' data must store that data within China and provide government access for “security reviews.” This creates yet another incompatible framework that would require completely separate models for the Chinese market.

The result is a nightmare scenario for AI developers: models that are legal in one jurisdiction may be illegal in another, not because of their outputs but because of their fundamental architecture. A model trained to comply with California's deletion requirements might violate China's data localisation rules. A system designed for GDPR compliance might fail to meet emerging requirements in India or Brazil.

The Path Forward

So where does this leave us? The technical reality is clear: true unlearning in large AI models is currently impossible and likely to remain so with existing architectures. The legal landscape is fragmenting as different jurisdictions take incompatible approaches. The trust between companies and users continues to erode.

Yet this isn't cause for despair but for action. Acknowledging the impossibility of unlearning with current technology should spur us to develop new approaches, not to abandon privacy rights. This might mean:

Regulatory Honesty: Laws that acknowledge technical limitations while still holding companies accountable for data practices. This could include requirements for transparency, consent, and purpose limitation even if deletion isn't feasible. Rather than demanding the impossible, regulations could focus on preventing future misuse of data already embedded in models.

Technical Innovation: Continued research into architectures that enable better data governance, even if perfect unlearning remains elusive. The work of researchers like Seth Neel, Eleni Triantafillou, and Ken Liu shows that progress, while slow, is possible. New architectures might include built-in “forgetfulness” through techniques like differential privacy or temporal degradation of weights.

Social Negotiation: Broader conversations about what we want from AI systems and what trade-offs we're willing to accept. Helen Nissenbaum's contextual integrity framework provides a valuable lens for these discussions. We need public forums where technologists, ethicists, policymakers, and citizens can wrestle with these trade-offs together.

Alternative Models: Support for organisations like DAIR that are exploring fundamentally different approaches to AI development, ones that prioritise community governance over scale. This might mean funding for public AI infrastructure, support for cooperative AI development models, or requirements that commercial AI companies contribute to public AI research.

Harm Mitigation: Since we can't remove data from trained models, we should focus on preventing and mitigating harms from that data's presence. This could include robust output filtering, use-case restrictions, audit requirements, and liability frameworks that hold companies accountable for harms caused by their models' outputs rather than their training data. A minimal sketch of output filtering follows below.
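As one concrete example of the output-filtering idea, and an intentionally simplistic one, generated text can be passed through pattern-based redaction before it reaches the user. The patterns below are illustrative rather than a complete PII taxonomy:

```python
import re

# Illustrative patterns only, not a complete PII taxonomy: redact
# e-mail addresses and UK-style phone numbers before display.
PATTERNS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[email removed]"),
    (re.compile(r"(?<!\d)(?:\+44\s?|0)\d{10}\b"), "[phone removed]"),
]

def filter_output(text):
    """Pass generated text through pattern-based redaction."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text

print(filter_output("Contact Sarah at sarah@example.com or 07123456789."))
```

Such filters treat the symptom rather than the cause, which is precisely the point: they accept that the underlying weights cannot be cleansed and intervene at the boundary instead.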

The promise that AI can forget your data is, at present, an impossible one. But impossible promises have a way of driving innovation. The question isn't whether AI will ever truly be able to forget—it's whether we'll develop systems that make forgetting unnecessary by respecting privacy from the start.

As we stand at this crossroads, the choices we make will determine not just the future of privacy but the nature of the relationship between humans and artificial intelligence. Will we accept systems that absorb everything and forget nothing, or will we demand architectures that respect the human need for privacy, context, and control?

The answer won't come from Silicon Valley boardrooms or Brussels regulatory chambers alone. It will emerge from the collective choices of developers, regulators, researchers, and users worldwide. The impossible promise of AI unlearning might just be the catalyst we need to reimagine what artificial intelligence could be—not an omniscient oracle that never forgets, but a tool that respects the very human need to be forgotten.


References and Further Information

Academic Publications

  • Binns, R. (2024). “Privacy, Data Protection, and AI Governance.” Oxford University Computer Science Department.
  • Liu, K.Z. (2024). “Machine Unlearning in 2024.” Stanford Computer Science Blog, May 2024.
  • Mitchell, M., et al. (2019). “Model Cards for Model Reporting.” Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT*).
  • Neel, S., et al. (2021). “Descent-to-Delete: Gradient-Based Methods for Machine Unlearning.” Algorithmic Learning Theory Conference.
  • Nissenbaum, H. (2024). “Contextual Integrity: From Privacy to Data Governance.” Cornell Tech.
  • Triantafillou, E., et al. (2024). “Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition.”

Regulatory Documents

  • California State Legislature. (2024). Assembly Bill 1008: California Consumer Privacy Act Amendments.
  • European Data Protection Board. (2024). Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models. 18 December 2024.
  • Italian Data Protection Authority (Garante). (2024). OpenAI Fine and Corrective Measures. December 2024.

Institutional Reports

  • DAIR Institute. (2024). “Alternative Approaches to AI Development.” Distributed AI Research Institute.
  • Harvard Business School. (2024). “Machine Unlearning and the Right to be Forgotten.” Working Knowledge.
  • Hugging Face. (2024). “Open Source AI Governance and Ethics.” Annual Report.

News and Analysis

  • TIME Magazine. (2023). “The 100 Most Influential People in AI 2023.”
  • WIRED. (2024). Various articles on AI privacy and machine unlearning.
  • TechPolicy.Press. (2024). “The Right to Be Forgotten Is Dead: Data Lives Forever in AI.”

#HumanInTheLoop #DataPrivacy #AIUnlearning #DigitalTrust

Your smartphone buzzes with a gentle notification: “Taking the bus instead of driving today would save 2.3kg of CO2 and improve your weekly climate score by 12%.” Another ping suggests swapping beef for lentils at dinner, calculating the precise environmental impact down to water usage and methane emissions. This isn't science fiction—it's the emerging reality of AI-powered personal climate advisors, digital systems that promise to optimise every aspect of our daily lives for environmental benefit. But as these technologies embed themselves deeper into our routines, monitoring our movements, purchases, and choices with unprecedented granularity, a fundamental question emerges: are we witnessing the birth of a powerful tool for environmental salvation, or the construction of a surveillance infrastructure that could fundamentally alter the relationship between individuals and institutions?

The Promise of Personalised Environmental Intelligence

The concept of a personal climate advisor represents a seductive fusion of environmental consciousness and technological convenience. These systems leverage vast datasets to analyse individual behaviour patterns, offering real-time guidance that could theoretically transform millions of small daily decisions into collective environmental action. The appeal is immediate and tangible—imagine receiving precise, personalised recommendations that help you reduce your carbon footprint without sacrificing convenience or quality of life.

Early iterations of such technology already exist in various forms. Apps track the carbon footprint of purchases, suggesting lower-impact alternatives. Smart home systems optimise energy usage based on occupancy patterns and weather forecasts. Transportation apps recommend the most environmentally friendly routes, factoring in real-time traffic data, public transport schedules, and vehicle emissions. These scattered applications hint at a future where a unified AI system could orchestrate all these decisions seamlessly.

The environmental potential is genuinely compelling. Individual consumer choices account for a significant portion of global greenhouse gas emissions, from transportation and housing to food and consumption patterns. If AI systems could nudge millions of people towards more sustainable choices—encouraging public transport over private vehicles, plant-based meals over meat-heavy diets, or local produce over imported goods—the cumulative impact could be substantial. The technology promises to make environmental responsibility effortless, removing the cognitive burden of constantly calculating the climate impact of every decision.

Moreover, these systems could democratise access to environmental knowledge that has traditionally been the preserve of specialists. Understanding the true climate impact of different choices requires expertise in lifecycle analysis, supply chain emissions, and complex environmental science. A personal climate advisor could distil this complexity into simple, actionable guidance, making sophisticated environmental decision-making accessible to everyone regardless of their technical background.

The data-driven approach also offers the possibility of genuine personalisation. Rather than one-size-fits-all environmental advice, these systems could account for individual circumstances, local infrastructure, and personal constraints. A recommendation system might recognise that someone living in a rural area with limited public transport faces different challenges than an urban dweller with extensive transit options. It could factor in income constraints, dietary restrictions, or mobility limitations, offering realistic advice rather than idealistic prescriptions.

The Machinery of Monitoring

However, the infrastructure required to deliver such personalised environmental guidance necessitates an unprecedented level of personal surveillance. To provide meaningful recommendations about commuting choices, the system must know where you live, work, and travel. To advise on grocery purchases, it needs access to your shopping habits, dietary preferences, and consumption patterns. To optimise your energy usage, it requires detailed information about your home, your schedule, and your daily routines.

This data collection extends far beyond simple preference tracking. Modern data analytics systems are designed to analyse customer trends and monitor shopping behaviour with extraordinary granularity, and in the context of a climate advisor, this monitoring would encompass virtually every aspect of daily life that has an environmental impact—which is to say, virtually everything. The system would need to know not just what you buy, but when, where, and why. It would track your movements, your energy consumption, your waste production, and your consumption patterns across multiple categories.

The sophistication of modern data analytics means that even seemingly innocuous information can reveal sensitive details about personal life. Shopping patterns can indicate health conditions, relationship status, financial circumstances, and political preferences. Location data reveals not just where you go, but who you visit, how long you stay, and what your daily routines look like. Energy usage patterns can indicate when you're home, when you're away, and even how many people live in your household.

The technical requirements for such comprehensive monitoring are already within reach. Smartphones provide location data with metre-level precision. Credit card transactions reveal purchasing patterns. Smart home devices monitor energy usage in real-time. Social media activity offers insights into preferences and intentions. Loyalty card programmes track shopping habits across retailers. When integrated, these data streams create a remarkably detailed picture of individual environmental impact.
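
A small sketch illustrates how even a single “innocuous” stream supports sensitive inference. The smart-meter readings and baseload threshold below are invented; the point is that a simple filter over hourly energy data already suggests when a home is empty, and joining it with location or payment data would only sharpen the picture:

```python
# Hourly smart-meter readings for one day (kWh); values invented.
hourly_kwh = [0.3, 0.3, 0.3, 0.3, 0.3, 0.4, 1.8, 2.1,   # 00:00-07:59
              0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3,   # 08:00-15:59
              0.4, 1.5, 2.4, 2.2, 1.9, 1.1, 0.6, 0.4]   # 16:00-23:59

BASELOAD = 0.5  # kWh below which nobody is plausibly home (assumed threshold)

# Daytime hours showing only baseload draw suggest an empty house.
away = [hour for hour in range(8, 18) if hourly_kwh[hour] < BASELOAD]
print(f"Likely unoccupied: {away[0]:02d}:00 to {away[-1]:02d}:59")
# -> Likely unoccupied: 08:00 to 16:59
# A burglar, an insurer, or an employer could draw the same inference.
```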

This comprehensive monitoring capability raises immediate questions about privacy and consent. While users might willingly share some information in exchange for environmental guidance, the full scope of data collection required for effective climate advice might not be immediately apparent. The gradual expansion of monitoring capabilities—what privacy researchers call “function creep”—could see systems that begin with simple carbon tracking evolving into comprehensive lifestyle surveillance platforms.

The Commercial Imperative and Data Foundation

The development of personal climate advisors is unlikely to occur in a vacuum of pure environmental altruism. These systems require substantial investment in technology, data infrastructure, and ongoing maintenance. The economic model for sustaining such services inevitably involves commercial considerations that may not always align with optimal environmental outcomes.

At its core, any AI-driven climate advisor is powered by data analytics. The ability to process raw data to identify trends and inform strategy is the mechanism that enables an AI system to optimise a user's environmental choices. This foundation brings both opportunities and risks that shape the entire climate advisory ecosystem.

The power of data analytics lies in its ability to identify patterns and correlations that would be invisible to human analysis. In the environmental context, this could mean discovering unexpected connections between seemingly unrelated choices, identifying optimal timing for different sustainable behaviours, or recognising personal patterns that indicate opportunities for environmental improvement.

However, commercial data analytics exists first and foremost to increase revenue and target marketing for businesses. A personal climate advisor, particularly one developed by a commercial entity, faces an inherent tension between providing the most environmentally beneficial advice and generating revenue through partnerships, advertising, or data monetisation. The system might recommend products or services from companies that have paid for preferred placement, even if alternative options would be more environmentally sound.

Consider the complexity of food recommendations. A truly objective climate advisor might suggest reducing meat consumption, buying local produce, and minimising packaged foods. However, if the system is funded by partnerships with major food retailers or manufacturers, these recommendations might be subtly influenced by commercial relationships. The advice might steer users towards “sustainable” products from partner companies rather than the most environmentally beneficial options available.
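
The mechanics of such skew can be trivially simple. In the hypothetical ranking below, a hidden “partner boost” reorders results even though the underlying impact scores are unchanged; all products and figures are invented:

```python
# Sketch of how commercial placement can skew a "sustainable" ranking.
# A hidden partner boost reorders results while the underlying
# environmental scores stay the same. All data is invented.

products = [
    {"name": "local seasonal veg box",  "impact": 1.0, "partner": False},
    {"name": "partner 'eco' ready meal", "impact": 3.5, "partner": True},
    {"name": "imported organic produce", "impact": 2.8, "partner": False},
]

def rank(products, partner_boost=0.0):
    # Lower impact is better; the boost quietly subtracts from partners.
    key = lambda p: p["impact"] - (partner_boost if p["partner"] else 0.0)
    return [p["name"] for p in sorted(products, key=key)]

print(rank(products))
# ['local seasonal veg box', 'imported organic produce', "partner 'eco' ready meal"]
print(rank(products, partner_boost=3.0))
# ["partner 'eco' ready meal", 'local seasonal veg box', 'imported organic produce']
```

Nothing in the user-facing output reveals that the second ordering was bought rather than earned, which is what makes this form of influence so difficult to detect from the outside.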

The business model for data monetisation adds another layer of complexity. Personal climate advisors would generate extraordinarily valuable datasets about consumer behaviour, preferences, and environmental consciousness. This information could be highly sought after by retailers, manufacturers, advertisers, and other commercial entities. The temptation to monetise this data—either through direct sales or by using it to influence user behaviour for commercial benefit—could compromise the system's environmental mission.

Furthermore, the competitive pressure to provide engaging, user-friendly advice might lead to recommendations that prioritise convenience and user satisfaction over maximum environmental benefit. A system that consistently recommends difficult or inconvenient choices might see users abandon the platform in favour of more accommodating alternatives. This market pressure could gradually erode the environmental effectiveness of the advice in favour of maintaining user engagement.

The same analytical power that enables sophisticated environmental guidance also creates the potential for manipulation and control. Data analytics systems are designed to influence behaviour, and the line between helpful guidance and manipulative nudging can be difficult to discern. The environmental framing may make users more willing to accept behavioural influence that they would resist in other contexts.

The quality and completeness of the underlying data also fundamentally shapes the effectiveness and fairness of climate advisory systems. If the data used to train these systems is biased, incomplete, or unrepresentative, the resulting advice will perpetuate and amplify these limitations. Ensuring data quality and representativeness is crucial for creating climate advisors that serve all users fairly and effectively.

The Embedded Values Problem

The promise of objective, data-driven environmental advice masks the reality that all AI systems embed human values and assumptions. A personal climate advisor would inevitably reflect the perspectives, priorities, and prejudices of its creators, potentially perpetuating or amplifying existing inequalities under the guise of environmental optimisation.

Extensive research on bias and fairness in automated decision-making systems demonstrates how AI technologies can systematically disadvantage certain groups while appearing to operate objectively. Studies of hiring systems, credit scoring systems, and criminal justice risk assessment tools have revealed consistent patterns of discrimination that reflect and amplify societal biases. In the context of climate advice, this embedded bias could manifest in numerous problematic ways.

The system might penalise individuals who live in areas with limited public transport options, poor access to sustainable food choices, or inadequate renewable energy infrastructure. People with lower incomes might find themselves consistently rated as having worse environmental performance simply because they cannot afford electric vehicles, organic food, or energy-efficient housing. This creates a feedback loop where environmental virtue becomes correlated with economic privilege rather than genuine environmental commitment.

Geographic bias represents a particularly troubling possibility. Urban dwellers with access to extensive public transport networks, bike-sharing systems, and diverse food markets might consistently receive higher environmental scores than rural residents who face structural limitations in their sustainable choices. The system could inadvertently rate rural residents as environmental laggards for circumstances entirely beyond their control.

Cultural and dietary biases could also emerge in food recommendations. A system trained primarily on Western consumption patterns might consistently recommend against traditional diets from other cultures, even when those diets are environmentally sustainable. Religious or cultural dietary restrictions might be treated as obstacles to environmental performance rather than legitimate personal choices that should be accommodated within sustainable living advice.

The system's definition of environmental optimisation itself embeds value judgements that might not be universally shared. Should the focus be on carbon emissions, biodiversity impact, water usage, or waste generation? Different environmental priorities could lead to conflicting recommendations, and the system's choices about which factors to emphasise would reflect the values and assumptions of its designers rather than objective environmental science.
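
This is easy to demonstrate. In the sketch below, two defensible weightings of the same invented impact scores produce conflicting rankings; neither ordering is more “objective” than the other, because the weights themselves are the value judgement:

```python
# Normalised impact scores (0 = best, 1 = worst) across two dimensions.
# The scores and weightings are invented for illustration.

foods = {
    "almonds": {"carbon": 0.2, "water": 0.9},
    "beef":    {"carbon": 0.9, "water": 0.4},
    "lentils": {"carbon": 0.1, "water": 0.2},
}

def score(item, weights):
    """Weighted sum of impact dimensions; lower is better."""
    return sum(weights[k] * foods[item][k] for k in weights)

carbon_first = {"carbon": 0.8, "water": 0.2}
water_first  = {"carbon": 0.2, "water": 0.8}

for weights in (carbon_first, water_first):
    print(sorted(foods, key=lambda f: score(f, weights)))
# ['lentils', 'almonds', 'beef']  under carbon-first weights
# ['lentils', 'beef', 'almonds']  under water-first weights
```

Whether almonds outrank beef depends entirely on a weighting decision made by the system's designers, and that decision is invisible to the user who simply sees a ranked list.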

Income-based discrimination represents perhaps the most concerning form of bias in this context. Many of the most environmentally friendly options—electric vehicles, organic food, renewable energy systems, energy-efficient appliances—require significant upfront investment that may be impossible for lower-income individuals. A climate advisor that consistently recommends expensive sustainable alternatives could effectively create a system where environmental virtue becomes a luxury good, accessible only to those with sufficient disposable income.

The Surveillance Infrastructure

The comprehensive monitoring required for effective climate advice creates an infrastructure that could easily be repurposed for broader surveillance and control. Once systems exist to track individual movements, purchases, energy usage, and consumption patterns, the technical barriers to expanding that monitoring for other purposes become minimal. Experts explicitly voice concerns that a more tech-driven world will lead to rising authoritarianism, and a personal climate advisor provides an almost perfect mechanism for such control.

The environmental framing of such surveillance makes it particularly insidious. Unlike overtly authoritarian monitoring systems, a climate advisor positions surveillance as virtuous and voluntary. Users might willingly accept comprehensive tracking in the name of environmental responsibility, gradually normalising levels of monitoring that would be rejected if presented for other purposes. The environmental mission provides moral cover for surveillance infrastructure that could later be expanded or repurposed.

The integration of climate monitoring with existing digital infrastructure amplifies these concerns. Smartphones, smart home devices, payment systems, and social media platforms already collect vast amounts of personal data. A climate advisor would provide a framework for integrating and analysing this information in new ways, creating a more complete picture of individual behaviour than any single system could achieve alone.

The potential for mission creep is substantial. A system that begins by tracking carbon emissions could gradually expand to monitor other aspects of behaviour deemed relevant to environmental impact. Social activities, travel patterns, consumption choices, and even personal relationships could all be justified as relevant to environmental monitoring. The definition of environmentally relevant behaviour could expand to encompass virtually any aspect of personal life.

Government integration represents another significant risk. Climate change is increasingly recognised as a national security issue, and governments might seek access to climate monitoring data for policy purposes. A system designed to help individuals reduce their environmental impact could become a tool for enforcing environmental regulations, monitoring compliance with climate policies, or identifying individuals for targeted intervention.

The Human-AI Co-evolution Factor

The success of personal climate advisors will ultimately depend on how well they are designed to interact with human emotional and cognitive states. Research on human-AI co-evolution suggests that the most effective AI systems are those that complement rather than replace human decision-making capabilities. In the context of climate advice, this means creating systems that enhance human environmental awareness and motivation rather than simply automating environmental choices.

The psychological aspects of environmental behaviour change are complex and often counterintuitive. People may intellectually understand the importance of reducing their carbon footprint while struggling to translate that understanding into consistent behavioural change. Effective climate advisors would need to account for these psychological realities, providing guidance that works with human nature rather than against it.

The design of these systems will also need to consider the broader social and cultural contexts in which they operate. Environmental behaviour is not just an individual choice but a social phenomenon influenced by community norms, cultural values, and social expectations. Climate advisors that ignore these social dimensions may struggle to achieve lasting behaviour change, regardless of their technical sophistication.

The concept of human-AI co-evolution rests on the premise that AI will increasingly shape human cognition and how we interact with our surroundings. This co-evolution could lead to more intuitive and effective climate advisory systems that understand human motivations and constraints. However, it also raises questions about how this technological integration might change human agency and decision-making autonomy.

Successful human-AI co-evolution in the climate context would require systems that respect human values, cultural differences, and individual circumstances while providing genuinely helpful environmental guidance. This balance is technically challenging but essential for creating climate advisors that serve human flourishing rather than undermining it.

Expert Perspectives and Future Scenarios

The expert community remains deeply divided about the net impact of advancing AI and data analytics technologies. While some foresee improvements and positive human-AI co-evolution, a significant plurality fears that technological advancement will make life worse for most people. This fundamental disagreement among experts reflects the genuine uncertainty about how personal climate advisors and similar systems will ultimately impact society.

The post-pandemic “new normal” is increasingly characterised as far more tech-driven, creating a “tele-everything” world where digital systems mediate more aspects of daily life. This trend makes the adoption of personal AI advisors for various aspects of life, including climate impact, increasingly likely.

The optimistic scenario envisions AI systems that genuinely empower individuals to make better environmental choices while respecting privacy and autonomy. These systems would provide personalised, objective advice that helps users navigate complex environmental trade-offs without imposing surveillance or control. They would democratise access to environmental expertise, making sustainable living easier and more accessible for everyone regardless of income, location, or technical knowledge.

The pessimistic scenario sees climate advisors as surveillance infrastructure disguised as environmental assistance. These systems would gradually normalise comprehensive monitoring of personal behaviour, creating data resources that could be exploited by corporations, governments, or other institutions for purposes far removed from environmental protection. The environmental mission would serve as moral cover for the construction of unprecedented surveillance capabilities.

The most likely outcome lies somewhere between these extremes, with climate advisory systems delivering some genuine environmental benefits while also creating new privacy and surveillance risks. The balance between these outcomes will depend on the specific design choices, governance frameworks, and social responses that emerge as these technologies develop.

The international dimension adds another layer of complexity. Different countries and regions are likely to develop different approaches to climate advisory systems, reflecting varying cultural attitudes towards privacy, environmental protection, and government authority. This diversity could create opportunities for learning and improvement, but it could also lead to a fragmented landscape where users in different jurisdictions have very different experiences with climate monitoring.

The trajectory towards more tech-driven environmental monitoring appears inevitable, but the inevitability of technological development does not predetermine its social impact. The same technologies that could enable comprehensive environmental surveillance could also empower individuals to make more informed, sustainable choices while maintaining privacy and autonomy.

The Governance Challenge

The fundamental question surrounding personal climate advisors is not whether the technology is possible—it clearly is—but whether it can be developed and deployed in ways that maximise environmental benefits while minimising surveillance risks. This challenge is primarily one of governance rather than technology.

The difference between a positive outcome that delivers genuine environmental improvements and a negative one that enables authoritarian control depends on human choices regarding ethics, privacy, and institutional design. The technology itself is largely neutral; its impact will be determined by the frameworks, regulations, and safeguards that govern its development and use.

Transparency represents a crucial element of responsible governance. Users need clear, comprehensible information about what data is being collected, how it is being used, and who has access to it. The complexity of modern data analytics makes this transparency challenging to achieve, but it is essential for maintaining user agency and preventing the gradual erosion of privacy under the guise of environmental benefit.

Data ownership and control mechanisms are equally important. Users should retain meaningful control over their environmental data, including the ability to access, modify, and delete information about their behaviour. The system should provide granular privacy controls that allow users to participate in climate advice while limiting data sharing for other purposes.
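
One plausible shape for such controls is a consent ledger that binds each data stream to the specific purposes a user has approved. The sketch below is a hypothetical design, not any real platform's API:

```python
# Sketch of granular consent: each data stream carries an explicit set of
# approved purposes, and anything not opted in is unavailable. Field and
# purpose names are hypothetical.

from dataclasses import dataclass, field

@dataclass
class ConsentLedger:
    scopes: dict = field(default_factory=dict)  # stream -> allowed purposes

    def grant(self, stream, purposes):
        self.scopes[stream] = set(purposes)

    def revoke(self, stream):
        self.scopes.pop(stream, None)

    def permits(self, stream, purpose):
        return purpose in self.scopes.get(stream, set())

ledger = ConsentLedger()
ledger.grant("energy_usage", {"climate_advice"})

print(ledger.permits("energy_usage", "climate_advice"))  # True
print(ledger.permits("energy_usage", "ad_targeting"))    # False: never granted
ledger.revoke("energy_usage")
print(ledger.permits("energy_usage", "climate_advice"))  # False after revocation
```

The design choice that matters here is the default: purposes are denied unless explicitly granted, which is the opposite of how most commercial data collection currently operates.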

Independent oversight and auditing could help ensure that climate advisors operate in users' environmental interests rather than commercial or institutional interests. Regular audits of recommendation systems, data usage practices, and commercial partnerships could help identify and correct biases or conflicts of interest that might compromise the system's environmental mission.

Accountability measures could address concerns about bias and discrimination. Climate advisors should be required to demonstrate that their recommendations do not systematically disadvantage particular groups or communities. The systems should be designed to account for structural inequalities in access to sustainable options rather than penalising individuals for circumstances beyond their control.

Interoperability and user choice could prevent the emergence of monopolistic climate advisory platforms that concentrate too much power in single institutions. Users should be able to choose between different advisory systems, switch providers, or use multiple systems simultaneously. This competition could help ensure that climate advisors remain focused on user benefit rather than institutional advantage.

Concrete safeguards should include: mandatory audits for bias and fairness; user rights to data portability and deletion; prohibition on selling personal environmental data to third parties; requirements for human oversight of automated recommendations; regular public reporting on system performance and user outcomes.

These measures would create a framework for responsible development and deployment of climate advisory systems, establishing legal liability for discriminatory or harmful advice while ensuring that environmental benefits are achieved without sacrificing individual rights or democratic values.
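
As one illustration of the audit requirement above, a periodic check might compare environmental scores across income groups and flag disparities that suggest structural bias rather than genuine behavioural difference. The scores and threshold below are invented:

```python
# Sketch of a periodic fairness audit: compare mean environmental scores
# across groups and flag large gaps for human review. All figures and
# the disparity threshold are invented assumptions.

from statistics import mean

scores_by_group = {
    "lower_income":  [42, 48, 45, 50, 39],   # environmental scores (0-100)
    "higher_income": [71, 68, 75, 70, 66],
}

DISPARITY_THRESHOLD = 10  # maximum tolerated gap in mean score (assumed policy)

means = {group: mean(scores) for group, scores in scores_by_group.items()}
gap = max(means.values()) - min(means.values())

print(f"Group means: {means}, gap: {gap:.1f}")
if gap > DISPARITY_THRESHOLD:
    print("Audit flag: scores track income; review for structural bias.")
```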

The Environmental Imperative

The urgency of climate change adds complexity to the surveillance versus environmental benefit calculation. The scale and speed of environmental action required to address climate change might justify accepting some privacy risks in exchange for more effective environmental behaviour change. If personal climate advisors could significantly accelerate the adoption of sustainable practices across large populations, the environmental benefits might outweigh surveillance concerns.

However, this utilitarian calculation is complicated by questions about effectiveness and alternatives. There is limited evidence that individual behaviour change, even if optimised through AI systems, can deliver the scale of environmental improvement required to address climate change. Many experts argue that systemic changes in energy infrastructure, industrial processes, and economic systems are more important than individual consumer choices.

The focus on personal climate advisors might also represent a form of environmental misdirection, shifting attention and responsibility away from institutional and systemic changes towards individual behaviour modification. If climate advisory systems become a substitute for more fundamental environmental reforms, they could actually impede progress on climate change while creating new surveillance infrastructure.

The environmental framing of surveillance also risks normalising monitoring for other purposes. Once comprehensive personal tracking becomes acceptable for environmental reasons, it becomes easier to justify similar monitoring for health, security, economic, or other policy goals. The environmental mission could serve as a gateway to broader surveillance infrastructure that extends far beyond climate concerns.

It's important to acknowledge that many sustainable choices currently require significant financial resources, but policy interventions could help address these barriers. Government subsidies for electric vehicles, renewable energy installations, and energy-efficient appliances could make sustainable options more accessible. Carbon pricing mechanisms could make environmentally harmful choices more expensive while generating revenue for environmental programmes. Public investment in sustainable infrastructure—public transport, renewable energy grids, and local food systems—could expand access to sustainable choices regardless of individual income levels.

These policy tools suggest that the apparent trade-off between environmental effectiveness and surveillance might be a false choice. Rather than relying on comprehensive personal monitoring to drive behaviour change, societies could create structural conditions that make sustainable choices easier, cheaper, and more convenient for everyone.

The Competitive Landscape

The development of personal climate advisors is likely to occur within a competitive marketplace where multiple companies and organisations vie for user adoption and market share. This competitive dynamic will significantly influence the features, capabilities, and business models of these systems, with important implications for both environmental effectiveness and privacy protection.

Competition could drive innovation and improvement in climate advisory systems, pushing developers to create more accurate, useful, and user-friendly environmental guidance. Market pressure might encourage the development of more sophisticated personalisation capabilities, better integration with existing digital infrastructure, and more effective behaviour change mechanisms.

However, large technology companies with existing data collection capabilities and user bases may have significant advantages in developing comprehensive climate advisors. This could lead to market concentration that gives a few companies disproportionate influence over how millions of people think about and act on environmental issues.

As noted earlier, the pressure to keep users engaged cuts against environmental rigour: a system that consistently recommends difficult or inconvenient choices risks losing its audience to more accommodating rivals, and its advice may gradually soften to preserve engagement.

The market dynamics will ultimately determine whether climate advisory systems serve genuine environmental goals or become vehicles for data collection and behavioural manipulation. The challenge is ensuring that competitive forces drive innovation towards better environmental outcomes rather than more effective surveillance and control mechanisms.

The Path Forward

A rights-based approach to climate advisory development could help ensure that environmental benefits are achieved without sacrificing individual privacy or autonomy. This might involve treating environmental data as a form of personal information that deserves special protection, requiring explicit consent for collection and use, and providing strong user control over how the information is shared and applied.

Decentralised architectures could reduce surveillance risks while maintaining environmental benefits. Rather than centralising all climate data in single platforms controlled by corporations or governments, distributed systems could keep personal information under individual control while still enabling collective environmental action. Blockchain technologies, federated learning systems, and other decentralised approaches could provide environmental guidance without creating comprehensive surveillance infrastructure.
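
The federated idea can be sketched in a few lines. In the toy example below, each household computes a model parameter locally and shares only that aggregate; raw readings never leave the device. This illustrates the principle of federated averaging rather than any production system:

```python
# Toy illustration of federated averaging: devices share derived
# parameters, never raw data. Readings are randomly generated stand-ins.

import random

def local_update(readings):
    """Each device computes its parameter (here, mean daily kWh) locally;
    the raw readings never leave the device."""
    return sum(readings) / len(readings)

random.seed(0)
households = [
    [random.uniform(5, 15) for _ in range(30)]  # 30 days of kWh readings
    for _ in range(100)
]

# The server sees 100 per-device parameters, not 3,000 raw readings.
local_params = [local_update(h) for h in households]
global_param = sum(local_params) / len(local_params)

print(f"Global model parameter (average daily kWh): {global_param:.2f}")
```

A production system would aggregate richer model updates and add protections such as secure aggregation or differential privacy, but the architectural principle is the same: collective insight without centralised raw data.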

Open-source development could increase transparency and accountability in climate advisory systems. If the recommendation systems, data models, and guidance mechanisms are open to public scrutiny, it becomes easier to identify biases, conflicts of interest, or privacy violations. Open development could also enable community-driven climate advisors that prioritise environmental and social benefit over commercial interests.

Public sector involvement could help ensure that climate advisors serve broader social interests rather than narrow commercial goals. Government-funded or non-profit climate advisory systems might be better positioned to provide objective environmental advice without the commercial pressures that could compromise privately developed systems. However, public sector involvement also raises concerns about government surveillance and control that would need to be carefully managed.

The challenge is to harness the environmental potential of AI-powered climate advice while preserving the privacy, autonomy, and democratic values that define free societies. This will require careful attention to system design, robust governance frameworks, and ongoing vigilance about the balance between environmental benefits and surveillance risks.

Conclusion: The Buzz in Your Pocket

As we stand at this crossroads, the stakes are high: we have the opportunity to create powerful tools for environmental action, but we also risk building the infrastructure for a surveillance state in the name of saving the planet. The path forward requires acknowledging both the promise and the peril of personal climate advisors, working to maximise their environmental benefits while minimising their surveillance risks. This is not a technical challenge but a social one, requiring thoughtful choices about the kind of future we want to build and the values we want to preserve as we navigate the climate crisis.

The question is not whether we can create AI systems that monitor our environmental choices—we clearly can—but whether we can do so in ways that serve human flourishing rather than undermining it. The choice between environmental empowerment and surveillance infrastructure lies in human decisions about governance, accountability, and rights protection rather than in the technology itself.

Your smartphone will buzz again tomorrow with another gentle notification, another suggestion for reducing your environmental impact. The question that lingers is not what the message will say, but who will ultimately control the finger that presses send—and whether that gentle buzz represents the sound of environmental progress or the quiet hum of surveillance infrastructure embedding itself ever deeper into the fabric of daily life. In that moment of notification, in that brief vibration in your pocket, lies the entire tension between our environmental future and our digital freedom.


References and Further Information

  1. Pew Research Center. “Improvements ahead: How humans and AI might evolve together in the next decade.” Available at: www.pewresearch.org

  2. Pew Research Center. “Experts Say the 'New Normal' in 2025 Will Be Far More Tech-Driven, Presenting More Big Challenges.” Available at: www.pewresearch.org

  3. National Center for Biotechnology Information. “Reskilling and Upskilling the Future-ready Workforce for Industry 4.0 and Beyond.” Available at: pmc.ncbi.nlm.nih.gov

  4. Barocas, Solon, and Andrew D. Selbst. “Big Data's Disparate Impact.” California Law Review 104, no. 3 (2016): 671-732.

  5. O'Neil, Cathy. “Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy.” Crown Publishing Group, 2016.

  6. Zuboff, Shoshana. “The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power.” PublicAffairs, 2019.

  7. European Union Agency for Fundamental Rights. “Data Quality and Artificial Intelligence – Mitigating Bias and Error to Protect Fundamental Rights.” Publications Office of the European Union, 2019.

  8. Binns, Reuben. “Fairness in Machine Learning: Lessons from Political Philosophy.” Proceedings of Machine Learning Research 81 (2018): 149-159.

  9. Lyon, David. “Surveillance Capitalism, Surveillance Culture and Data Politics.” In “Data Politics: Worlds, Subjects, Rights,” edited by Didier Bigo, Engin Isin, and Evelyn Ruppert. Routledge, 2019.


Tim Green

UK-based Systems Theorist & Independent Technology Writer

Tim explores the intersections of artificial intelligence, decentralised cognition, and posthuman ethics. His work, published at smarterarticles.co.uk, challenges dominant narratives of technological progress while proposing interdisciplinary frameworks for collective intelligence and digital stewardship.

His writing has been featured on Ground News and shared by independent researchers across both academic and technological communities.

ORCID: 0000-0002-0156-9795
Email: tim@smarterarticles.co.uk


#HumanInTheLoop #EnvironmentalSurveillance #SmartTechnologyEthics #DataPrivacy