Your Face Is Not a Novel: The Moral Divide in AI Training

When Italy's data protection authority, the Garante per la protezione dei dati personali, slapped OpenAI with a 15 million euro fine in December 2024, the charges had nothing to do with copyright infringement. The regulator found that OpenAI had trained ChatGPT on users' personal data without establishing a proper legal basis, failed to provide adequate transparency about how that data was processed, and neglected to report a data breach that exposed the chat histories and payment information of 440 Italian users. The privacy notice had been available only in English, and no notice whatsoever had been provided to non-users whose data was processed for training purposes. Beyond the fine, OpenAI was ordered to conduct a six-month information campaign across Italian media platforms to educate the public about how ChatGPT collects and uses data. OpenAI called the decision “disproportionate” and announced it would appeal.
Meanwhile, just six months later, in a completely separate legal arena, U.S. District Judge William Alsup ruled in Bartz v. Anthropic that using copyrighted books to train an AI model was “transformative, spectacularly so,” and therefore constituted fair use under American copyright law. The case resulted in a 1.5 billion dollar settlement, with payments scheduled in four instalments beginning with 300 million dollars by October 2025.
These two events, unfolding on different continents under different legal frameworks, illustrate a tension that sits at the heart of the generative AI revolution. The question is no longer simply whether AI companies should be allowed to hoover up the world's information to train their models. It is whether there should be a fundamental distinction between two very different categories of that information: published creative works (novels, journalism, photographs, music) and personal data (the digital traces of individual human lives). The law currently treats these categories through entirely separate regulatory regimes, and for good reason. But the AI industry has a habit of collapsing that distinction, treating all data as training fodder regardless of its nature or provenance. Understanding why this matters, and what to do about it, is one of the most consequential policy challenges of our time.
Two Legal Worlds Colliding
The distinction between published works and personal data is not some abstract philosophical nicety. It is baked into the legal architecture of every major democratic jurisdiction, reflecting fundamentally different values and harms.
Copyright law protects the economic and moral interests of creators. When The New York Times sued OpenAI in December 2023, alleging that millions of copyrighted articles had been used to train ChatGPT without consent or payment, the core claim was about intellectual property theft. The newspaper argued that OpenAI's models could reproduce substantial portions of its journalism, effectively creating a substitute for the original product. In March 2025, Judge Sidney Stein rejected OpenAI's motion to dismiss, allowing the main copyright infringement claims to proceed. By January 2026, the court ordered OpenAI to produce 20 million ChatGPT output logs as part of discovery, a ruling that could expose the degree to which the model regurgitates copyrighted material. The case has been consolidated with lawsuits from The New York Daily News and the Center for Investigative Reporting, forming one of the most significant copyright challenges the technology industry has ever faced.
Data protection law, by contrast, protects something more intimate: the informational autonomy of individuals. The European Union's General Data Protection Regulation (GDPR) does not ask whether data is “creative” or “original.” It asks whether data can identify, or be linked to, a specific human being. Under the GDPR, organisations must establish a lawful basis for processing personal data at every stage of AI development and deployment. The European Data Protection Board (EDPB) adopted an opinion in December 2024 addressing when AI models can be considered anonymous, whether legitimate interest can serve as a legal basis for training, and what happens when a model is developed using unlawfully processed personal data. The French data protection authority, the CNIL, issued guidance in 2025 affirming that training AI models on personal data scraped from public sources can be lawful under the GDPR's legitimate interest basis, but only when specific conditions are met.
These are not the same conversation. Copyright disputes centre on market substitution and economic harm to creators. Privacy disputes centre on individual dignity, autonomy, and the right to control information about oneself. Yet the AI industry routinely conflates them, treating a novelist's published book and a person's scraped social media profile as functionally identical inputs to a training pipeline.
The Scraping Problem
The conflation becomes most visible in the practice of web scraping, where AI companies indiscriminately harvest both published content and personal data from the open internet. Daniel Solove, the Eugene L. and Barbara A. Bernard Professor of Intellectual Property and Technology Law at George Washington University Law School, and Woodrow Hartzog, Professor of Law at Boston University, tackled this collision directly in their 2025 paper “The Great Scrape: The Clash Between Scraping and Privacy,” published in the California Law Review. The paper, which won the Future of Privacy Forum's Privacy Papers for Policy Makers award, argues that scraped personal data provides the foundation for AI tools including facial recognition, deepfakes, and generative AI, even as privacy laws remain largely incongruous with the practice. As Solove and Hartzog have argued in related work, including their 2024 paper “Kafka in the Age of AI and the Futility of Privacy as Control” in the Boston University Law Review, the paradigm of individual control over personal data is fundamentally inadequate in the face of AI systems that process information at a scale and speed that renders individual oversight meaningless.
The Clearview AI saga offers perhaps the starkest illustration of why personal data demands different treatment. The company scraped billions of photographs from publicly accessible websites to build a facial recognition database, then sold access to law enforcement agencies. The photos were “publicly available” in the same way that a novel on a library shelf is publicly available. But the harms are categorically different. When Clearview scrapes your photograph, the resulting database can be used to track your movements, identify you in a crowd, and build a surveillance profile that follows you through physical space. By 2026, at least eight people in the United States had been wrongfully arrested due to false positives from facial recognition technology, illustrating that the harms of personal data misuse are not hypothetical but tangible and life-altering.
Data protection authorities across Europe responded accordingly. The Dutch Data Protection Authority fined Clearview 30.5 million euros in 2024 for violating the GDPR by processing biometric data without a legal basis. The French, Greek, Italian, and Dutch authorities have collectively imposed fines of roughly 100 million euros on the company. In the United Kingdom, the Information Commissioner's Office imposed a fine of more than 7.5 million pounds and ordered Clearview to delete UK residents' data; on appeal, the Upper Tribunal in London ruled in October 2025 that the GDPR was applicable and the ICO had proper jurisdiction. The privacy advocacy group noyb filed a criminal complaint against Clearview and its managers in Austria, arguing that the company's executives could face personal criminal liability if they travel to Europe. In the United States, a federal judge in March 2025 approved a class action settlement granting affected individuals a 23 per cent equity stake in Clearview, valued at approximately 51.75 million dollars.
Now compare this with a copyright dispute. When authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson sued Anthropic for using their books to train Claude, the harm alleged was economic: their creative labour had been exploited without compensation. Nobody's physical safety was at risk because Anthropic read their novels. The nature of the harm is fundamentally different, and the regulatory response should reflect that difference.
Divergent Courts, Divergent Standards
The copyright side of the AI training debate has produced a revealing split among American federal judges, one that highlights why a single framework for all training data is inadequate. In February 2025, Judge Stephanos Bibas of the Third Circuit, sitting by designation in the District of Delaware, ruled in Thomson Reuters v. ROSS Intelligence that using Westlaw headnotes to train a competing AI legal research tool was not fair use. Judge Bibas found that ROSS had infringed 2,243 headnotes and that its use was not transformative because it created a direct market substitute. This was the first time a U.S. court ruled on fair use in the AI training context, and the ruling was a resounding rejection.
Months later, Judge Alsup reached the opposite result in Bartz v. Anthropic, describing AI training as “spectacularly” transformative. In Kadrey v. Meta, the court similarly found that training Meta's Llama models on books was transformative. The Copyright Alliance tracked more than 70 AI-related copyright infringement lawsuits by the end of 2025, with no appellate court yet providing definitive guidance. The Third Circuit granted review of the Thomson Reuters case, making it the first appellate court to take up the question of AI training and fair use.
These cases all involve published, copyrighted works. The legal questions they raise, however important, are fundamentally economic: who profits from creative expression, and under what conditions? Personal data disputes raise questions of a different order entirely. They concern not profit margins but physical safety, psychological autonomy, and the basic right to move through the world without being catalogued by algorithmic systems.
Why “Publicly Available” Does Not Mean “Fair Game”
One of the most dangerous assumptions in the AI training debate is that publicly available information carries no privacy interest. This assumption underpins the behaviour of companies that scrape the open web, treating everything they encounter as raw material for model training. But as Solove has argued across decades of scholarship, the aggregation of otherwise innocuous public data points can create significant privacy violations. Your name on a public LinkedIn profile is one thing. Your name, combined with your job history, your photograph, your social connections, and your posting patterns, is something else entirely.
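The aggregation effect is easy to demonstrate in a few lines of code. The sketch below, in Python with entirely invented names, fields, and records, joins three individually innocuous “public” datasets on a single quasi-identifier and emerges with a surveillance-grade profile; it reflects no real dataset or service.

```python
# Invented records illustrating the aggregation problem: each dataset
# alone looks innocuous, but joined on a quasi-identifier they form a
# detailed profile. No real data or service is represented here.

professional = [{"name": "J. Doe", "employer": "Acme Ltd", "city": "Leeds"}]
photos = [{"name": "J. Doe", "photo_url": "https://example.org/p/1.jpg"}]
posts = [{"name": "J. Doe", "active_hours": "23:00-02:00", "topics": ["health"]}]

def aggregate(name: str) -> dict:
    """Merge every record matching a single quasi-identifier (a name)."""
    profile: dict = {}
    for dataset in (professional, photos, posts):
        for record in dataset:
            if record["name"] == name:
                profile.update(record)
    return profile

# Three separately harmless fragments become one surveillance-grade profile.
print(aggregate("J. Doe"))
```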
The legal landscape on scraping remains contested. In the landmark hiQ Labs v. LinkedIn case, the U.S. Ninth Circuit Court of Appeals held in 2022 that scraping publicly available data from LinkedIn did not violate the Computer Fraud and Abuse Act, since publicly accessible websites have no access restrictions to circumvent. The U.S. Supreme Court had vacated an earlier Ninth Circuit ruling and remanded the case for reconsideration following its decision in Van Buren v. United States, but the appellate court reaffirmed its position. Yet this ruling addressed only federal computer fraud law, not privacy. The case ended with a settlement in which hiQ agreed to cease all scraping and destroy all data and algorithms derived from scraped profiles, a result that suggests even scraping that survives the CFAA can founder on other legal grounds.
Meta's approach to training its Llama models highlights the tension between published works and personal data with particular clarity. Llama 2 was trained exclusively on publicly available datasets including Common Crawl, Wikipedia, and Project Gutenberg. But for Llama 3 and Llama 4, Meta incorporated proprietary data from Facebook and Instagram. Mark Zuckerberg stated during an earnings call that Meta's corpus of public Facebook and Instagram data exceeds the size of Common Crawl. As of May 2025, Meta began using personal data from European users to train its AI systems, having paused an earlier attempt following discussions with the Irish Data Protection Commission. Starting in December 2025, Meta also began using AI chat interactions for advertising personalisation, adding yet another layer of personal data exploitation to its AI training pipeline.
Noyb, led by Max Schrems, sent Meta a cease and desist letter arguing that users who entered their data into Facebook over two decades could not reasonably have expected it to be used for AI training. The group also raised a critical point about non-users: people who never created a Facebook account but whose photographs appear in other users' posts are nevertheless swept into Meta's training pipeline. This is personal data being processed without even the pretence of consent, and no amount of copyright law can address it.
The Emerging Legislative Response
Legislators are beginning to recognise that the AI training question requires distinct answers for published works and personal data, though the responses remain fragmented and incomplete.
In the United States, Senators Josh Hawley and Richard Blumenthal introduced the AI Accountability and Personal Data Protection Act in July 2025. The bill is notable precisely because it addresses both categories simultaneously, creating a new federal cause of action that would allow individuals to sue companies that train AI models using either personal data or copyrighted works without clear, affirmative consent. The bill defines “covered data” expansively as information that “identifies, relates to, describes, is capable of being associated with, or can reasonably be linked, directly or indirectly, with a specific individual.” The Authors Guild welcomed the legislation, calling it critical at “a pivotal moment for American authors, artists, and other creators.” It remains with the Senate Judiciary Committee, with no indication of when or whether it will advance.
California's AI Training Data Transparency Act (AB 2013), which took effect on 1 January 2026, takes a different approach. Rather than restricting what data AI companies can use, it requires them to disclose what they have used, including whether copyrighted materials and personal information were included in training datasets. In practice, AI developers have responded with vague, generalised disclosures. Elon Musk's xAI has challenged the statute as unconstitutional, alleging it compels disclosure of trade secrets in violation of the Fifth Amendment's Takings Clause.
In the European Union, the regulatory architecture more explicitly distinguishes between copyright and privacy concerns. The EU AI Act, whose copyright compliance obligations for general-purpose AI model providers took effect on 2 August 2025, requires these providers to implement robust copyright policies and publish “sufficiently detailed” summaries of training content using a mandatory template issued by the European AI Office. The Act operates alongside the GDPR, creating parallel obligations. Under the Copyright in the Digital Single Market Directive, rightsholders can opt out of text and data mining for commercial purposes. Under the GDPR, individuals retain rights over their personal data regardless of whether it has been published. The European Commission's GPAI Code of Practice defines AI training data broadly as all data used for pre-training, fine-tuning, and reinforcement learning, explicitly acknowledging that this encompasses both copyright-protected material and personal data protected by privacy rights.
The German Hanseatic Higher Regional Court provided important guidance in December 2025 in Kneschke v. LAION, confirming that pre-processing steps for AI training fall under text and data mining exceptions (and are thus permitted for lawfully accessed content), but stressing that rightsholders retain control through effective opt-outs and that downstream uses of AI-generated outputs remain subject to copyright scrutiny.
Personal Data Demands Stronger Protections
Here is the core argument for treating personal data differently from published works in the AI training context: the harms are categorically different, the power dynamics are fundamentally asymmetric, and the remedies must reflect both realities.
When an AI company trains on a published novel, the harm is primarily economic. The author loses potential licensing revenue. The work may be reproduced in ways that compete with the original. These are real and significant harms, but they are harms that the copyright system was designed to address. Authors can sue for infringement. Courts can assess fair use. Licensing frameworks can be negotiated. The U.S. Copyright Office's May 2025 report acknowledged as much, concluding that “some uses of copyrighted works for generative AI training will qualify as fair use, and some will not.” The report suggested a spectrum, with noncommercial research training on one end and copying expressive works from pirated sources to generate competing content on the other.
Personal data harms operate on a different register entirely. When an AI company trains on personal data, the potential harms include surveillance, discrimination, identity theft, manipulation, and the erosion of autonomy. These harms are often irreversible. Once personal data has been incorporated into a model's weights, it cannot simply be extracted or deleted. A 2025 study from the University of Tübingen established that large language models qualify as personal data under the GDPR when they memorise training information, triggering data protection obligations throughout the entire AI development lifecycle. The EDPB has acknowledged this problem, noting that whether an AI model is “anonymous” (and thus outside the GDPR's scope) must be assessed on a case-by-case basis, considering whether individuals can be directly or indirectly identified from the model and whether personal data can be extracted through queries.
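What “extracted through queries” means in practice can be illustrated with a simple memorisation probe: prompt the model with the opening of a record that may have been in training and measure how much of the remainder it reproduces verbatim. The sketch below is a toy version of that idea; the generate() function is a placeholder standing in for a real model API, and genuine extraction testing, in the vein of published training-data extraction attacks, is considerably more involved.

```python
# A toy memorisation probe: if the model completes a held-out suffix
# nearly verbatim, the record is arguably extractable and the model is
# arguably not anonymous with respect to that data subject.
import difflib

def generate(prompt: str) -> str:
    # Placeholder for a real model call (e.g. an HTTP inference endpoint).
    return "lives at 12 Example Road and was born on 1 January 1980"

def memorisation_score(record: str, prefix_len: int = 20) -> float:
    """Ratio of the held-out suffix reproduced by the model, 0.0 to 1.0."""
    prefix, suffix = record[:prefix_len], record[prefix_len:]
    completion = generate(prefix)
    return difflib.SequenceMatcher(None, suffix.strip(), completion.strip()).ratio()

record = "Jane Example lives at 12 Example Road and was born on 1 January 1980"
print(f"memorisation score: {memorisation_score(record):.2f}")
```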
The power asymmetry is also starkly different. A published author or a major newspaper has legal resources, public visibility, and collective organisations to assert their rights. The New York Times can afford to litigate against OpenAI for years. Individual data subjects, by contrast, are often unaware that their data has been scraped, lack the resources to challenge a trillion-dollar technology company, and face practical barriers to exercising their rights even when those rights exist on paper.
Consider the right to erasure under the GDPR. In principle, individuals can request the deletion of their personal data. In practice, if that data has been used to train a neural network, selective deletion is not technically feasible without retraining the entire model. The emerging field of “machine unlearning” attempts to bridge this gap. Techniques such as gradient subtraction, influence-function updates, and sharded retraining offer approximate methods of removing the influence of specific data points, but each carries significant trade-offs in model performance and reliability. In September 2025, researchers at UC Riverside proposed “source-free unlearning,” a method that operates without the original source data, using a surrogate dataset to guide parameter updates. The results were promising but still fell short of the standard of “complete and permanent erasure” that privacy regulators might demand. As the Cloud Security Alliance noted in an April 2025 assessment, there is no universally accepted method for verifying that machine unlearning has actually succeeded. The gap between legal right and technical reality is a chasm that copyright law, dealing primarily with discrete works that can be identified and removed, does not face to the same degree.
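To make that gap concrete, here is a toy sketch of approximate unlearning by gradient ascent on a small logistic-regression model: train on everything, then take ascent steps on the single “forget” record to push its influence back out. This illustrates the general idea behind the techniques named above, not any regulator-approved procedure, and it exhibits the problem plainly: influence is reduced, not provably erased.

```python
# Toy approximate unlearning: descend on all data, then ascend on the
# forget point. All data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)
i = 7                                    # the data subject asking for erasure

def grad(w, X, y):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigmoid predictions
    return X.T @ (p - y) / len(y)        # logistic-loss gradient

w = np.zeros(3)
for _ in range(500):                     # ordinary training on all data
    w -= 0.5 * grad(w, X, y)

w_trained = w.copy()
for _ in range(50):                      # approximate unlearning: ascend
    w += 0.1 * grad(w, X[i:i + 1], y[i:i + 1])   # on the forget point only

# The weights moved, but nothing certifies the forget record's
# contribution is gone, which is exactly the verification problem.
print("weight shift after unlearning:", np.round(w - w_trained, 3))
```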
The Consent Question
The question of consent further illuminates why published works and personal data require different treatment. When an author publishes a book, they make a deliberate choice to enter the public sphere. The terms of that entry are governed by copyright law, which grants specific exclusive rights while also permitting certain uses (criticism, commentary, education, and, as courts are still deciding, potentially AI training). The consent model for published works is, at least in principle, clear: the act of publication itself establishes a framework of rights and expectations.
Personal data operates under a radically different consent framework. Much personal data is generated not through deliberate publication but through the ordinary activities of daily life: browsing the web, posting on social media, uploading photographs, making purchases. The GDPR requires that consent be “freely given, specific, informed, and unambiguous.” Blanket consent through general terms of service is insufficient; organisations must clearly explain how personal data will be used in AI model training and provide granular consent options.
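As a data structure, granular consent is straightforward, which makes its rarity in practice all the more telling. The hypothetical record below separates purposes into independently grantable flags; the field names are invented for illustration and correspond to no particular company's schema.

```python
# A hypothetical granular consent record: one independently revocable
# flag per purpose, rather than blanket terms-of-service acceptance.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ConsentRecord:
    subject_id: str
    service_delivery: bool = False     # consent to use data to run the service
    ai_training: bool = False          # separate consent for model training
    ad_personalisation: bool = False   # separate again for advertising
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

def may_train_on(record: ConsentRecord) -> bool:
    """Training is permitted only if this specific purpose was consented to."""
    return record.ai_training

c = ConsentRecord(subject_id="subject-456", service_delivery=True)
print(may_train_on(c))  # False: using the service is not consent to training
```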
But the reality is that meaningful consent for AI training is largely fictional. When Facebook users shared photographs and status updates between 2004 and 2024, they were not consenting to their data being used to train large language models that did not yet exist. The temporal gap between data collection and AI training makes informed consent practically impossible. Noyb's Max Schrems made this point forcefully in his cease and desist letter to Meta, arguing that two decades of Facebook usage cannot retroactively be characterised as consent to AI training.
This is why data protection law adopts safeguards that go beyond consent, including purpose limitation (data must be collected for specified purposes and not further processed in incompatible ways), data minimisation (only necessary data should be processed), and the right to object. These principles have no equivalent in copyright law because they address a fundamentally different relationship between individuals and their information.
What a Differentiated Framework Could Look Like
If we accept that published works and personal data should be treated differently in the AI training context, what would a workable framework look like?
For published works, the emerging consensus points towards a licensing-based approach. The Really Simple Licensing (RSL) Standard, announced in September 2025 by a coalition including Reddit, Yahoo, and Medium, allows publishers to embed licensing terms directly into robots.txt files. Collective licensing organisations modelled on music industry bodies like ASCAP and BMI could pool rights from millions of creators and negotiate blanket licences with AI companies. The music industry's own response suggests this is viable: both Warner Music Group and Universal Music Group reached settlements with AI music companies Suno and Udio in 2025, agreeing to license their catalogues for AI training and co-develop new licensed models for 2026.
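On the crawler side, honouring such terms could be as simple as checking robots.txt before fetching anything. The sketch below assumes a “License:” directive pointing at a licence document, which is an approximation of the announced RSL design rather than a statement of the published specification; it illustrates the mechanism, not the standard's exact syntax.

```python
# Check a site's robots.txt for a declared licensing directive before
# scraping. The "License:" line is an assumed format for illustration;
# consult the published RSL specification for the real syntax.
import urllib.request

def declared_license(site: str) -> str | None:
    """Return the licence URL declared in robots.txt, if any."""
    try:
        with urllib.request.urlopen(f"{site}/robots.txt", timeout=5) as resp:
            for line in resp.read().decode("utf-8", "replace").splitlines():
                if line.lower().startswith("license:"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        pass
    return None

licence_url = declared_license("https://example.com")
if licence_url:
    print(f"terms declared at {licence_url}; obtain a licence before training")
else:
    print("no machine-readable terms declared")
```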
For personal data, the framework must be fundamentally different. Licensing is not an adequate model because personal data is not a commodity to be traded but an extension of individual identity. The principles of data protection law, including purpose limitation, data minimisation, transparency, and the right to erasure, must apply with full force. This means that AI companies should be required to establish a clear lawful basis for processing personal data before training begins, not retrospectively. It means that individuals should have meaningful rights to object to the use of their data, with those objections technically enforced rather than merely acknowledged. And it means that data protection authorities must be resourced and empowered to enforce these requirements, as the Garante did with its fine against OpenAI.
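“Technically enforced” is worth making concrete. In the hypothetical pipeline below, objections lodged on an opt-out register are applied as a hard filter before training begins, so an objection changes what the model ever sees rather than generating a letter after the fact; the register and record format are invented for illustration.

```python
# A hypothetical pre-training filter: records linked to data subjects on
# an opt-out register never reach the training set at all.
from dataclasses import dataclass

@dataclass
class Record:
    subject_id: str | None   # None for content with no identifiable person
    text: str

opt_out_register = {"subject-123"}   # objections lodged before training

corpus = [
    Record(None, "a public-domain essay on tidal power"),
    Record("subject-123", "scraped profile text about a private individual"),
]

training_set = [r for r in corpus if r.subject_id not in opt_out_register]
assert all(r.subject_id not in opt_out_register for r in training_set)
print(f"{len(corpus) - len(training_set)} record(s) excluded before training")
```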
The European approach, for all its imperfections, offers a more promising template than the American one. The EU's dual-track regulation, with the AI Act addressing copyright and the GDPR addressing personal data, at least recognises that these are distinct problems requiring distinct solutions. The CNIL's PANAME project, launched in partnership with ANSSI and other institutions, aims to create tools that can assess whether an AI model processes personal data, providing concrete technical solutions rather than relying solely on legal obligations.
The United States, by contrast, lacks a federal data protection law, leaving personal data protections scattered across state-level statutes and sector-specific regulations. The Hawley-Blumenthal bill represents a step towards recognising the dual nature of the problem, but its prospects in Congress remain uncertain. Without comprehensive federal privacy legislation, the American approach will continue to treat personal data as an afterthought to the copyright debate.
The Stakes Are Higher Than You Think
The distinction between published works and personal data in AI training is not merely a legal technicality. It reflects a deeper question about what kind of society we want to build with these technologies.
If we treat published works and personal data identically, we flatten a moral distinction that matters enormously. A novelist who publishes a book has chosen to participate in public discourse and has legal tools to protect their economic interests. A teenager whose Instagram posts are scraped to train an AI model has made no such choice and has virtually no practical recourse. Collapsing these two situations into a single “training data” category serves the interests of AI companies, which benefit from treating all information as raw material, but it does not serve the interests of either creators or individuals.
The U.S. Supreme Court's denial of certiorari in the Thaler case on 2 March 2026, reaffirming that human authorship is a foundational requirement of copyright law, gestures at this distinction. Copyright exists to protect human creative expression. Data protection law exists to protect human dignity and autonomy. Both are under threat from AI systems that consume information indiscriminately, but the threats are different, the harms are different, and the solutions must be different too.
The AI industry has every incentive to resist this differentiation. Separate frameworks for published works and personal data mean separate compliance obligations, separate negotiations, and separate costs. A unified “fair use” or “legitimate interest” argument is simpler and cheaper. But simplicity for the technology industry should not come at the expense of the rights of billions of individuals whose personal data has been swept into training datasets without their knowledge, understanding, or consent.
The courts, regulators, and legislators who will shape AI governance over the coming years must resist the temptation to treat all training data alike. Your novel and your face are not the same thing. They never were. And the law should reflect that reality before it is too late to do anything about it.
References and Sources
Italian Garante per la protezione dei dati personali, Decision on OpenAI/ChatGPT, 20 December 2024. Fine of EUR 15 million for GDPR violations including lack of legal basis for training data processing and transparency failures. Reported by Euronews, The Hacker News, and Lewis Silkin LLP.
Bartz v. Anthropic, U.S. District Court, Northern District of California, June 2025. Judge William Alsup ruled AI training on legally acquired books constitutes fair use. Settlement of USD 1.5 billion. Reported by Copyright Alliance, IPWatchdog, and Authors Guild.
The New York Times v. OpenAI and Microsoft, U.S. District Court, Southern District of New York, filed December 2023. Judge Sidney Stein denied OpenAI's motion to dismiss in March 2025. Court ordered production of 20 million ChatGPT logs in January 2026. Reported by NPR, National Law Review, and Nelson Mullins.
European Data Protection Board (EDPB), Opinion on AI Models and Personal Data, adopted December 2024. Addressed anonymity of AI models, legitimate interest as legal basis, and consequences of unlawful data processing in training.
CNIL (Commission Nationale de l'Informatique et des Libertés), Guidance on AI and GDPR, 2025. Affirmed that legitimate interest can serve as legal basis for training on scraped public data under specific conditions. Published PANAME project for assessing personal data in AI models.
Solove, Daniel J. and Hartzog, Woodrow, “The Great Scrape: The Clash Between Scraping and Privacy,” 113 California Law Review 1521 (2025). Winner of Future of Privacy Forum Privacy Papers for Policy Makers award.
Clearview AI: Dutch Data Protection Authority fine of EUR 30.5 million (May 2024); cumulative European fines of approximately EUR 100 million from French, Greek, Italian, and Dutch authorities. UK ICO fine of GBP 7.5 million; Upper Tribunal affirmed jurisdiction October 2025. U.S. class action settlement valued at USD 51.75 million approved March 2025. Reported by Fortune Europe, Library of Congress, National Law Review, and BBC.
hiQ Labs, Inc. v. LinkedIn Corp., U.S. Court of Appeals for the Ninth Circuit, No. 17-16783 (2022). Held that scraping publicly available data does not violate the Computer Fraud and Abuse Act. U.S. Supreme Court vacated and remanded in light of Van Buren v. United States (2021). Case settled December 2022 with permanent injunction against hiQ.
Meta Platforms, use of Facebook and Instagram data for Llama AI training. European deployment of personal data for AI training commenced May 2025 following discussions with Irish Data Protection Commission. Noyb cease and desist letter challenging retroactive consent. Reported by Euronews, MIT Technology Review, and Goodwin Law.
AI Accountability and Personal Data Protection Act, S.2367, 119th Congress (2025-2026). Introduced by Senators Josh Hawley (R-MO) and Richard Blumenthal (D-CT) on 21 July 2025. Creates federal cause of action for use of personal data or copyrighted works in AI training without affirmative consent. Reported by Axios, IPWatchdog, and Authors Guild.
California AI Training Data Transparency Act (AB 2013), effective 1 January 2026. Requires disclosure of training data sources including copyrighted materials and personal information. Challenged by xAI as unconstitutional. Reported by Davis+Gilbert LLP and Goodwin Law.
EU AI Act, copyright compliance obligations for general-purpose AI model providers, effective 2 August 2025. European Commission mandatory template for training data disclosure published July 2025. GPAI Code of Practice defines training data broadly to include both copyright-protected and personal data. Reported by IAPP, Clifford Chance, and WilmerHale.
Kneschke v. LAION, German Hanseatic Higher Regional Court, December 2025. First appellate-level guidance on copyright exceptions for text and data mining in AI training context. Reported by Norton Rose Fulbright.
U.S. Copyright Office, Report on AI Training and Copyright, May 2025. Concluded that fair use outcomes will vary by case. Reported by McDermott Will & Emery and Library of Congress Congressional Research Service.
Thomson Reuters Enterprise Centre GmbH v. ROSS Intelligence Inc., U.S. District Court, District of Delaware, February 2025. Judge Stephanos Bibas granted partial summary judgment to Thomson Reuters, rejecting fair use defence for AI training on Westlaw headnotes. First U.S. court ruling on fair use in AI training context. Appeal granted by Third Circuit. Reported by Authors Alliance, Reed Smith, and Venable LLP.
Thaler v. Perlmutter, U.S. Supreme Court denied certiorari 2 March 2026, reaffirming human authorship requirement for copyright protection.
Really Simple Licensing (RSL) Standard, announced September 2025 by coalition including Reddit, Yahoo, and Medium. Framework for embedding licensing terms in robots.txt files.
Warner Music Group settlement with Suno, and Universal Music Group settlement with Udio, both 2025. AI music companies agreed to license catalogues for training. Reported by Digital Music News and Copyright Alliance.
Solove, Daniel J., “Artificial Intelligence and Privacy,” Florida Law Review (2025). Analysis of how AI remixes longstanding privacy problems.
Hartzog, Woodrow and Solove, Daniel J., “Kafka in the Age of AI and the Futility of Privacy as Control,” 104 Boston University Law Review 1021 (2024).
University of Tübingen, 2025 study establishing that large language models qualify as personal data under GDPR when they memorise training information. Reported by PPC.land.
UC Riverside, “Source-Free Unlearning” method for machine unlearning without original training data, September 2025.
Cloud Security Alliance, “The Right to Be Forgotten, But Can AI Forget?”, April 2025. Assessment of machine unlearning challenges and verification difficulties.
Noyb, Criminal complaint against Clearview AI filed with Austrian public prosecutors, 2025. Reported by noyb.eu.
EDPB Guidelines on Data Transfers and SPE Training Material on AI and Data Protection, published 2025.

Tim Green, UK-based Systems Theorist & Independent Technology Writer
Tim explores the intersections of artificial intelligence, decentralised cognition, and posthuman ethics. His work, published at smarterarticles.co.uk, challenges dominant narratives of technological progress while proposing interdisciplinary frameworks for collective intelligence and digital stewardship.
His writing has been featured on Ground News and shared by independent researchers across both academic and technological communities.
ORCID: 0009-0002-0156-9795
Email: tim@smarterarticles.co.uk