Phantom Citations and Paper Mills: How AI Fraud Reaches Patients

Somewhere in a hospital pharmacy in Birmingham, a clinical pharmacist is reading a draft protocol for an off-label oncology treatment. The relevant guideline cites a meta-analysis. The meta-analysis pools results from twenty-three primary studies. Of those twenty-three, four sit inside the suspect cluster recently flagged by a machine-learning screen out of the Queensland University of Technology. Two more contain references that, when checked by a graduate student during a long weekend, point to journal articles that do not exist. The pharmacist closes the laptop and stares at the wall for a minute. The treatment is already being prescribed across the NHS. The question she does not know how to ask, because no part of her training has equipped her to ask it, is whether the underlying evidence is actually evidence at all.
This is not a science-fiction conceit. It is the practical condition of evidence-based medicine in mid-2026.
In the past nine months, three pieces of work have, taken together, produced something close to an emergency for anyone who relies on the scientific literature to make consequential decisions. In January, a team led by Adrian Barnett at QUT published a study in The BMJ that ran 2.6 million cancer papers through a machine-learning screen and concluded that 9.87 per cent of them showed textual fingerprints consistent with paper mill output. In April, Nature, working with the screening company Grounded AI, surfaced an analysis suggesting that tens of thousands of publications from 2025 might contain references generated, in part or in whole, by large language models hallucinating citations into being. In May, a Lancet letter from a Columbia University group led by Maxim Topaz, drawing on an audit of nearly 2.5 million biomedical papers and 97 million references, found that fabricated citations have grown twelve-fold in two years. By the first seven weeks of 2026, the rate had reached one in 277 papers. In 2023, it was one in 2,828.
A Northwestern University team had already, in work published in 2025 and amplified again in March 2026, used the word that the field had been reluctant to use in print. Industrialised. Scientific fraud, the Northwestern researchers argued, is no longer the work of unhinged solo operators forging Western blots in a basement. It is a supply chain. There are brokers, there are compromised editors, there are pipelines that harvest public data, run it through standardised analyses, dress it in AI-written prose, generate publication-ready figures, and sell the finished article with the authorship slots already vacant and waiting. And the output of that supply chain, by the team's estimate, is doubling roughly every eighteen months. Legitimate science is doubling every fifteen years.
These numbers describe a foundation that has begun to rot, quietly, beneath the floorboards of a building whose occupants assume it is sound.
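To make the two doubling times quoted above concrete, a short calculation converts each into an annual growth rate and a ten-year multiple. This is a back-of-envelope sketch that assumes steady exponential growth, which the quoted figures imply but do not guarantee; the function and constant names are illustrative only.

```python
def annual_growth_rate(doubling_years: float) -> float:
    """Yearly fractional growth implied by a fixed doubling time."""
    return 2 ** (1 / doubling_years) - 1


def growth_multiple(doubling_years: float, horizon_years: float) -> float:
    """How many times larger the output is after horizon_years."""
    return 2 ** (horizon_years / doubling_years)


# Doubling times quoted in the article: eighteen months and fifteen years.
FRAUD_DOUBLING_YEARS = 1.5
LEGIT_DOUBLING_YEARS = 15.0

if __name__ == "__main__":
    for label, years in (("suspected fraud", FRAUD_DOUBLING_YEARS),
                         ("legitimate science", LEGIT_DOUBLING_YEARS)):
        print(f"{label}: {annual_growth_rate(years):.1%} per year, "
              f"x{growth_multiple(years, 10):.1f} over ten years")
```

On those assumptions, fraudulent output grows by roughly 59 per cent a year and about a hundredfold in a decade, while the legitimate literature grows by under 5 per cent a year.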
The shadow industry that science forgot to notice
Paper mills are not new. They predate the current panic by at least a decade. The integrity sleuth Elisabeth Bik, formerly of Stanford and now perhaps the best-known image-forensics specialist in the world, has been documenting them since the mid-2010s, when a peculiar consistency in the look of certain Chinese-authored cancer biology papers led her to suspect a small number of operations were producing manuscripts at industrial throughput. Bik, working largely alone, eventually flagged thousands of papers, hundreds of which have since been retracted. The Center for Scientific Integrity, founded by Ivan Oransky and his Retraction Watch co-founder Adam Marcus, has tracked the retraction surge: about one in 5,000 papers retracted in the early 2000s, roughly one in 500 today. The shape of the curve has been clear for years to the people who looked. The catastrophe was that almost no one looked.
The pre-AI economics of a paper mill were already attractive enough to support a multi-million-dollar trade. A finished, journal-ready manuscript with guaranteed authorship in a low-impact journal could be sold for the equivalent of a few thousand pounds. Authors, predominantly but not exclusively in jurisdictions where promotion and bonus structures are pinned to publication count, could be moved into pole position on a paper they had never seen. The mill kept costs down by recycling boilerplate, splicing data, manipulating gel images, and exploiting the willingness of overworked or compromised editors to wave through manuscripts that ticked the right boxes. The product was bad, but the supply chain was robust.
Large language models did not invent this trade. They have changed it the way containerisation changed shipping. The marginal cost of producing a plausible-looking abstract has collapsed to roughly the cost of an API call. The marginal cost of producing a plausible-looking discussion section, complete with appropriately hedged claims and ostensibly relevant citations, is similar. The introduction can be generated in seconds. The figures can be drawn by a generative model trained on real Western blots. The bottleneck, for years, was the ability to write fluent English; the language model removed that bottleneck overnight. What used to require a small writers' room now requires an account and a credit card.
Bernhard Sabel, a neuroscientist at the Otto von Guericke University in Magdeburg who has spent much of the past decade attempting to quantify the paper mill problem, has argued that the numbers are far worse than the retraction record suggests. His estimates, published in pre-print form and discussed in the popular press through 2024 and 2025, suggested that perhaps a quarter of all biomedical papers in some sub-fields are fake. The QUT result of 9.87 per cent across cancer literature is, by Sabel's argument, conservative. It is also possibly the most rigorous figure we have for any sub-field at present.
The Frankenstein citation
The most disorientating element of the new fraud, the one that distinguishes the AI era from the pre-AI era, is not the speed or the scale. It is the citation.
Citations have always been the connective tissue of scholarship. A claim is made; an earlier paper is invoked; a reader who doubts the claim can follow the trail back to its source. The convention is so old and so robust that it has stopped being remarked upon. Reviewers do not, as a rule, click every reference in a manuscript they are evaluating. They could not, even if they wanted to. The list, in a typical biomedical paper, runs to forty or eighty or, in a review article, several hundred entries. The expectation that the references are real is the expectation that the sun will rise.
Large language models break that expectation in a specific and underappreciated way. They do not, when asked to provide supporting references, distinguish between a citation that exists and a citation that ought to exist. They generate strings of text that resemble citations. The string contains an author who has plausibly worked in the relevant area, a journal that publishes in that area, a year that fits the timeline, a volume and page number that look right. Sometimes one or two of the components are real. Sometimes none of them are. The reference looks fine. It is not fine.
These are what the integrity community has begun to call Frankenstein citations. Stitched together from genuine fragments, they pass casual inspection. A real author. A real journal. A title that almost certainly does not correspond to a real paper. The Nature analysis in April, conducted with Grounded AI, suggested that tens of thousands of publications from 2025 carry these creatures inside them. The Topaz audit at Columbia, published the following month in The Lancet, put a hard number on it for biomedical literature alone: 4,046 fake citations across 2,810 research papers in the corpus the team examined, with the inflection point in fabrication rate coinciding almost exactly with the public release of the first widely usable consumer language models in late 2022 and early 2023.
There is a feature of the Topaz audit that bears restating. The fake citations were found across the literature, not concentrated in obscure or predatory venues. Some of the affected journals are highly ranked. Some of the affected articles have themselves been cited by other articles, which means the fictional references are propagating. A nonexistent paper, invoked in support of a real claim, becomes part of the apparent evidence base for that claim. A subsequent author, reading the paper that cites the nonexistent paper, may invoke the same reference. The fiction acquires the patina of established fact.
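The logic of a citation check of this kind can be sketched in a few lines. The sketch below is an assumption about how such a service might classify a reference, not a description of any real product: `KNOWN_RECORDS` is a hypothetical in-memory stand-in for a bibliographic index, the entry in it is invented, and the three labels are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    author: str
    journal: str
    year: int
    title: str


def _norm(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


# Hypothetical stand-in for a bibliographic index; this entry is invented.
KNOWN_RECORDS = [
    Reference("Example A", "Journal of Placeholder Oncology", 2019,
              "Kinase inhibition in a hypothetical tumour model"),
]


def classify(ref: Reference, index=KNOWN_RECORDS) -> str:
    """Return 'verified', 'frankenstein', or 'unresolved' for one reference."""
    for rec in index:
        if (_norm(rec.title) == _norm(ref.title)
                and _norm(rec.author) == _norm(ref.author)
                and _norm(rec.journal) == _norm(ref.journal)
                and rec.year == ref.year):
            return "verified"
    # No full match: check whether genuine fragments were stitched together.
    author_known = any(_norm(r.author) == _norm(ref.author) for r in index)
    journal_known = any(_norm(r.journal) == _norm(ref.journal) for r in index)
    if author_known or journal_known:
        return "frankenstein"
    return "unresolved"
```

A reference with a real author and journal but an unmatched title and year comes back as "frankenstein", which is the case a casual reader is least likely to catch. A production service would need fuzzy matching and a far larger index; exact matching is used here only to keep the idea visible.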
What peer review was, and what it cannot do
The defence that the scientific establishment has historically offered against this kind of contamination is peer review. It is a defence with a particular history and particular limits, and 2026 has been the year in which the limits became impossible to ignore.
Peer review, in the form most working scientists experience it, is roughly a post-war phenomenon. Before about 1950, journal editors made publication decisions largely on their own authority, sometimes consulting trusted colleagues. The expansion of scientific publishing in the second half of the twentieth century, coupled with the increasing specialisation of fields, made editorial omniscience impossible, and the formal practice of sending manuscripts to external reviewers became standard. By the 1980s, peer review had taken on the cultural weight of a near-sacred process. The phrase “peer-reviewed” became, in lay discussion, a synonym for “true”.
It was never that. Reviewers, even in the best-functioning systems, are unpaid, hurried, and selected for subject-matter expertise rather than for forensic skill. They are not auditors. They do not, as a rule, request raw data. They do not run the analyses themselves. They do not telephone the cited authors to confirm that the cited paper says what it is claimed to say. The fundamental assumption of peer review, an assumption baked into every textbook description of how science works, is that the authors are operating in good faith. When that assumption holds, peer review functions reasonably well as a check on competence and clarity. When that assumption fails, peer review functions essentially as a stamping mechanism for plausible-looking fraud.
The figures coming out of the machine-learning conferences in 2026 illustrate the secondary problem, which is that even the reviewers may now be AI. An analysis by Pangram Labs of roughly 76,000 reviews submitted to the International Conference on Learning Representations found that about 21 per cent of them showed signs of being fully generated by a language model. A survey of 1,600 academics, reported through the spring, suggested that more than half had used AI tools at some point in the review process. Some journals have introduced disclosure requirements; few have meaningful means of enforcing them. A reviewer who runs a manuscript through a language model and submits the model's output as their own assessment faces, at present, no consequence unless caught, and being caught is rare.
The result is a literature in which AI-generated papers may be evaluated by AI-generated reviews and accepted by editors whose workload makes serious adjudication impossible. The integrity sleuth Nick Wise, an engineer at the University of Cambridge who has spent several years tracking the buying and selling of authorships on Telegram channels, put it crisply in a 2025 interview: the system was already strained, and the language models have flooded it.
A pharmacist in Birmingham, again
Return to the hospital in Birmingham. Imagine that the off-label oncology protocol involves a repurposed kinase inhibitor, originally licensed for a different indication, now being trialled informally for a small population of patients with a particular molecular subtype. The supporting evidence is a published meta-analysis. The meta-analysis pools twenty-three studies. The molecular biology underlying the rationale is plausible. The dosing schedule is reasonable. The protocol has been reviewed by a hospital committee. The first patient is enrolled.
Now consider how this patient might be harmed. The relevant subset of the supporting studies, the ones produced by paper mills using AI to generate plausible-looking results from synthetic or recycled data, may have inflated the apparent response rate of the treatment. The Frankenstein citations within the meta-analysis itself may have given the impression of greater literature support than actually exists. The reviewers of the meta-analysis, working at speed, would not have caught either contamination. The journal editors would not have caught it. The hospital committee, drawing on the published evidence, would have no mechanism to catch it. The pharmacist who notices something amiss does so only because she has been reading about the QUT screen in the trade press, and she happens to know how to use a citation-verification service. Most pharmacists do not have that combination of curiosity and free time.
If the patient suffers a serious adverse event traceable to the treatment, the chain of responsibility becomes a thicket. Did the clinician follow the standard of care? Yes; the treatment was supported by published evidence. Did the publisher exercise reasonable diligence? The publisher will argue, with some justification, that no peer-reviewed system can be expected to detect every fraudulent submission. Did the AI provider have a duty? The AI provider will note that their terms of service prohibit using the model to generate fraudulent academic content. Did the regulator, whether the Medicines and Healthcare products Regulatory Agency in the United Kingdom or its equivalent elsewhere, have a duty to vet the evidence base? Regulators are, in general, charged with evaluating evidence submitted to them in support of a marketing authorisation. They do not, in the ordinary course, audit the entire downstream literature for the indications on which clinicians may rely.
The liability vacuum is the precise structural feature that makes the new fraud so dangerous. Every party in the chain can point, with some justification, to another. The result is that the patient bears the risk.
How the regulators are thinking about this
Through the spring of 2026, the major medicines regulators have been notably quiet on the question of AI-fabricated research, at least in public. Officials at the MHRA, the European Medicines Agency, and the United States Food and Drug Administration have all, in panel discussions and conference remarks, acknowledged that the integrity of the underlying scientific literature is a matter of concern. None of them have, as of the date this article is being written, articulated a clear policy on how to handle indications, guidelines, or off-label uses whose evidence base may be partly contaminated by paper mill output.
There is a reason for the caution. Regulators operate on a model of dossier evaluation. A pharmaceutical company applying for marketing authorisation submits a defined body of evidence, generally including raw clinical trial data, and that body of evidence is scrutinised in considerable depth by the regulatory agency. The fabricated literature problem sits largely outside that perimeter. It affects the academic biomedical literature, where clinicians look for evidence to guide off-label prescribing, where guideline committees synthesise evidence for clinical practice statements, and where meta-analyses are constructed. The MHRA does not, in any meaningful sense, audit the academic literature on which clinical guidelines are built.
The European Medicines Agency has, since 2024, been investing in tooling that can flag suspicious submissions, and has been working with publishers through bodies such as the Committee on Publication Ethics. The FDA's Office of Scientific Investigations conducts inspections of clinical trial sites and audits of pivotal trial data. None of this currently extends to the downstream contamination problem, in which a regulator might find itself, two years from now, in the position of having approved a drug or indication partly on the basis of literature that has subsequently been mass-retracted.
The slow pace of correction compounds the regulatory problem. The Cochrane Collaboration, the gold-standard producer of systematic reviews, has been wrestling with the contamination of its own outputs. A 2024 cross-sectional study of roughly 200,000 systematic reviews found that 0.15 per cent of them incorporated retracted paper mill articles into their evidence synthesis, with oncology the most affected field. The headline figure sounds small. It is not. A 0.15 per cent contamination rate across roughly 200,000 reviews is about 300 contaminated reviews, in a literature on which hundreds of millions of clinical decisions are based. More importantly, the time lag between a paper's retraction and its disappearance from the citing literature is long. The same study found 124 citations occurring after retraction, including 13 that occurred more than 500 days after the retraction date. Once contamination has entered the synthesis layer, it takes years to wash out, and in many cases it never washes out completely.
What detection looks like, and what it cannot do
The most encouraging element of the present moment is that the integrity community has, in a way that would have seemed implausible five years ago, professionalised. Adrian Barnett's group at QUT trained a BERT-class language model on the textual fingerprints of papers known to be retracted for paper mill activity. The model achieved 91 per cent internal accuracy and 93 per cent external accuracy, with specificity above 96 per cent. That is genuinely useful performance. It is the basis on which the 9.87 per cent figure for cancer literature was generated. There are now multiple comparable initiatives at other universities and at private firms, including Grounded AI, the company whose collaboration with Nature produced the April 2026 hallucinated-citation analysis. Image-forensics tools, used by Bik and others to identify duplicated and manipulated figures, have improved. Citation-verification services that simply check whether a reference resolves to a real publication have begun to appear in commercial form.
The limits of all of these tools are the same. They are good at catching the previous generation of fraud. They are less good at catching the next generation. The paper mills know what the detection tools look for. As the detectors improve, the mills adjust. The integrity researcher Anna Abalkina, based at the Free University of Berlin, has documented through 2024 and 2025 how mill operations on Russian and Chinese Telegram channels have responded to public discussion of detection methods, in some cases within weeks. This is the Red Queen problem that the broader AI safety field is also confronting: every more sophisticated detector elicits a more sophisticated evasion, and the two co-evolve indefinitely. Detectors are a time-buying tool, not a permanent fix.
There is a deeper theoretical limit that is worth naming. A 2023 result, since refined by other groups, established that as the text distribution of a sufficiently capable language model approaches that of human writing, no statistical detector can do better than chance. The implication is that text-based detection of AI-generated content cannot be a long-term solution. The signal will, in the limit, disappear. Detection has to be structural. It has to attach to data, to authorship verification, to institutional auditing, to the integrity of the supply chain itself.
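The shape of that limit can be sketched numerically. The article does not state the formula, so the sketch assumes the result takes the form in which one widely discussed 2023 analysis expresses it: an upper bound on a detector's AUROC in terms of the total variation distance TV between the machine and human text distributions, AUROC <= 1/2 + TV - TV^2/2. The function name is illustrative.

```python
def auroc_upper_bound(tv: float) -> float:
    """Assumed bound on the best achievable AUROC for a given total
    variation distance tv between machine and human text distributions."""
    if not 0.0 <= tv <= 1.0:
        raise ValueError("total variation distance must lie in [0, 1]")
    return 0.5 + tv - tv ** 2 / 2


if __name__ == "__main__":
    # As the two distributions converge (tv -> 0), the bound falls to 0.5,
    # which is the AUROC of a coin flip.
    for tv in (1.0, 0.5, 0.1, 0.0):
        print(f"TV={tv:.1f} -> AUROC <= {auroc_upper_bound(tv):.3f}")
```

Under this assumed form, a distance of 0.1 already caps any detector at an AUROC of about 0.595, and a distance of zero leaves nothing better than chance.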
The sleuthing communities, working largely as volunteers on platforms such as PubPeer, have continued to do extraordinary work. Bik, Wise, and a loose international constellation of others have flagged thousands of suspect papers in the past two years. The publishers, prodded by sustained reporting from Retraction Watch and others, have begun to retract at higher rates: the Springer Nature journal Neurosurgical Review made headlines in early 2025 by retracting scores of AI-generated commentaries and letters at once. Retractions hit record highs in the preceding years — 2023 alone produced more than fourteen thousand notices, swollen by mass retractions of compromised special issues — and the Retraction Watch database now holds well over fifty thousand entries. But retractions are still a fraction of the contamination that the screening studies suggest exists. The system is running well behind the fraud.
The contamination of the synthesis layer
The most consequential element of the AI-fabrication crisis, for clinical practice, is not the existence of fake papers. It is what happens when those papers feed upwards into the synthesis layer of biomedical evidence.
Evidence-based medicine, as practised since roughly the early 1990s, depends on a hierarchy. At the base, individual primary studies. Above them, systematic reviews and meta-analyses, which pool the primary studies and attempt to extract a more reliable signal than any single study can offer. Above those, clinical guidelines, which translate the synthesised evidence into recommendations for practice. The structure is recursive: each layer depends on the integrity of the layer below.
A paper mill product introduced into the primary literature does not stay there. If it is plausible enough to pass review, it is plausible enough to be picked up by a systematic reviewer running a database search. If it is plausible enough to be included in the systematic review, it contributes to the pooled estimate that the review reports. If the review is used to inform a guideline, the contamination has worked its way to the level at which clinical practice changes. The pharmacist in Birmingham is reading a guideline. The guideline is summarising a review. The review is pooling papers. Some of the papers are not real, in any meaningful sense, but the chain of inheritance does not transmit that information upwards. By the time the guideline is in front of the pharmacist, the original fabrication has been laundered into apparent consensus.
This is the property that makes the present situation different in kind, and not only in degree, from the previous era of scientific fraud. The previous era's frauds were episodic. Andrew Wakefield's MMR paper, the Schon affair in physics, the Hwang stem-cell case, the Stapel social-psychology fraud: each was the work of a small number of individuals, each was eventually exposed, each occupied the literature for some years and then was excised, with the connective tissue around it eventually repaired. The current situation is structural. It is not one fraudster producing twenty fraudulent papers; it is a global supply chain producing tens of thousands of fraudulent papers a year, embedded across every sub-field, and propagating into the synthesis layer faster than retraction can keep up.
A clinician applying evidence-based medicine in good faith, in 2026, is not necessarily applying the evidence base they think they are applying.
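The laundering effect described in this section can be illustrated with a toy pooling calculation. The sketch uses fixed-effect inverse-variance weighting, one standard way of combining study estimates, and every number in it is invented for illustration: nineteen hypothetical genuine studies with a small effect, four hypothetical suspect studies with an inflated one, all with equal variance. It does not model any real meta-analysis.

```python
def pooled_effect(studies):
    """Fixed-effect inverse-variance pooled estimate.

    studies: list of (effect, variance) pairs.
    """
    weights = [1.0 / variance for _, variance in studies]
    weighted = [w * effect for w, (effect, _) in zip(weights, studies)]
    return sum(weighted) / sum(weights)


# Invented inputs: 19 genuine studies and 4 suspect ones (23 in total).
GENUINE = [(0.10, 0.04)] * 19
SUSPECT = [(0.60, 0.04)] * 4

if __name__ == "__main__":
    print(f"all 23 studies:  {pooled_effect(GENUINE + SUSPECT):.3f}")
    print(f"19 genuine only: {pooled_effect(GENUINE):.3f}")
```

In this invented example, four studies out of twenty-three nearly double the pooled effect, and nothing in the pooled number itself signals which inputs were suspect.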
What it would actually take to fix this
The honest answer is that no one knows, and the proposals being floated are uneven in their ambition and their likely effectiveness.
The most modest proposals concentrate on submission-time screening. Every major publisher could, in principle, run every submitted manuscript through a battery of detectors, including text-based AI screens, image-forensics tools, statistical anomaly detectors, and citation-verification services. Some publishers are already doing some of this. The costs are real but not prohibitive. The likely impact is incremental. The detectors will catch the easy cases. They will miss the sophisticated mills.
A more ambitious set of proposals concerns the structure of authorship and the integrity of the data supply chain. If every paper had to be accompanied by raw data, deposited in a public repository at the moment of submission, the cost of paper mill output would rise sharply, because the synthetic data would need to withstand scrutiny in a way that synthetic prose does not. If every author had to be verified through an institutional credential that was independently checkable, the trade in authorship slots would become more difficult. If the entire chain from data collection to publication were recorded in a verifiable provenance log, post-hoc auditing would become feasible in a way that it presently is not. These changes would require sustained co-operation across publishers, institutions, funders, and regulators. They would be expensive. They would not, on their own, solve the problem, but they would push the marginal cost of fraud upward in a useful way.
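One way to picture a verifiable provenance log is as a hash chain, in which each entry's hash covers the entry before it. The sketch below is a minimal illustration under that assumption, not a proposal any publisher is known to have adopted; the event fields and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(prev_hash: str, event: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the event."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()


def append_event(log: list, event: dict) -> None:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash,
                "hash": _digest(prev_hash, event)})


def verify(log: list) -> bool:
    """Recompute every hash; an edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A log built with `append_event` passes `verify` until any recorded event is altered after the fact. A chain like this only makes later tampering evident. It does not show that the data were genuine when first recorded, and it is only useful if the latest hash is lodged with a party the author does not control.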
The most radical proposals contemplate a wholesale rebuilding of the publication system. They take the view, articulated in various forms by reformers including Ivan Oransky, that the present system, in which publication count is a proxy for scientific value and journals are private gatekeepers, is structurally incapable of withstanding the pressure that AI has now brought to bear. In the limit, the argument goes, the academic credentialling system needs to decouple from the journal system altogether. Researchers should be evaluated on the strength and reproducibility of specific contributions, audited by their institutions, rather than on the number of articles they have placed in journals. The journals, freed from their gatekeeping function, could become curation layers atop a more transparent underlying infrastructure of pre-prints and data deposits.
None of these proposals is close to implementation. The institutional inertia is enormous. The incentive structures that produce the fraud are, in many of the jurisdictions where the mills flourish, baked into national research evaluation systems. The publishers, whose revenue depends on the existing volume of submissions, have an ambivalent relationship to the reforms most likely to slow that volume. The funders, who could in principle force change through grant conditions, have moved slowly. The regulators, as discussed, are mostly looking at the problem from the wrong end.
In the meantime, the foundation continues to subside.
Trust, and what it costs to lose it
The scientific record is, among other things, a trust infrastructure. It is the means by which a clinician in Birmingham, a regulator in Canary Wharf, a guideline committee in Geneva, and a patient anywhere in the world can act on knowledge that none of them personally produced. The functioning of the infrastructure depends on a chain of assumptions, each of which is now, to some degree, under question. The assumption that the authors are real. The assumption that the data are real. The assumption that the citations resolve to real papers. The assumption that the reviewers read the manuscript. The assumption that the editor adjudicated in good faith. The assumption that the retraction system catches the fraud quickly enough to prevent downstream contamination.
It is possible to overstate this, and important not to. The overwhelming majority of biomedical research is still produced by competent, conscientious researchers operating in good faith. The QUT figure of 9.87 per cent is alarming, but it implies that roughly 90 per cent of cancer literature is still, in the relevant sense, real. The Lancet figure of one in 277 papers with fabricated citations means that 276 in 277 do not have them. The system is not collapsing. It is being eroded.
But erosion is not a comforting metaphor for those who have to act on the literature in real time. The Birmingham pharmacist, looking at the guideline, does not have the option of waiting two years for the retraction process to catch up. The patient does not have the option of consulting only the validated subset of the evidence base. The regulator does not have the option of pausing the approval process while the literature is audited from end to end. The decisions have to be made now, on the literature as it stands, with whatever degree of contamination it presently carries.
What the integrity sleuths and the screening researchers and the data scientists have given us, in the past two years, is for the first time some measure of the contamination. The number is uncomfortable. It is also probably an underestimate. Sabel's higher figures may turn out to be closer to the truth in some sub-fields. The Topaz audit is restricted to citations that can be checked algorithmically, and citations are only one of the artefacts the language models can fabricate. The image-forensics work suggests that figure manipulation is, if anything, more prevalent than text fabrication, and harder to detect at scale. The honest summary, in the middle of 2026, is that we do not know how bad it is, and the directional indicator is towards worse.
There is a way of telling this story in which the villain is the language model. That is too easy. The language model is a tool. The fraud is a response to incentives that long predated the model. The Chinese promotion structures that rewarded paper count without regard to paper quality, the global publish-or-perish culture, the prestige economy of impact factors, the cost structures of academic publishing, the under-resourcing of post-publication audit: all of these existed before the first transformer paper was written. The model simply lowered the cost of exploiting the gaps. If the gaps are not closed, the next generation of models will lower the cost further.
There is also a way of telling this story in which the heroes are the sleuths. That is closer to the truth, but it understates the scale of what is required. Bik, Oransky, Wise, Sabel, Abalkina, Barnett, Topaz, and the broader community working alongside them have done extraordinary work, mostly unpaid, often under threat of legal action from publishers and authors who would prefer not to be scrutinised. They have made the present picture visible. They cannot, by themselves, repair it. The repair requires institutions to act with a co-ordination and a seriousness they have not yet shown.
The pharmacist in Birmingham is fictional in the sense that no individual real person occupies the precise scenario described at the top of this article. The structural situation she occupies is not fictional. Across the United Kingdom, across Europe, across North America, across every system that has historically relied on the biomedical literature as a foundation for clinical decisions, that foundation is being silently rearranged. The studies that doctors, regulators, and patients rely on may no longer mean what they appear to mean. Some of them mean very nearly nothing. We have learned, in the past nine months, something close to the scale of the problem. We have not yet learned what to do about it.
What happens to the trustworthiness of the evidence that medical practice, public health guidance, and drug regulation depend on, if peer review cannot reliably distinguish AI-fabricated research from genuine findings? It declines. It is declining now. The question is whether the institutions that depend on it will move fast enough to arrest the decline before it forces, somewhere, the kind of patient-level catastrophe that finally compels action. The answer to that question is not yet known. The clock is running.
References and Sources
- Barnett, A. G. et al. “Machine learning based screening of potential paper mill publications in cancer research: methodological and cross sectional study.” The BMJ, January 2026. https://pmc.ncbi.nlm.nih.gov/articles/PMC12853418/
- Queensland University of Technology. “New tool exposes scale of fake research flooding cancer science.” QUT News, January 2026. https://www.qut.edu.au/news?id=203173
- Nature. “Hallucinated citations are polluting the scientific literature. What can be done?” Nature, April 2026. https://www.nature.com/articles/d41586-026-00969-z
- Topaz, M. et al. “Fabricated citations: an audit across 2.5 million biomedical papers.” The Lancet, May 2026. https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(26)00603-3/fulltext
- STAT News. “Fraudulent citations, blamed on AI hallucinations, are becoming more common in research papers.” STAT, 7 May 2026. https://www.statnews.com/2026/05/07/lancet-study-finds-steep-rise-fraudulent-citations-academic-papers/
- Retraction Watch. “One in 277 PubMed-indexed papers in 2026 shows fabricated references, says analysis.” Retraction Watch, 7 May 2026. https://retractionwatch.com/2026/05/07/one-in-277-pubmed-indexed-papers-in-2026-shows-fabricated-references-says-analysis/
- Columbia School of Nursing. “Nearly 3,000 peer-reviewed medical papers have fake citations, a Columbia Nursing AI-assisted audit finds.” Columbia University, 2026. https://www.nursing.columbia.edu/news/nearly-3-000-peer-reviewed-medical-papers-have-fake-citations-columbia-nursing-ai-assisted-audit-finds
- CBS News. “AI is fabricating citations in biomedical studies, researchers find.” CBS News, 2026. https://www.cbsnews.com/news/ai-hallucinate-citations-medial-research/
- ScienceDaily. “Scientists warn fake research is spreading faster than real science.” ScienceDaily, 6 March 2026. https://www.sciencedaily.com/releases/2026/03/260306224235.htm
- EurekAlert. “Organized scientific fraud is growing at an alarming rate.” EurekAlert, August 2025. https://www.eurekalert.org/news-releases/1093143
- The Debrief. “Scientific Fraud Exposed: The Multi-Million-Dollar 'Shadow Industry' Creating Junk Science to Propel Academic Careers.” The Debrief, 2025. https://thedebrief.org/scientific-fraud-exposed-the-multi-million-dollar-shadow-industry-creating-junk-science-to-propel-academic-careers/
- Pebblous AI. “When AI Reviews AI, 21% of ICLR 2026's 76,139 Peer Reviews Were AI-Generated.” Pebblous AI Blog, 2026. https://blog.pebblous.ai/report/iclr-2026-ai-peer-review-crisis/en/
- arXiv. “Detecting AI-Generated Content in Academic Peer Reviews.” arXiv preprint, February 2026. https://arxiv.org/html/2602.00319v2
- Retraction Watch. “As Springer Nature journal clears AI papers, one university's retractions rise drastically.” Retraction Watch, 10 February 2025. https://retractionwatch.com/2025/02/10/as-springer-nature-journal-clears-ai-papers-one-universitys-retractions-rise-drastically/
- FAPESP. “Elisabeth Bik: On the trail of scientific fraud.” Revista Pesquisa Fapesp. https://revistapesquisa.fapesp.br/en/elisabeth-bik-on-the-trail-of-scientific-fraud/
- STAT News. “Elisabeth Bik tackles the widespread issue of research misconduct.” STAT, February 2024. https://www.statnews.com/2024/02/28/elisabeth-bik-scientific-integrity-research-misconduct/
- Conexiant. “Is Science Retracting Enough Papers?” Conexiant. https://conexiant.com/internal-medicine/articles/scientific-retractions-surge-tenfold-yet-represent-fraction-of-flawed-research
- PMC. “Citation Contamination by Paper Mill Articles in Systematic Reviews of the Life Sciences.” PMC12163679. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12163679/
- Marketplace. “Academic journals have a fraud problem.” Marketplace, 28 October 2025. https://www.marketplace.org/story/2025/10/28/academic-journals-have-a-fraud-problem
- Fortune. “AI hallucinations are slipping past experts into papers and books to enter the permanent record.” Fortune, 24 May 2026. https://fortune.com/2026/05/24/ai-hallucinations-scientific-research-authors-medical-journal-treatment/
- Nature. “AI intensifies fight against 'paper mills' that churn out fake research.” Nature, 2023. https://www.nature.com/articles/d41586-023-01780-w
- bioRxiv. “Revealing the Paper Mill Iceberg: AI-Based Screening of Cancer Research Publications.” bioRxiv preprint, August 2025. https://www.biorxiv.org/content/10.1101/2025.08.29.673016v1
- Retraction Watch. “Research integrity conference hit with AI-generated abstracts.” Retraction Watch, 18 November 2025. https://retractionwatch.com/2025/11/18/research-integrity-conference-hit-with-ai-generated-abstracts/
- Retraction Watch. “Springer Nature flags paper with fabricated reference to article (not) written by our cofounder.” Retraction Watch, 21 November 2025. https://retractionwatch.com/2025/11/21/springer-nature-flags-paper-with-fabricated-reference-to-article-not-written-by-our-cofounder/
- Frontiers in Research Metrics and Analytics. “Artificial intelligence in the retraction spotlight: trends, causes and consequences of withdrawn AI literature.” Frontiers, 2025. https://www.frontiersin.org/journals/research-metrics-and-analytics/articles/10.3389/frma.2025.1737168/full

Tim Green
UK-based Systems Theorist & Independent Technology Writer
Tim explores the intersections of artificial intelligence, decentralised cognition, and posthuman ethics. His work, published at smarterarticles.co.uk, challenges dominant narratives of technological progress while proposing interdisciplinary frameworks for collective intelligence and digital stewardship.
His writing has been featured on Ground News and shared by independent researchers across both academic and technological communities.
ORCID: 0009-0002-0156-9795
Email: tim@smarterarticles.co.uk