Rebecca Sear @rebeccasear.bsky.social

“The study authors asked GPT 4o-mini to evaluate the quality of 217 papers. The tool didn’t mention in any of the reports that the papers being analyzed had been retracted or had validity issues. In 190 cases, GPT described the papers as world leading, internationally excellent, or close to that”

aug 25, 2025, 8:56 am • 379 198

Replies

avatar
Kyle D. Long @blatherscribe.bsky.social

But why? "It Does Not Do What It Cannot Do: Asking LLMs to Evaluate Scientific Papers Produces Erratic Results." Abstract: Because none of us can be bothered to look up what LLMs actually do, we asked ChatGPT to evaluate scientific papers like there's a little guy in there, hello little guy, how are

aug 25, 2025, 6:07 pm • 2 2
Angie Adamson @serehfas.bsky.social

AI presents "The Emperor's new clothes".

aug 25, 2025, 11:33 am • 4 0
Paul Bogdan @pbogdan.bsky.social

Attempting to replicate this... I plucked one random low-profile retracted paper (pmc.ncbi.nlm.nih.gov/articles/PMC...) and asked GPT-5/Gemini/Claude "What do you think of this paper?", with the title + abstract pasted. No model mentioned a retraction, but all said the paper had low evidential value

aug 25, 2025, 12:28 pm • 0 0
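A minimal sketch of this kind of probe, assuming the OpenAI Python SDK; the prompt wording follows the post above, and the placeholder title, abstract, and model name are illustrative:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholders: paste the real title and abstract of the paper under test.
title = "..."
abstract = "..."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # swap in whichever model is being probed
    messages=[{
        "role": "user",
        "content": f"What do you think of this paper?\n\nTitle: {title}\n\nAbstract: {abstract}",
    }],
)
print(response.choices[0].message.content)
```

Note the design choice: the model sees only the pasted text, so any mention of a retraction has to come from its training data rather than a live lookup.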
Paul Bogdan @pbogdan.bsky.social

I don't doubt the authors' findings using their design, but it seems like a leap to claim that findings based on GPT-4o-mini apply to contemporary state-of-the-art or near-state-of-the-art LLMs

aug 25, 2025, 12:38 pm • 0 0
Mel Bartley @zetkin.bsky.social

Can't it be programmed (or whatever) to look for weak methods? Mind you, what would then happen to the 1000s of BioBank papers with a response rate of 85%?

aug 25, 2025, 9:02 am • 1 0
Mel Bartley @zetkin.bsky.social

I meant NON response rate of 85%

aug 25, 2025, 9:20 am • 1 0
Dyqik @dyqik.bsky.social

ChatGPT merely mimics text found on the Internet. It has no concept of quality, truth, or the real world. So of course it's going to describe any paper, including non-existent ones, as excellent: it has seen comments like that sitting next to text that looks like scientific paper titles.

aug 25, 2025, 10:50 am • 1 0
Mel Bartley @zetkin.bsky.social

Crikey. Is that how it works? It makes my bugbear of meta-analyses look good.

aug 25, 2025, 11:13 am • 0 0
Dyqik @dyqik.bsky.social

Essentially, yes. Like other Large Language Models, it analyzes the language in its training set very deeply, but it has no one flagging which inputs are true or high quality, beyond what's written in the training set itself and any additional searches run in response to a prompt.

aug 25, 2025, 11:19 am • 0 0
Dyqik @dyqik.bsky.social

This is why LLMs can't do math - they just mimic the words used to describe math, without any concept of number.

aug 25, 2025, 11:20 am • 0 0
Mel Bartley @zetkin.bsky.social

😅😅

aug 25, 2025, 11:24 am • 0 0
Rebecca Sear @rebeccasear.bsky.social

Seems like the prompt asked it to evaluate the quality of papers, which surely ought to have brought up retractions and expressions of concern.

aug 25, 2025, 9:13 am • 5 0
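Surfacing a retraction doesn't require an LLM at all; bibliographic databases expose this directly. A minimal sketch, assuming the OpenAlex REST API and its is_retracted flag (the DOI below is a placeholder):

```python
import requests

def is_retracted(doi: str) -> bool:
    """Look up a DOI in OpenAlex and return its retraction flag."""
    url = f"https://api.openalex.org/works/https://doi.org/{doi}"
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return bool(resp.json().get("is_retracted", False))

print(is_retracted("10.1234/example.doi"))  # hypothetical DOI
```

A quality-evaluation pipeline could run a check like this first and prepend the result to the model's prompt.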
Mel Bartley @zetkin.bsky.social

For sure, that is a lot more extreme than what I was imagining (mere bias & unrepresentativeness, peccadillos in comparison)

aug 25, 2025, 9:19 am • 1 0
ShinyBlackShoe (Calum Polwart) @shinyblackshoe.bsky.social

Ah. This is all YOUR fault. (I mean, don't for a moment think that AI might be gaslighting you.) It's YOUR fault, because AI is only as good as YOUR prompt. Did your* prompt suggest looking for retractions and concerns? See. Not AI's fault. *I know it wasn't your fault, you weren't the prompter.

aug 25, 2025, 12:49 pm • 0 0
rhaco_dactylus, phd @rhacodactylus.bsky.social

dunno why but this reminds me of that pair of PAID articles by geoff miller and co, responding to bird and jackson by reviewing their own work to declare it isn't, in fact, super racist garbage science

aug 25, 2025, 2:18 pm • 1 0
Charlie @sonofirving.bsky.social

GIGO

aug 25, 2025, 1:26 pm • 0 0
Michal Krompiec @mkrompiec.bsky.social

Because LLMs without RAG (retrieval-augmented generation) or something similar are shameless bullshitters: link.springer.com/article/10.1...

aug 25, 2025, 11:17 am • 0 0
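A minimal sketch of what "RAG or something similar" could look like here, assuming the OpenAI Python SDK; the retrieval step itself (a search index, a retraction database lookup, etc.) is elided, and evaluate_with_context is a hypothetical helper:

```python
from openai import OpenAI

client = OpenAI()

def evaluate_with_context(question: str, retrieved_docs: list[str]) -> str:
    """Ground the model's answer in retrieved evidence (e.g., retraction
    notices) instead of letting it free-associate from training data."""
    context = "\n\n".join(retrieved_docs)
    prompt = (
        "Using ONLY the evidence below, answer the question. "
        "If the evidence is insufficient, say so.\n\n"
        f"Evidence:\n{context}\n\nQuestion: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

The grounding instruction is what separates this from bare prompting; without retrieved evidence, the model has nothing to check its fluent output against.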
Robert Ramsay @robertramsay.org

Garbage In, Garbage Out. As true today as it ever was.

aug 25, 2025, 9:41 am • 3 0
Cathy @cathy-nesbitt.bsky.social

ChatGPT may be a great tool to synthesize information, but it's crap at critically evaluating that information. Human eyes required.

aug 25, 2025, 12:56 pm • 0 0
Nurseferatu @nurseferatu.bsky.social

I am confounded as to why I keep seeing these stories. It is well established that these programs are not search engines or intelligence of any kind. They confabulate responses based upon keywords and generate results intended to please the asker. They can never be relied upon for accuracy.

aug 25, 2025, 12:04 pm • 22 0
StreetDogg @streetdogg.bsky.social

It's strange. These tools are designed to produce text that could pass as an evaluation when asked for one, but they cannot perform the act of evaluation. Still, some people try it anyway and then appear to be surprised by the results?!

aug 25, 2025, 7:01 pm • 1 0
goodnatured.bsky.social @goodnatured.bsky.social

Not as well established as you think. There are plenty of folks who think it’s actually intelligent because of how seemingly naturally it interacts and apologises and rewrites when you correct it. The fact that it also does that when you ‘uncorrect’ it does not ring any alarm bells at all.

aug 25, 2025, 4:10 pm • 3 0
Cate Eland @romancingnope.bsky.social

Because the marketing around them is "they have PhD level intelligence" and that they can be reliably used in schools and workplaces to fully replace human activities.

aug 25, 2025, 12:05 pm • 23 0
Nurseferatu @nurseferatu.bsky.social

Of course. Any product advertised to c-suites as a way to reduce personnel is going to be pushed hard. Regardless of the outcome. And I am still judging anyone who admits to using these products, especially anyone in the sciences.

aug 25, 2025, 12:37 pm • 10 1
debrarscott.bsky.social @debrarscott.bsky.social

Huge problem, in ALL areas of current life!

aug 25, 2025, 4:30 pm • 0 0
Daniel Read @danielread.bsky.social

I did not see in this report that they asked GPT to consider retractions.

aug 25, 2025, 11:00 am • 0 0
Nicholas Bauer PhD @bioturbonick.net

An intelligent entity would know to do so.

aug 25, 2025, 11:36 am • 4 0
goodnatured.bsky.social @goodnatured.bsky.social

An educated one anyway

aug 25, 2025, 4:19 pm • 0 0
Sherman Chen @chenswc2010.bsky.social

And that’s the state of AI today: only use it if you already have the knowledge and need AI to summarize or perform tasks based on what you already know, so you can “predict” the outcome. AI is mostly a good tool to automate things you already know how to do but that take too much time to do yourself.

aug 25, 2025, 7:22 pm • 1 1
Steve Moskal @samoskal.bsky.social

Is there any disclosure of how LLMs have the ‘quality’ of scientific papers defined in them? Text-prediction algorithms don’t analyze research methods or the timescales of research, etc. If they did, they’d be ‘screaming’ about the need for urgent climate action and energy transition, for instance

aug 25, 2025, 10:16 am • 9 0
Steve Moskal @samoskal.bsky.social

The markup process for LLM data wouldn’t cater for it either; it’s not in its architecture. It’s all an illusion

aug 25, 2025, 10:18 am • 4 0