Aaron Reichlin-Melnick @reichlinmelnick.bsky.social

Fun! LLMs are bad at letters.

aug 8, 2025, 1:54 am • 122 19

Replies

Rocko's Person @agninri.bsky.social

We burn the planet for this ish?

aug 8, 2025, 2:01 am • 1 0
Ben Klaus @looserooster.bsky.social

But has anybody asked the obvious follow-up question?

image
aug 8, 2025, 5:56 am • 0 0
PGDubFan 🔥🥃SecDef @pgdub.bsky.social

We should definitely let AI do air traffic control!

aug 8, 2025, 1:56 am • 1 0
Ian Magoo @ianmagoo.bsky.social

So I personally am against LLMs. But I tested this and did not find the same result. Just for fact checking.

image
aug 8, 2025, 2:13 am • 1 0
JD Harvey @jharvey13.bsky.social

LLMs make a statistical guess each time. They don’t develop a correct answer and remember it for later use. You can literally ask it the same question ten times and get ten different answers.

aug 8, 2025, 11:23 am • 0 0
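JD Harvey's point about statistical guessing can be sketched with a toy sampler. This is a minimal illustration, not how any particular model is implemented: the probabilities in `NEXT_TOKEN_PROBS` and the helper names `sample_answer` and `greedy_answer` are made up for the example.

```python
import random

# Hypothetical probabilities for the next token after a prompt like
# "The number of b's in blueberry is". Invented for illustration only.
NEXT_TOKEN_PROBS = {"2": 0.6, "3": 0.3, "1": 0.1}


def sample_answer(rng, probs=NEXT_TOKEN_PROBS):
    """Draw one answer at random, weighted by its probability."""
    answers = list(probs)
    weights = list(probs.values())
    return rng.choices(answers, weights=weights, k=1)[0]


def greedy_answer(probs=NEXT_TOKEN_PROBS):
    """Always pick the single most likely answer."""
    return max(probs, key=probs.get)


rng = random.Random(0)  # seeded so the run is repeatable
samples = [sample_answer(rng) for _ in range(10)]
print(samples)          # ten draws, which may disagree with each other
print(greedy_answer())  # the same answer on every call
```

With sampling, repeated runs can disagree with each other. Always taking the most likely option gives the same answer every time, and in this toy setup at most three distinct answers are possible.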
Funk Dr @funkdr.bsky.social

Inconsistency is indeed the problem. I asked GPT-4o the same question and got a different response.

image
aug 8, 2025, 3:31 am • 1 0
WolfLover 🐺 @wolflovernj.bsky.social

😆

aug 8, 2025, 1:56 am • 0 0
dabizomb.bsky.social @dabizomb.bsky.social

Which iteration of AI will figure out how to count to 2? Nothing a few billion lit on fire can’t figure out ... maybe.

aug 8, 2025, 1:56 am • 1 0
justjay789.bsky.social @justjay789.bsky.social

Last line doing a lot of work in the AI response, lol

image
aug 8, 2025, 4:15 pm • 0 0
His Grace, the Duke of Ankh, CDR Sir Samuel Vimes @bklynmichael42.bsky.social

Complete Blubbery

aug 8, 2025, 2:17 am • 0 0
Aaron Ellis @aaronoellis.com

v5 is a hybrid of multiple models, and will use the one that matches the perceived complexity of your question. So we're back to the "think very hard about this" world of prompt engineering.

> How many bs are in blueberry? The word
> How many Bs in
aug 8, 2025, 2:53 am • 3 0
David Lewis @davidlewis61.bsky.social

See, the words LLMs use have no meaning. Their place is determined by probabilities, not meaning. So LLMs have no idea what "middle" means in this context. Something makes it decide 3 (which has no meaning) would be right if you used "middle" (middle is often used to describe one element of three).

aug 8, 2025, 2:20 am • 1 0
David Lewis @davidlewis61.bsky.social

That's addressed to the initial attempt. But a similar analysis would apply to the word "last", and maybe others, in the second one. All those words often go together, but so what?

aug 8, 2025, 2:23 am • 0 0
Muninsdad @muninsdad.bsky.social

Claude got it!

image
aug 8, 2025, 1:58 am • 3 0
Dedekind Slut @chasmat.bsky.social

The fact that it broke it down like that makes me think Anthropic specifically included training data to address this problem lol

aug 8, 2025, 2:01 am • 7 0
Feral Ephemeral @marcmancuso.bsky.social

MLLMLs

aug 8, 2025, 1:57 am • 0 0
SouSouSoukieoooo @sousoukieooo.bsky.social

Or this is a viral marketing campaign to get people to go try ChatGPT for themselves.

aug 8, 2025, 2:03 am • 0 0
Windypundit @windypundit.com

ChatGPT breaks text into tokens—small words or parts of words—which it codes as numbers. So “blueberry” gets broken into “blue” and “berry” which are coded as 18789 and 19772. ChatGPT literally has no idea there are letters involved.

aug 8, 2025, 4:13 am • 1 0
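Windypundit's description can be sketched as a toy tokenizer. Everything here is hypothetical: `TOY_VOCAB` holds only the two pieces and IDs quoted in the post (not checked against any real tokenizer), and `toy_tokenize` is a simple greedy longest-match split rather than the algorithm ChatGPT actually uses.

```python
# Toy vocabulary: pieces and IDs are taken from the post above and are
# NOT verified against any real tokenizer.
TOY_VOCAB = {"blue": 18789, "berry": 19772}


def toy_tokenize(word, vocab=TOY_VOCAB):
    """Split `word` into known pieces (longest match first) and return IDs."""
    ids = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shorter ones.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i:]!r}")
    return ids


token_ids = toy_tokenize("blueberry")  # the kind of input a model receives
letter_count = "blueberry".count("b")  # what ordinary string code can see
print(token_ids)     # two integers, with no letters in sight
print(letter_count)  # counting is trivial once the characters are available
```

The list of integers carries no direct record of spelling, so any letter count has to be inferred rather than read off, whereas plain string code has the characters right in front of it.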
Windypundit @windypundit.com

(There are ways around this problem, which is why some AI/LLMs can figure it out.)

aug 8, 2025, 4:15 am • 1 0
Funk Dr @funkdr.bsky.social

I had a fun moment where my standard paid ChatGPT 4o didn’t even know that ChatGPT 5 exists and was just released. chatgpt.com/share/68956d...

aug 8, 2025, 3:26 am • 0 0
Marlonius Monk @maestroramirez1010.bsky.social

I can see why replacing your customer service and advertising teams with AI is a good idea

aug 8, 2025, 2:02 am • 0 0
Dedekind Slut @chasmat.bsky.social

I asked an AI knower about it if you’re curious

aug 8, 2025, 1:59 am • 4 0
Aaron Reichlin-Melnick @reichlinmelnick.bsky.social

My wife, who is a data scientist, just gave me a very similar version of that talk!

aug 8, 2025, 2:02 am • 5 0
Moderately Grouchy @moderately-grouchy.bsky.social

It's like an aphasic patient "explaining"

image
aug 8, 2025, 3:38 am • 1 0