James Elder @jameselder.bsky.social

Not bad at spotting problems with regular expressions presented to them, in my experience. Slightly less good - though still helpful - at coming up with expressions from scratch when given an intended outcome. Same goes for command line syntax. I’ve certainly found Copilot useful in this regard.

aug 29, 2025, 11:54 am

Replies

Andrew Smith @oldboysmith.bsky.social

The reason is relatively simple. There is surprisingly little text on the internet of people counting the Rs in strawberry, or the characters in any other word, so there is nothing to train them on. It's why they seem to do quite well on IMO problems but often fail on numerical calculations.

aug 29, 2025, 12:11 pm
Dan Davies @dsquareddigest.bsky.social

It isn't that. That drives other problems, but as Alex said, this specific one is due to the "tokenization" which takes place to organise the data before it starts processing. It basically doesn't recognise individual letters as units. That's the point of the katakana analogy.

aug 29, 2025, 12:20 pm
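A toy illustration of the tokenization point, offered as a sketch only: the vocabulary, the token IDs, and the greedy longest-match splitting below are all invented for illustration and do not belong to any real model's tokenizer (real ones, such as BPE-based tokenizers, learn tens of thousands of subword pieces). The word is split into subword pieces and the model receives only a list of integer IDs; the letters reappear only if each ID is mapped back to its spelling, which is not part of the model's input.

```python
# HYPOTHETICAL toy vocabulary: pieces and IDs are invented for illustration.
# Single letters are included so any lowercase word built from them can be split.
VOCAB = {"str": 496, "aw": 675, "berry": 15357,
         "s": 82, "t": 83, "r": 81, "a": 64, "w": 86, "b": 65, "e": 68, "y": 88}
ID_TO_PIECE = {token_id: piece for piece, token_id in VOCAB.items()}


def tokenize(word: str) -> list[int]:
    """Greedy longest-match split of `word` into vocabulary pieces, returned as IDs."""
    ids = []
    i = 0
    while i < len(word):
        # Try the longest candidate piece first, shrinking until one is in VOCAB.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary piece matches {word[i:]!r}")
    return ids


def count_letter(ids: list[int], letter: str) -> int:
    """Count a letter by looking up each token's spelling (information the IDs lack)."""
    return sum(ID_TO_PIECE[token_id].count(letter) for token_id in ids)


ids = tokenize("strawberry")
print(ids)                     # [496, 675, 15357]: three opaque units, no letters
print(count_letter(ids, "r"))  # 3, but only via the spelling lookup table
```

The point of the sketch is that `[496, 675, 15357]` says nothing about how many Rs are inside; a program can answer only because it carries the explicit `ID_TO_PIECE` table.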
Dan Davies @dsquareddigest.bsky.social

(Proof: there is actually now quite a lot of text discussing the counting of letters in strawberry, on websites discussing this problem! It hasn't helped.)

aug 29, 2025, 12:21 pm
Andrew Smith @oldboysmith.bsky.social

That would make no difference. It would then predict what letter comes next. You would probably get the same result, as letters are organised into words on a fairly predictable basis. LLMs don’t count and they don’t reason; they predict words.

aug 29, 2025, 12:32 pm
Iain Fletcher @shmmeee.bsky.social

This is it.

aug 29, 2025, 12:19 pm