At this point, I was feeling pretty good about it all ("Ach probably good enough for a tweet") - so I didn't do some pretty basic things. For instance, check those outliers. Why not wait an hour to read the Strategy in full before tweeting about it?
So yeah, all very silly. Mea culpa. Thankfully, I doubt very many people saw it but that's not really the point. If it had been for a story, I would've been a lot more thorough (and with others checking), but that's not the point either.
Point is... 1. Don't trust AI to run even the most basic statistical things (yes, yes I knew that already, but then you get told you must use this stuff, and then...). 2. Don't rush to post things to social media, it creates dumb and embarrassing situations like this one.
Still better to be honest about one's shortcomings and use it as a learning opportunity, rather than just hope people forget about the error and move on. Yours, Today's AI schmuck.
Not that I'm warning people not to use AI. It's just good to be (more) aware of what it's good for, and what it's not good for.
Writing an Excel macro or Python script? Great! Grammar, spell-checking? Cool. Exploring how a story idea might link thematically to other fields and disciplines? Useful. Counting? No.
Okay but like if it's not good for something as simple as COUNTING, then ... how smart and how good is it as a tool
And just to stress - for folks saying this shows the decline of journalistic standards: this wasn't used for a Reuters article, it didn't go anywhere near one. It was just something I tweeted hastily.
I appreciate there is some overlap between an employer and their employee's social media feed - and how my conduct reflects on them. But it's also important to draw a distinction that my ramblings here are not Reuters.
Having reverse-Streisanded what is really a pretty minor mistake in the scheme of things, concerning an ephemeral tweet that got virtually no traction - I'd just like to reassure everyone that I've made waaaay more serious errors that have not attracted any attention.
Bravo Andy. A lot of integrity on display here. (Didn't see the chart).
You're a fine journo imo. I never saw the tweet and your transparency over this is admirable. Mistakes happen.
That's not really better, though.
Most everyone does. This was thoughtfully put. These things, of course, are sucker machines, built to keep you talking. It happens, it's all without malice - just that none of them care about you at all, other than that you prompt it again
Thank you SO much for this Andy. There have been some infamous million-dollar mistakes that often get rolled out for demonstrating these issues, but they're always so unrelatable and get lost in the noise. I feel this is a far more effective teachable moment. I'll be referring to it for some time 🙏
Great thread. Recently, when sifting through hundreds of pages of pdfs for counting purposes, I compared the no-effort AI version to the high-effort human version. AI answers were close enough to sound credible, but were also in almost every case wrong. bsky.app/profile/toby...
That’s the danger though isn’t it. If it was obviously wrong, it would be easier to spot. But it’s here and there correct and then made up and at some point we no longer know what’s real and not.
And it’s that little bit of non-reality that can slip through, and then gets amplified each time until we have created a monster not grounded in reality and it will take a long time to unpick where the errors came in
Like when it counts the Rs in strawberry, straw and berry
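For contrast, actual counting is deterministic and trivial in ordinary code - a quick Python check, no AI involved:

```python
# Count the letter "r" the boring, reliable way.
for word in ("strawberry", "straw", "berry"):
    print(word, word.count("r"))
# strawberry 3
# straw 1
# berry 2
```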
hey andy, appreciate your transparency. earlier in the thread you said work has been encouraging you to use this AI. so even though this didn't impact work you did for your employer, do you think this error was partly a result of the pressure to use copilot?
And also, have you or will you share this experience with your employer's decision makers?
I work at a major tech company and we have been inundated with requests to use AI in our daily work. Some of it is good, but when I'm told to use it for peer reviews and generating self reviews it feels awful.
That may be the most thorough and honest mea culpa I have ever seen. Two Our Fathers, one Hail Mary and file it under "shit happens" sub folder "nobody died".
😂 I'm starting to regret it - I've made many more serious mistakes than this one that have received a lot less attention!
AI is pretty good at writing Visual Basic; I had a thing I needed to do in Access that didn't seem to be possible through the menus (changing a linked table's data source from a file to a table on a SQL server), and AI gave me a working subroutine on the first try. But I did extensively test it!
One way to (maybe?) get a better outcome in your case: ask it to 1/ process the text and add markers to the region/places 2/ write a python script to collect these markers and do the stats you want. you get a usable doc to check its marking work & the code is testable (and probably reliable)
Thank you, that's a great idea.
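A rough sketch of that two-step idea in Python (the `[[REGION:...]]` marker format and the sample text here are made up for illustration): the AI only inserts markers, which you can eyeball in the document, and the counting itself is done by plain, testable code.

```python
import re

def count_region_markers(text):
    """Tally [[REGION:...]] markers that the AI was asked to insert."""
    counts = {}
    for region in re.findall(r"\[\[REGION:([^\]]+)\]\]", text):
        counts[region] = counts.get(region, 0) + 1
    return counts

# Hypothetical marked-up excerpt:
doc = ("Funding for [[REGION:Yorkshire]] and [[REGION:Wales]], "
       "plus further investment in [[REGION:Yorkshire]].")
print(count_region_markers(doc))
# {'Yorkshire': 2, 'Wales': 1}
```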
I would add… Using an LLM to generate text, music or images? Absolutely not…at a minimum, not until we have addressed the rot/exploitation at the heart of current AI models. No more strip mining of our culture & humanity to build commercial products. That shouldn’t be controversial.
💯
oh god don't trust ai code (as a programmer) This is one of my biggest things I am afraid of. Code that looks correct on the surface and produces valid results for the test cases at hand, but doesn't take edge cases into account in the way that a human programmer can learn to.
I hear you - and I wouldn't want to overstate what I said. I've used it to write some Excel macros and very simple Python scripts - to perform stuff I've done manually for a long time, know well (in terms of whether the output works), and can test. I would seek expert help to do more!
Software support engineer here, and that is my experience as well - it's great for "write me a script/query for blah" or "parse this well-defined file format" things that I COULD do but would require some effort on my part, and I would never even consider letting it write production code.
I have gotten myself into such a rat hole with it trying to do something moderately complicated.
Also if you start using it to try and fix code it already produced the situation often becomes a tangle of crap where you are always shifting one more thing to try and fix the messes the last iteration produced.
For simple tasks, highly specific functions and scripts and interview questions it’s amazing, which makes sense as it’s basically condensed example code, tutorials and stack overflow answers.
Since going over to Windows 11 I cannot see the cursor on Excel spreadsheets.
There are approximately 8,464,265 ways to improve cursor visibility both in Office options and Windows Settings. (AI may have overestimated the true count there, but... there's a lot of them.)
My firm works within something called Citrix, and their external IT contractor has had no joy with it. They achieved a temporary fix by rolling back to an earlier graphics driver, but the system soon updated it to one that restored the problem. Outside Citrix, on my laptop, Excel has no problem
It's bad at grammar and spelling. It *might* produce a decent macro/script... but you have to check. It's *only* potentially useful for things that are much easier to verify than to create/locate. And even then it's a climate-apocalypse plagiarism machine and should not be used unless truly desperate
I’m not clear exactly what happened or the extent of the wrongness. (Your original post is deleted) Can you say how different the AI analysis was from the actual document content? (For Wales, for example.)
Sorry. Basically the AI response was right for a lot of places, badly wrong for others - eg it massively undercounted references to Yorkshire. And while I should have spotted that, some other similarly-sized regions had genuinely a small number of references, so it wasn't implausible.
Out of curiosity, have you tried similar exercises with other AI platforms, or was this a one-off with Copilot?
Thanks for the details. A key danger with AI seems to me to be just that: plausibility. It’s programmed, in effect, to give plausible answers. It ruins all the short cuts and rules of thumb we have to detecting dodgy work.
📌 Pinning this because "ruins all the shortcuts and rules of thumb for detecting dodgy work" is an excellent explanation of what I've been trying to explain to my boss.
I’m still confused as to why LLMs are bad at counting. I’ve run into the same problem, even though it seems like something that they should be good at. Counting is a basic enough computing function that it makes me doubt everything else
Because AIs don't think, they just generate plausible text. If prompted "2+2", it answers "=4" - not because it learned to add, but because statistically "=4" was the most probable next sequence of characters based on its training. LLMs *do not* reason or think...
You can still do cool stuff, mind. If your AI tool recognises it's being asked to do maths, instead of actually trying to guess the result it can (behind the scenes) say "show me a Python script that would calculate this sum" (essentially a text task), and then run the script...
This is how things like ChatGPT give the impression of being smarter than the LLMs they are based on. LLMs *are* super cool, for some things. The problem is that they have been bullshitted to infinity by the Altmans/Musks of this world, and people are going to be very disappointed.
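A minimal sketch of that pattern (the names and the host/model split here are simplified assumptions, not any particular product's internals): the model emits code as text, and the surrounding application executes it.

```python
# Step 1: asked "what is 5.2% of 1850?", the model emits code rather than guessing.
generated_code = "result = 0.052 * 1850"

# Step 2: the host application runs the generated code and reads back the value.
namespace = {}
exec(generated_code, namespace)
print(round(namespace["result"], 2))
# 96.2
```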
I'd like to understand this as well. I asked Google AI how far along my DIL's pregnancy is, and even while spitting out all the correct dates it confidently said 20 weeks and three days. My first, strong reaction was: why did she want to wait so long before letting us know? Then I figured it can't be.
She was eight weeks and two days along. I just can't rely on AI even as a starting-off point if it can't count.
It can't count *at all*. It doesn't even know what a number is. What it's actually giving you, essentially, is the most probable text that follows that question. Simplistically, it's seen more training data where the answer to that kind of question was "20 weeks" than "8 weeks", so that's the answer it gives.
Wow, I did not understand that.
Because counting is not something they do. They don't answer questions but produce answer-like objects by predicting what you want to read. At no point do they engage with the meaning of the question.
LLMs are "bad at counting" because all they do is produce the statistically most likely next thing to come in a sentence. They don't actually count anything.
i feel for you 🫡 never ask them to count or do simple math!
Yes. And god knows how many times I've seen people post stuff like: "I asked this AI if 5 is less than 4 and look what it told me..."
I often use the chrome browser bar to calculate a quick percentage for non important stuff while on a page. (Eg, 5.2% of a number). Once I forgot I was using a different browser (Brave) which gave me an AI generated answer when I typed in x percent of y, which I knew instantly was (massively) wrong.
Don't be so blindly accepting of the macro and script. If you know the languages and the expected outcomes, you can check them. But you still need to check them - imitative AI makes weird, hard-to-find mistakes even in coding.
You should be
And that's the problem with the term AI. There is useful AI - supervised learning for medical imaging, or AlphaFold. Then there are LLMs, which are basically bullshit generators. And we have enough human ones of those already eprints.gla.ac.uk/327588/1/327...
Admitting a mistake, on the internet? What the hell man!
Refreshing perspective. Genuinely, thank you.
Hey, thanks for the introspection and willingness to go "my bad", and for not doubling down. I appreciate the examination of what happened.
Yeah, it's innumerate. You can have it write a Python script to do things like count items in a document, but if you ask it to do anything requiring understanding numbers, it'll fail.
Andy your retraction and explanation does you great credit, but I'm left with some questions: You say you're "encouraged to use Copilot at work"— Do you think that the policy should be questioned? Is it allowed to be questioned? Is this damaging work product? Are others having the same experiences?
Whether he is able to challenge or ignore it I have no idea but it's definitely a horrible policy that's damaging a lot of businesses.
There’s also the question of the environmental damage AI does.
As someone who works with AI on a daily basis, I would say don't trust AI to get anything right. You have to read every word it spits out and know the subject matter inside out, so you can catch all the errors.
Bummer! Would love to see the model and prompt you used if you’re up for screenshotting.