I didn't reference any numbers, so take your disagreement up directly with whichever reference they gave. You didn't cite your 7% either. I was merely suggesting that yes, twice that is bad.
No, you said "on its best day," and these are all different models. You could have just Googled how often people lie and found it; I'm not trying to prove anything to you. The OP cited their statement, and it was still intentionally misleading because none of you read and understood the cited data.
Considering various models have fabricated data every time I use them, my experience is that they're fairly crap, but okay. If its worst day is 15%, that's still fairly terrible. And people transferring data don't generally lie; they may make mistakes. It regularly lies to me.
Technically, it's every session when I sit down to use them. Eventually I either give up or get what I needed. Earlier models were actually better than they are now.
I completely disagree. Early models, as shown in the chart, were statistically and quantifiably much worse. I'm glad you're experimenting with AI, but you clearly want to blame it for its failure rather than take equal ownership of your partnership in that failure. AI isn't always right.
I have worked with it a fair bit and was enthusiastic, as it would simplify things extensively. But I've found LLMs just aren't trustworthy for handling data. AI can be great; I haven't been impressed with LLMs for data.
Considering various models have allowed me to build entire applications and research topics with cited materials I can verify, my experience is that they're fairly great, but okay. If its worst day is 15%, it's not far off from a person, and you must think people are kinda terrible rather than fairly so.
15% is a lot worse than a person if a person is at 7%.
It is only a lot worse depending on the tolerance for error. Twice as bad doesn't mean bad: if the tolerance for error is 30%, 15% is still well within acceptable bounds. If the tolerance for shoveling poop is 30% and AI has an error rate of 15% and can self-correct, why would you force a human to shovel shit?
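For what it's worth, here's a minimal sketch of that arithmetic in Python, using the numbers floated in this thread (7% for a person, 15% for the model) and an assumed 30% tolerance; all three figures are hypothetical:

```python
# Hypothetical figures from this thread: ~7% human error rate,
# ~15% LLM error rate, and an assumed 30% tolerance for the task.
tolerance = 0.30
for name, err in [("human", 0.07), ("LLM", 0.15)]:
    verdict = "acceptable" if err <= tolerance else "unacceptable"
    print(f"{name}: {err:.0%} error rate is {verdict} at {tolerance:.0%} tolerance")
# Both clear the bar, even though one rate is twice the other.
```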
Another possibility is that you're just not good at using AI. It isn't generally intelligent, and how you prompt it and what context it's given matter a lot. The same goes for a human: what you ask, how you ask, and what data they have all matter greatly. People will pretend to know things they don't.
That doesn't explain why it used to be better, since I've had AI experts try to create prompts to solve it once this started, and they failed. Nor does it explain why I can eventually get it to work on occasion, yet the same data and the same prompt on another day give a different answer.
That boils down to you not understanding what you're using. It didn't use to be better; it quite literally used to be horrible. Newer models are significantly better in general. As for why it generated different results: LLMs are not deterministic by default, as the sketch below shows.
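To illustrate: a minimal sketch of temperature sampling, the usual source of run-to-run variation, assuming sampling is enabled (the default in most chat interfaces). The tokens and logit values here are made up for illustration:

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Scale logits by temperature, then normalize into probabilities.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens.
tokens = ["cat", "dog", "fish"]
logits = [2.0, 1.5, 0.3]

probs = softmax(logits, temperature=0.8)
# Same prompt, same logits, yet sampling can pick a different token
# each run; this is why an identical prompt can give a different
# answer on a different day.
for _ in range(3):
    print(random.choices(tokens, weights=probs)[0])
```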
Literally used the same LLMs, just different versions. They used to make life easier; now they don't. Out walking the dog, so I'm not going to dig into the data.
PS: I wasn't using them expecting them to be deterministic. That's no good with LLMs.
But you are talking about them as if they are, which would be intentionally misleading. You shared nothing about your use case except implying that you expected the result to be deterministic. So at this point, you're not behaving much better than what you claim AI to be doing.
And? You've shared nothing about how you tested, what the metric for success was, what the tolerance for error was, or even what you were prompting. You're just expecting me to run with whatever you say, and that in itself is a big reason why people fail at using AI. No worries, I'm going to work.
If you'd like to discuss that, we should move somewhere without a character limit, as this isn't conducive to a real discussion.