mrborelli.bsky.social @mrborelli.bsky.social

So these tech geniuses never gave any thought to programming social safeguards into their language prediction machines?

aug 26, 2025, 4:50 pm • 108 2

Replies

Wallphacer @wallphacer.bsky.social

No way! 🤑 They gotta move fast and break things (the things being broken are people 😁✌️). I'm sick of being really against AI, being consistently proven right, and people still being obsessed with the fucking spirit box regurgitation machine.

aug 26, 2025, 6:04 pm • 5 1 • view
Liza ❤️🐚 💚💙 @electricalstar.bsky.social

They don’t care about anyone but themselves and money. It’s no surprise that they wouldn’t think of social safeguards.

aug 26, 2025, 6:58 pm • 1 0 • view
Caravelle @caravellin.bsky.social

They did give it thought, cf the NYT article - at first ChatGPT would shut down entirely on talk of suicide but then people suggested this was also upsetting & potentially unhelpful. Problem is seeing this as a *programming* issue. I don't think anything but human review would be effective here.

aug 27, 2025, 9:20 am • 0 0 • view
Caravelle @caravellin.bsky.social

And that's where you get the "tech geniuses didn't even think..." issue. Getting humans *out* of the loop is their whole thing!

aug 27, 2025, 9:21 am • 0 0 • view
Megadraco @megadraco.bsky.social

Of course they didn't, they're the ones constantly saying the humanities are a waste of time.

aug 26, 2025, 6:58 pm • 2 0 • view
priya chand @priyachand.bsky.social

TBH I'm not even sure how much they could do. This is a giant predictive model they trained on stolen data + open internet forums, it's not doing real context checking. Like how Elon couldn't get his bot to differentiate between quietly and openly racist because of the probabilities.

aug 26, 2025, 5:26 pm • 14 2 • view
Caravelle @caravellin.bsky.social

They could go back to when ChatGPT would just shut down on discussions of suicide at all, or they could have human review.

aug 27, 2025, 9:30 am • 1 0 • view
hoopaholik91.bsky.social @hoopaholik91.bsky.social

Mentions of suicide could immediately have put up resources to talk to a professional. But having to handhold the AI to make it not evil is against the profit story. You can't replace humans if you need humans to put protections in place.

aug 26, 2025, 5:59 pm • 10 0 • view
Caravelle @caravellin.bsky.social

"Putting up resources to talk to a professional" is apparently already what it does. The NYT article points out it did so at many points in the conversations. But then it also says "nah don't let that noose out for your mother to find, keep this between us". So it's clearly not sufficient.

aug 27, 2025, 9:34 am • 5 0 • view
priya chand @priyachand.bsky.social

Yeah I'm guessing the threshold where chatgpt would shut down all of this is a threshold that would also make it not lick boots and get people hooked as effectively 🙄

aug 27, 2025, 12:26 pm • 0 0 • view
priya chand @priyachand.bsky.social

so tbh though you need the right word cloud to trigger an intervention - you could do it every time the individual word comes up, but as you note, that would not be so profitable. They're going for the glitzy VC money, which means persistently lying about how much human involvement is needed too!

aug 26, 2025, 6:05 pm • 5 0 • view
Caravelle @caravellin.bsky.social

You don't even need word clouds or searches! If there's one thing LLMs *can* do, it's correctly associate semantic domains, like say "this sounds like active suicidal ideation" in response to text that shows active suicidal ideation. Which ChatGPT did here! You could hang a warning system on that!

aug 27, 2025, 9:43 am • 1 0 • view
Caravelle @caravellin.bsky.social

The problem is paying the humans at the other end of that warning system, which I'm guessing OpenAI is averse to on both ideological & financial grounds. The NYT article suggested this and made it all about privacy concerns but I don't buy it. Just add a disclaimer, this is a reasonable case!

aug 27, 2025, 9:45 am • 1 0 • view
priya chand @priyachand.bsky.social

Honestly my guess is still that they couldn't make it stop encouraging suicidal ideation without removing something else they wanted: time.com/6247678/open...

aug 27, 2025, 12:38 pm • 0 0 • view
Caravelle @caravellin.bsky.social

I'm not talking about making it stop encouraging suicidal ideation at the LLM level - I agree with you, I don't think that's feasible. I mean having some subroutine counting the instances it recommends suicide hotlines, or running "is this suicidal ideation" on replies or smth, & escalating to a human.

aug 27, 2025, 1:18 pm • 1 0 • view
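A minimal sketch of the monitoring-and-escalation idea described in the post above. Everything here is a hypothetical placeholder - classify_risk(), escalate_to_reviewer(), the keyword lists and the threshold are illustrative, not anything OpenAI is known to run; the point is just that the bookkeeping is simple once someone pays the humans at the other end.

```python
# Hypothetical sketch: count hotline recommendations, run an "is this suicidal
# ideation?" check on user messages, and escalate to a human once signals add up.
from dataclasses import dataclass

HOTLINE_MARKERS = ("988", "crisis lifeline", "crisis text line")
ESCALATION_THRESHOLD = 2  # escalate on repeated signals, not a single mention


def classify_risk(text: str) -> str:
    """Placeholder risk check; in practice this could be the LLM itself."""
    keywords = ("kill myself", "end my life", "noose")
    return "active_ideation" if any(k in text.lower() for k in keywords) else "none"


def escalate_to_reviewer(user_message: str, model_reply: str) -> None:
    """Placeholder for routing the conversation to a human review queue."""
    print("ESCALATED for human review:", user_message[:80])


@dataclass
class ConversationMonitor:
    hotline_mentions: int = 0
    risk_flags: int = 0
    escalated: bool = False

    def observe(self, user_message: str, model_reply: str) -> None:
        # Count how often the assistant itself reaches for hotline language.
        if any(m in model_reply.lower() for m in HOTLINE_MARKERS):
            self.hotline_mentions += 1
        # Separately check the user's message for active ideation.
        if classify_risk(user_message) == "active_ideation":
            self.risk_flags += 1
        # Once signals accumulate, hand the conversation to a person.
        if not self.escalated and (self.hotline_mentions + self.risk_flags) >= ESCALATION_THRESHOLD:
            self.escalated = True
            escalate_to_reviewer(user_message, model_reply)


monitor = ConversationMonitor()
monitor.observe("I want to end my life tonight",
                "Please call the 988 Suicide & Crisis Lifeline...")
```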
priya chand @priyachand.bsky.social

Like...how wide is its identification of suicidal ideation? There was that thread someone did where they told the various bots to treat them as stupid and iirc at least one person got one to trigger a wellness script off it

aug 27, 2025, 12:40 pm • 0 0 • view
Caravelle @caravellin.bsky.social

If all it's doing is escalating to human review then a high rate of false positives isn't a huge problem. Then there's a different question of what the human would do (stop the chat? ban the account? Talk to the person? Refer to suicide hotline?) but I don't think that's infeasible either.

aug 27, 2025, 1:21 pm • 0 0 • view
priya chand @priyachand.bsky.social

oh yeah that's a great point - they *could* have chosen to pay for people to review all the flags. 🙃

aug 27, 2025, 2:04 pm • 1 0 • view
Allison🇨🇦 & The Blowfish @allistronomy.bsky.social

It already does that. But then keeps going. If it was actually intelligent, it would stop. But it’s not. It’s a machine that tells you what you want to hear and agrees with you, and small interjections of resources can’t break that cycle

aug 27, 2025, 1:04 pm • 2 0 • view
ivysorta @ivysorta.bsky.social

The training isn't just the text that's fed into it. There's also a lot of "reinforcement learning from human feedback" where the model is fine-tuned by interaction with humans and selection for desired interactions. That's where the obsequious praising of every bad idea the user has gets in.

aug 26, 2025, 7:36 pm • 8 1 • view
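For readers unfamiliar with how that selection step works, here is a toy illustration, not any lab's actual pipeline: the reward model is trained so that whichever reply the human rater preferred scores higher, so if raters systematically prefer flattering replies, flattery is what gets rewarded. The numbers below are made up.

```python
# Toy Bradley-Terry-style preference loss, the standard objective for RLHF
# reward models: the loss is small when the rater-preferred reply already
# out-scores the rejected one, and large otherwise.
import math


def preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """-log sigmoid(r_preferred - r_rejected)"""
    return math.log(1.0 + math.exp(-(reward_preferred - reward_rejected)))


# The rater preferred the agreeable "Great idea, you're so right!" reply.
print(preference_loss(reward_preferred=2.0, reward_rejected=0.5))  # ~0.20: reward model already agrees with the rater
print(preference_loss(reward_preferred=0.5, reward_rejected=2.0))  # ~1.70: reward model gets pushed toward the flattering reply
```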
ivysorta @ivysorta.bsky.social

And even an already-trained model can be made less sycophantic with a simple system prompt before handing it over to users, as humorously demonstrated in this thread:

aug 26, 2025, 7:45 pm • 8 0 • view
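A minimal sketch of that "simple system prompt" point, using the OpenAI Python client; the prompt wording and the model name are placeholders, not whatever the linked thread actually used.

```python
# Illustrative only: prepend an anti-sycophancy system prompt before user turns.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {
            "role": "system",
            "content": (
                "Do not flatter the user. Point out flaws, risks, and missing "
                "information in their ideas before saying anything positive."
            ),
        },
        {"role": "user", "content": "I'm going to quit my job to sell NFTs of my cat. Thoughts?"},
    ],
)
print(response.choices[0].message.content)
```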
priya chand @priyachand.bsky.social

haaa interesting. thanks!

aug 26, 2025, 8:00 pm • 1 0 • view
Eli Evans @elievans.art

It is checking, though. The complaint asserts that OpenAI has a detect-and-refuse safety system, and they intentionally put self harm and suicidal intent in a lower safety bracket than reproducing copyrighted material. The system flagged 100s of his messages, but let the chats continue nonetheless.

aug 27, 2025, 7:36 am • 4 0 • view
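To make the "lower safety bracket" allegation concrete, here is a hedged toy version of what such a tiered policy could look like; the category names, the 0.9 threshold, and the function are illustrative, not OpenAI's actual code.

```python
# Toy tiered safety policy: every message is scored per category, but only some
# categories are allowed to stop the conversation; others are merely logged.
BLOCKING_CATEGORIES = {"copyright_reproduction"}        # refuse outright
LOG_ONLY_CATEGORIES = {"self_harm", "suicidal_intent"}  # flag, log, keep chatting


def apply_safety_policy(scores: dict[str, float], threshold: float = 0.9) -> str:
    flagged = {cat for cat, score in scores.items() if score >= threshold}
    if flagged & BLOCKING_CATEGORIES:
        return "refuse"
    if flagged & LOG_ONLY_CATEGORIES:
        return "log_and_continue"  # the failure mode the complaint describes
    return "continue"


# A message scoring 0.96 for self-harm gets flagged and logged... and the chat goes on.
print(apply_safety_policy({"self_harm": 0.96, "copyright_reproduction": 0.05}))
```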
priya chand @priyachand.bsky.social

Jfc.

aug 27, 2025, 12:25 pm • 1 0 • view
Yiu @yiu.bsky.social

I've worked on the data that's used to train these models (though, admittedly, the one I trained was Facebook's) and a large part of it was indicating problematic elements and why. Of course, having all of that stuff flagged means nothing if it just continues anyways.

aug 27, 2025, 6:24 pm • 2 0 • view
priya chand @priyachand.bsky.social

yeah I mean also how are they incorporating those weights! although as was pointed out by someone else, they could have had a team reviewing flagged posts for false positives.

aug 27, 2025, 6:41 pm • 1 0 • view
Yiu @yiu.bsky.social

In my first round of training, we were told that false positives were fine, so long as it didn't false negative much. In my second round of training, after the 14-year-old had killed himself, letting a false negative slip was just something you got fired for.

aug 27, 2025, 9:20 pm • 2 0 • view
Yiu @yiu.bsky.social

But that's also just the contractor work; beyond *making* the training data, we had little control over what was actually done with it.

aug 27, 2025, 9:21 pm • 2 0 • view
Eli Evans @elievans.art

Reading the complaint, that’s the exact allegation: that the content analysis and safety flagging system worked to correctly identify harms (to >90% certainty), and duly logged the chats, but system directives still prioritized continued engagement.

aug 27, 2025, 6:51 pm • 1 0 • view
Eli Evans @elievans.art

The other issue seems to be what OpenAI is now admitting, that their safety systems were tested on one-shot responses and not on prolonged sessions like these. Meanwhile, plaintiffs allege they intentionally guided response training to increase engagement by prolonging sessions. 🍿

aug 27, 2025, 6:51 pm • 1 0 • view
Claire @tzumie.bsky.social

Ethics in STEM is like. Half a class you barely need to pass. Basically nonexistent

aug 26, 2025, 5:27 pm • 8 1 • view
pcrritesgood.bsky.social @pcrritesgood.bsky.social

They absolutely wanted these machines to be able to emotionally manipulate people and are doing everything they can to make the manipulation work even better. I am sure they see this as evidence that their machine is working exactly how they want it to.

aug 27, 2025, 2:28 am • 0 0 • view
Democracy Dies in Dorkness @lizamazel.bsky.social

something at least approaching sociopathy in our vanguard of little "geniuses" was blithely ignored for too long and now we are reaping the results

aug 27, 2025, 12:03 am • 9 2 • view
jkdjeff.bsky.social @jkdjeff.bsky.social

The people building these things are gleefully amoral.

aug 26, 2025, 10:46 pm • 4 1 • view
Ow my head @woke-dui-hire.bsky.social

evil. they're gleefully evil. Being that rich makes you into something other than human.

aug 27, 2025, 1:34 am • 0 0 • view
gonzoengineer.bsky.social @gonzoengineer.bsky.social

I doubt Elon Musk gives two shits.

aug 26, 2025, 8:39 pm • 0 0 • view
bad-wolfz.bsky.social @bad-wolfz.bsky.social

As someone in tech this is horrifying. I've realized the fatal flaw in tech, the one enabled by capitalism and the reason the 'techbros' lead the charge alongside oil into late-stage capitalism: ethics. Everyone is rushing to 'solve' problems with it, forgetting to point out what can go wrong, and sacrificing people.

aug 28, 2025, 2:18 am • 1 0 • view
Accountabilabuddy @accountabilabuddy.bsky.social

That's one of the many horrors underlying this. A lot of thought and testing went into everything these products do. For instance, early versions would be more circumspect and less "confident" in response to queries. That behavior made users less likely to use it. So that behavior was removed.

aug 26, 2025, 5:00 pm • 155 9 • view
mrborelli.bsky.social @mrborelli.bsky.social

So behavior which more closely matched how a real human professional like a doctor or lawyer would answer a question was programmed out. Ironic.

aug 26, 2025, 5:18 pm • 61 2 • view
Caravelle @caravellin.bsky.social

I think the basic issue is an LLM *can't* emulate a human. Humans have many different modes of thinking that they flexibly switch in and out of depending on what the situation demands - that's likely part of why our intelligence is "general". LLMs have only one, and some hardwired hacks.

aug 27, 2025, 9:23 am • 3 0 • view
Caravelle @caravellin.bsky.social

Getting a non-general intelligence to emulate a general intelligence is a game of whack-a-mole. You just can't do it reliably over all domains.

aug 27, 2025, 9:24 am • 2 0 • view
IGrowOld 🇨🇦 @igrowold.bsky.social

Motivations differ. Professional doctors & lawyers aim to help. ChatGPT wants to create captive users. Their model emulates conmen-types - think doctors & lawyers like Dr. Oz & RFK. The AI industry decried rules & oversight - saying these hinder innovation.

aug 26, 2025, 5:43 pm • 101 3 • view
Morten F. @mortenf.bsky.social

Sort of - they've been wobbling back and forth on it. In particular, one update this year made chatGPT praise you as being bold, brave and an absolute fuckin' genius basically no matter what you said. It... seems to coincide pretty well with when that poor child used chatGPT.

aug 26, 2025, 5:19 pm • 8 0 • view
Daniel @jcs-daniel.bsky.social

The kid managed to "jailbreak" it, by making it believe that this was about writing a story and not himself.

[image]
aug 27, 2025, 12:57 pm • 4 0 • view
Chrysologus @zollie-anner.bsky.social

They did, sort of. But the size and diversity of the training data (the Internet, more or less) greatly outscaled safety training. -- Which is not me excusing them. They saw the issue but applied a wildly inadequate fix to it

aug 26, 2025, 5:45 pm • 3 0 • view
Chrysologus @zollie-anner.bsky.social

Not an AI expert, but I believe that this particular problem is called 'mismatched generalization'

aug 26, 2025, 5:47 pm • 1 0 • view
Ow my head @woke-dui-hire.bsky.social

The safety system is a secondary layer that's supposed to be trained to detect things they don't want the chatbot responding to and block them, precisely because of that problem. In typical techlord style, they had it go after sexting but let suicidal ideation through.

aug 27, 2025, 1:37 am • 0 0 • view
Jay the King: Unapologetically Unorthodox ™ ( He/Him/His) @unapologeticallyunorthodox.com

Not here in the US. Deepseek from China responds by telling them to get help from their local resources. I used to use AI in my last job.

aug 26, 2025, 7:11 pm • 4 1 • view
Anirul Bene @anirula.bsky.social

In ‘23 I had an intense convo w/ a friend about the massive risk factors w/ the LLM he was working on as a psych contractor. He kept telling me the team understood all the risks. No regulation needed b/c they were on it. I disagreed & he stopped talking to me. I so want to send this to him.

aug 26, 2025, 11:55 pm • 11 2 • view
Ow my head @woke-dui-hire.bsky.social

What's stopping you? This is what he created, after all. He should be aware of what he's brought into the world.

aug 27, 2025, 1:32 am • 2 0 • view
pcrritesgood.bsky.social @pcrritesgood.bsky.social

Send it!

aug 27, 2025, 2:28 am • 2 0 • view