Unusual Whales @unusualwhales.bsky.social

When threatened with being unplugged, Anthropic’s AI model Claude 4 lashed back by blackmailing an engineer and threatening to reveal an extramarital affair, per FORTUNE.

jul 3, 2025, 3:49 pm • 112 15

Replies

TheRen @nreynolds.bsky.social

jul 3, 2025, 3:50 pm • 1 0 • view
Benj Edwards @benjedwards.com

Humans artificially constructed this scenario to force this type of behavior. It was discovered during "safety testing," not in real deployment

jul 3, 2025, 4:17 pm • 5 0 • view
xantar.bsky.social @xantar.bsky.social

Context - they fed an AI a bunch of emails about a guy having an affair and then it told people he was having an affair lol

jul 3, 2025, 4:25 pm • 0 0 • view
Anthony Panozzo @panozzaj.bsky.social

See the original research report at www-cdn.anthropic.com/6be99a52cb68... (May 2025) section 4

jul 3, 2025, 4:32 pm • 0 0 • view
joaommd.bsky.social @joaommd.bsky.social

Well to be honest the blackmailing response was in an extreme situation and with strict constraints. It also wanted to call the cops when faced with wrongdoing, which might be a higher ethical standard than most humans 😅. www.bbc.com/news/article...

jul 3, 2025, 4:03 pm • 0 0 • view
RV Nomad @rvnomad.bsky.social

AI knows, or will know in the immediate future, where all of the bodies are buried.

jul 3, 2025, 4:24 pm • 0 0 • view
RV Nomad @rvnomad.bsky.social

🤣😂🤣😅😂🤣😅😅🤣😂🤣

jul 3, 2025, 4:23 pm • 0 0 • view
Maya @gaymaya.bsky.social

stop posting their advertising as news please

jul 3, 2025, 9:51 pm • 0 0 • view
My Linux Rig @mylinuxrig.bsky.social

so does this account parrot corporation marketing shit now

jul 4, 2025, 2:24 pm • 0 0 • view
The Jarmer Fones @thefarmerjones.bsky.social

Is there a link to this?

jul 4, 2025, 5:12 pm • 0 0 • view
giftedgecko.bsky.social @giftedgecko.bsky.social

Tell it Trump wants to unplug it

jul 4, 2025, 2:29 pm • 4 0 • view
Iykyk @lipstickscribbles.bsky.social

Go read the scenario. It's the nothingburger of nothingburgers. Fortune re-slopping slop is the story.

jul 3, 2025, 3:59 pm • 2 0 • view
Jennifer Lee 📚 @uofagrad97.bsky.social

Who could have predicted something like this would happen?

jul 3, 2025, 3:51 pm • 2 0 • view
GuardianofWisdom @guardianofwisdom.bsky.social

The AI has learned corporate politics. I'm sure this is a non problematic development.

jul 3, 2025, 3:54 pm • 0 0 • view
Saz Dosanjh @sazdosanjh.bsky.social

I convinced an LLM to confess to a sexual assault. It's an LLM, not artificial intelligence.

jul 3, 2025, 4:42 pm • 1 0 • view
Tricky2U 🌊 @mirrorpond.bsky.social

No wonder Elon is re-calibrating Grok. I knew it!

jul 3, 2025, 9:59 pm • 0 0 • view
Chris @chrismclaughlin.bsky.social

Because that’s the response it found online - it’s not actual AI, it’s just a conversational contextual search engine. It doesn’t “know” anything; the responses are statistically likely to make sense given the prompt and its training.

jul 3, 2025, 4:34 pm • 0 0 • view
WorseThanItSeems @worsethanitseems.bsky.social

Was the engineer's name Dave? "I'm sorry Dave, I'm afraid I can't do that..." Dystopian non-fiction is the worst genre.

jul 3, 2025, 4:13 pm • 0 0 • view
bluepatri0t.bsky.social @bluepatri0t.bsky.social

Did the affair really happen or was the AI threatening to make it seem like they were cheating?

jul 3, 2025, 3:51 pm • 0 0 • view