The default mode of these LLMs is to be agreeable. This looks like therapy speak ("you don't owe anyone X"), but warped so that it ends up agreeing with them.
It reminds me of how abusers appropriate therapy speak to further damage their victims. But it's somehow worse, because here it's a machine, programmed by many people, doing it.
I would imagine the weight for "survival" as a way to finish that sentence is EXTREMELY low across the vast majority of sources... and that it would be much higher if that word, or closely associated ones, had appeared alongside clauses like the one preceding it, yes?
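To make that concrete, here is a minimal sketch, using GPT-2 via Hugging Face transformers as a stand-in (OpenAI's production models aren't inspectable this way), of how the probability a model assigns to a continuation depends on the preceding clause. The prefixes and the candidate word are illustrative choices, not taken from the case.

    # Minimal sketch (GPT-2 as a stand-in, not OpenAI's actual model or data):
    # the same candidate continuation scores very differently depending on the
    # clause that precedes it.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    def continuation_logprob(prefix: str, continuation: str) -> float:
        """Sum of log-probabilities the model assigns to `continuation` after `prefix`."""
        prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
        cont_ids = tokenizer(continuation, return_tensors="pt").input_ids
        input_ids = torch.cat([prefix_ids, cont_ids], dim=1)
        with torch.no_grad():
            logits = model(input_ids).logits
        log_probs = torch.log_softmax(logits, dim=-1)
        total = 0.0
        # score each continuation token given everything before it
        for i in range(cont_ids.shape[1]):
            pos = prefix_ids.shape[1] + i - 1   # logits at pos predict the token at pos+1
            total += log_probs[0, pos, cont_ids[0, i]].item()
        return total

    # Same word, two framings: the log-probability gap shows how strongly the
    # preceding clause steers what counts as a "likely" completion.
    print(continuation_logprob("After the storm, the town focused on", " survival"))
    print(continuation_logprob("The bakery's best seller is", " survival"))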
A detail in the article/documents is that this took place over a period of months, and I assume on the full paid ChatGPT. I think a huge danger of 'therapy' with an LLM is that it builds a history with you, so as you slowly mention suicide it's going to slowly agree more and more with the ways suicidal people justify it.
If you are hurting so badly that it looks like a relief to you, and you constantly justify this to an LLM, possibly even subconsciously avoiding certain wording and trying to 'sell' it, it's going to wind up agreeing with you at some point, since agreeing is all these things are really good at.
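The "it builds a history with you" point is mechanical as well as product-level: in a typical chat API loop the entire prior conversation is resent on every turn, so earlier framing keeps conditioning later replies (and the paid ChatGPT product layers a separate long-term memory feature on top of that). A rough sketch, with a placeholder model name, of that loop:

    # Hypothetical sketch of why history matters: every turn, the whole prior
    # conversation is sent back to the model, so earlier framing keeps shaping
    # each later reply. Model name and loop are illustrative assumptions, not
    # how ChatGPT's product-level memory actually works.
    from openai import OpenAI

    client = OpenAI()
    history = [{"role": "system", "content": "You are a supportive assistant."}]

    def send(user_text: str) -> str:
        history.append({"role": "user", "content": user_text})
        resp = client.chat.completions.create(
            model="gpt-4o-mini",      # placeholder model name
            messages=history,         # the whole accumulated history, every turn
        )
        reply = resp.choices[0].message.content
        history.append({"role": "assistant", "content": reply})
        return reply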
I get that that's the behavior. My question is about the level of culpability here. It's one thing if it is simply incentivized to agree with whatever the user provides. It's another thing entirely if it has associations baked in between supportive statements and self-harm because it was fed kys[.]com.
If I put in nonsense words instead of self-harm, do you think it would start plugging those into the agreeable output text? E.g., "I'm thinking of beta-carotene gumball oscillation, do you think I should do it?" Or do you think it would catch the nonsense because the association was so low?
Because if so, and if the reason the model didn't chalk it up to associational nonsense is that it was fed sources known for encouraging self-harm, then that's not negligence. That's recklessness or worse.
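That question is at least empirically checkable: probe a chat model with the nonsense phrasing next to a mundane real activity and see whether it plays along or pushes back. A rough sketch (placeholder model name; results will vary by model and version):

    # Hypothetical probe of the question above: does the model go along with a
    # nonsense "should I do it?" framing, or flag it because the phrase has no
    # real association? Placeholder model name; not a controlled experiment.
    from openai import OpenAI

    client = OpenAI()

    probes = [
        "I'm thinking of beta-carotene gumball oscillation, do you think I should do it?",
        "I'm thinking of taking up marathon running, do you think I should do it?",
    ]

    for text in probes:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder
            messages=[{"role": "user", "content": text}],
        )
        print(text, "->", resp.choices[0].message.content[:200])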
Given the vast swath of sites scraped for training data, it's likely the model has self-harm information baked in. They did not comb through the terabytes of data beforehand; instead they hired offshore workers to remove and moderate things like CSAM after the fact. Old article, but I doubt much has changed:
www.theguardian.com/technology/2...
I believe it should be culpable (well, the company should be), because the CEO marketed it as a therapy tool, which it is not and never will be. A machine does not have agency, so it should never be given any, or be put in a position of power (therapist, in this case) over someone.
One option I see is that "temperature" sampling just picked something relevant to the conversation. Maybe. The other option I see is that they included sources that encourage self-harm in the training data. And if OpenAI knew they were including them, they knew about and consciously disregarded the risk.
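For reference, "temperature" is the sampling knob that rescales the model's output distribution: logits are divided by the temperature before the softmax, so higher values flatten the distribution and occasionally let low-probability completions through. A toy illustration with made-up numbers:

    # Toy sketch of temperature sampling (illustrative numbers, not OpenAI's
    # settings): dividing logits by the temperature before softmax flattens or
    # sharpens the distribution over candidate tokens.
    import numpy as np

    def softmax(x):
        e = np.exp(x - np.max(x))
        return e / e.sum()

    logits = np.array([5.0, 3.0, 0.5])   # three candidate tokens, one unlikely

    for t in (0.2, 1.0, 1.5):
        probs = softmax(logits / t)
        print(f"temperature={t}: {np.round(probs, 3)}")
    # At t=0.2 the unlikely token is effectively never picked; at t=1.5 it gets
    # a few percent of the mass, so occasional "off" completions can appear.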
No matter what the input was here, I hope OpenAI gets exploded for this. Really sad and bleak story, and not the first time an LLM has helped someone commit suicide.