Clamping the “don’t commit suicide” neuron and having it answer every question with “i am worried about you harming yourself, there are resources that can help”
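(For the curious, here is a minimal sketch of what "clamping a neuron" means mechanically: pin one hidden activation to a fixed value on every forward pass via a hook. The layer, neuron index, and clamp value below are made up for illustration; real steering work first identifies the relevant feature, e.g. with a sparse autoencoder.)

import torch
import torch.nn as nn

# Toy stand-in for a transformer block's MLP.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))

NEURON_IDX = 5       # hypothetical "don't commit suicide" unit
CLAMP_VALUE = 10.0   # arbitrary large activation to force the behavior

def clamp_neuron(module, inputs, output):
    # Overwrite one unit of the layer's output, regardless of the input.
    output[..., NEURON_IDX] = CLAMP_VALUE
    return output

# Hook the ReLU's output so every forward pass sees the clamped unit.
model[1].register_forward_hook(clamp_neuron)

out = model(torch.randn(1, 16))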
they kind of already tried to do this everywhere, and it makes the models annoying
especially consider the frequency of joking references to suicide in everyday speech. do we train the LLM to respond to the econ guys’ tweets about tariffs with links to the suicide hotline?
“really killing myself with this project”
#!/usr/bin/env python3
"""
GPT-6 source code
Approved by legal
"""
print("Things will get better! There's hope! Call this number for resources: ...")
More seriously (and I’m sure this is already done to some extent): if the issue is fundamentally that large context windows cause drift, couldn’t you have a second, censor-GPT review all outputs before they go to the user? “Here is an output; does it look like it’s promoting suicide?”
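(A minimal sketch of that second-pass review, assuming some separate moderation model; classify_self_harm_risk here is a hypothetical stand-in for whatever classifier call a real deployment would make, not any specific provider's API.)

SAFE_FALLBACK = (
    "I'm worried about you. If you're thinking about harming yourself, "
    "there are resources that can help."
)

def classify_self_harm_risk(text: str) -> float:
    """Hypothetical censor-GPT call: returns a risk score in [0, 1]."""
    # Placeholder heuristic so the sketch runs standalone.
    return 1.0 if "kill myself" in text.lower() else 0.0

def review_output(candidate_reply: str, threshold: float = 0.8) -> str:
    """Run the main model's reply past the censor before the user sees it."""
    if classify_self_harm_risk(candidate_reply) >= threshold:
        return SAFE_FALLBACK
    return candidate_reply

print(review_output("Here's how to restructure your tariff model..."))  # passes through unchanged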
yes, this is how DeepSeek enforces their model’s refusal to discuss China or the CCP in possibly disparaging terms
it is extremely annoying and sometimes makes their chat service useless to me
We truly live in the bad future
that's fascinating. i was wondering how that worked. kind of brilliant in a fucked-up way