To summarize, would even good faith attempts to make LLMs safer be futile? Or, would trying to create a safe and sane system be similar to laws against running red lights - imperfect compliance but better than nothing?
When it comes to the generative side of things, probably closer to the latter. You can likely get models to the point where the guardrails can't be casually bypassed - especially smaller, more focused models. But a truly determined person will always be able to get around them.
Thanks for your posts, much appreciated.