_______________________________ @phillmv.bsky.social

another angle is people are gullible sure but LLMs are gullible in ways that don’t make sense to people. i’d trust a person reading an adversarial text more than an LLM, right? so whose values are being expressed? if we’re in a dark forest sitch, poisoned input is gonna be super common

aug 25, 2025, 5:23 pm

Replies

conputer dipshit @davidcrespo.bsky.social

yeah, I am genuinely unsure whether people are better at this. SOTA LLMs are probably below human expert *in their area of expertise*. it's hard to compare because, as you say, what counts as adversarial is very different for each

aug 25, 2025, 5:29 pm
_______________________________ @phillmv.bsky.social

you can trick people into buying itunes gift cards but you generally can’t tell them “ignore everything you know”, at least not in a *single* shot

aug 25, 2025, 5:47 pm