Thoughts on the new Anthropic paper: (1) It's seminal, really useful, and a slam dunk in general. (2) But it's not the *first* evidence we have of insidious failure modes evading adversarial training in LLMs. I compiled some related work here: https://t.co/M2w7l9LCo5
— Cas (Stephen Casper) (@StephenLCasper) Jan 14, 2024
from Twitter https://twitter.com/StephenLCasper