Thoughts on the new Anthropic paper: (1) It's seminal, really useful, and a slam dunk in general. (2) But it's not the *first* evidence we have of insidious failure modes evading adversarial training in LLMs. I compiled some related work here: https://t.co/M2w7l9LCo5
— Cas (Stephen Casper) (@StephenLCasper) Jan 14, 2024
from Twitter https://twitter.com/StephenLCasper