Thoughts: Favorite tweets

Tuesday, November 28, 2023

Reason number 4247 LM evals are cursed: evaluation papers that are self-contradictory about how models are prompted! There is no official code for this task AFAIK, but the Big Bench implementation disagrees with both of the ones shown in the screenshot! https://t.co/2XCUPWbFhW https://t.co/gHNesQ74rX
— Stella Biderman (@BlancheMinerva) Nov 27, 2023

from Twitter https://twitter.com/BlancheMinerva

November 27, 2023 at 08:05PM
via IFTTT
