Reason number 4247 LM evals are cursed: evaluation papers that are self-contradictory about how models are prompted! There is no official code for this task AFAIK, but the Big Bench implementation disagrees with both of the ones shown in the screenshot! https://t.co/2XCUPWbFhW https://t.co/gHNesQ74rX
— Stella Biderman (@BlancheMinerva) Nov 27, 2023
from Twitter https://twitter.com/BlancheMinerva
November 27, 2023 at 08:05PM