Until now, HELM has evaluated LMs on short responses, where evaluation is simple. We now introduce HELM Instruct, which evaluates open-ended instruction following. We evaluate 4 models on 7 scenarios using 4 evaluators against 5 criteria: https://t.co/5dyRHDjlrF
— Percy Liang (@percyliang) Feb 19, 2024
from Twitter https://twitter.com/percyliang
February 19, 2024 at 05:36PM