I often wonder how Meta did such a good job post-training the Llama series of models. They just released a paper that gives us a good idea. The big challenge is that using a single reward model to align an LLM on multiple tasks fails due to reward hacking, multi-objective… https://t.co/GLIssWlwsq https://t.co/5D44bb8qS3
— Andrew Carr (e/🤸) (@andrew_n_carr) Oct 1, 2024
from Twitter https://twitter.com/andrew_n_carr
October 01, 2024 at 06:09PM
via IFTTT
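The tweet is cut off before it explains the paper's actual fix, so the following is only a minimal sketch of the general idea it gestures at: instead of optimizing against a single reward model (which is easy to reward-hack), combine several task-specific rewards into one multi-objective signal. The function names, weights, and toy reward functions below are illustrative assumptions, not Meta's method.

```python
# Hedged sketch: weighted combination of multiple task-specific reward models.
# Not taken from the paper referenced in the tweet; all names here are hypothetical.

from typing import Callable, Dict

# Each per-task reward maps (prompt, response) -> score.
RewardFn = Callable[[str, str], float]

def combined_reward(
    prompt: str,
    response: str,
    reward_fns: Dict[str, RewardFn],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of task-specific rewards.

    A single over-optimized reward is easier to game; spreading the objective
    across several judges makes any one exploit less profitable.
    """
    total = 0.0
    for task, fn in reward_fns.items():
        total += weights.get(task, 1.0) * fn(prompt, response)
    return total

if __name__ == "__main__":
    # Toy placeholder rewards, purely for demonstration.
    toy_fns: Dict[str, RewardFn] = {
        "helpfulness": lambda p, r: min(len(r) / 100.0, 1.0),  # crude length proxy
        "safety": lambda p, r: 0.0 if "unsafe" in r else 1.0,
    }
    weights = {"helpfulness": 0.7, "safety": 0.3}
    print(combined_reward("How do I sort a list?", "Use sorted(xs).", toy_fns, weights))
```

How the scores are aggregated (fixed weights, learned mixing, or per-sample routing to a judge) is exactly the kind of design choice the truncated tweet leaves open.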