I often wonder how Meta did such a good job post-training the Llama series of models. They just released a paper that gives us a good idea. The big challenge is that using a single reward model to align an LLM on multiple tasks fails due to reward hacking, multi-objective… https://t.co/GLIssWlwsq https://t.co/5D44bb8qS3
— Andrew Carr (e/🤸) (@andrew_n_carr) Oct 1, 2024
from Twitter https://twitter.com/andrew_n_carr
October 01, 2024 at 06:09PM
via IFTTT
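The tweet is cut off before it explains the paper's actual fix, so the following is only a minimal sketch of the general idea it gestures at: instead of optimizing against a single reward model (which is easy to reward-hack), combine several task-specific rewards into one multi-objective signal. The function names, weights, and toy reward functions below are illustrative assumptions, not Meta's method.

```python
# Hedged sketch: weighted combination of multiple task-specific reward models.
# Not taken from the paper referenced in the tweet; all names here are hypothetical.

from typing import Callable, Dict

# Each per-task reward maps (prompt, response) -> score.
RewardFn = Callable[[str, str], float]

def combined_reward(
    prompt: str,
    response: str,
    reward_fns: Dict[str, RewardFn],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of task-specific rewards.

    A single over-optimized reward is easier to game; spreading the objective
    across several judges makes any one exploit less profitable.
    """
    total = 0.0
    for task, fn in reward_fns.items():
        total += weights.get(task, 1.0) * fn(prompt, response)
    return total

if __name__ == "__main__":
    # Toy placeholder rewards, purely for demonstration.
    toy_fns: Dict[str, RewardFn] = {
        "helpfulness": lambda p, r: min(len(r) / 100.0, 1.0),  # crude length proxy
        "safety": lambda p, r: 0.0 if "unsafe" in r else 1.0,
    }
    weights = {"helpfulness": 0.7, "safety": 0.3}
    print(combined_reward("How do I sort a list?", "Use sorted(xs).", toy_fns, weights))
```

How the scores are aggregated (fixed weights, learned mixing, or per-sample routing to a judge) is exactly the kind of design choice the truncated tweet leaves open.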