New paper from Salesforce shows that you need to train critique models (for LLM-as-a-judge) on a bunch of different behaviors, and you can do it with DPO. Direct Judge Preference Optimization: also uses a combined SFT + DPO joint loss and uses Llama 3.1 70B to generate high… https://t.co/hnZ4BpEYSO https://t.co/Fh7pMnilKK
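The "combined SFT + DPO joint loss" the tweet mentions can be sketched as follows. This is a minimal illustration, not the paper's implementation: the log-probabilities are assumed to be summed token log-probs of the chosen/rejected critiques under the policy and a frozen reference model, and `beta` and `sft_weight` are hypothetical hyperparameters.

```python
import math

def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss: -log sigmoid of the beta-scaled reward margin,
    where each implicit reward is the policy-vs-reference log-prob gap."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(_sigmoid(beta * margin))

def joint_loss(sft_nll: float,
               policy_chosen_logp: float, policy_rejected_logp: float,
               ref_chosen_logp: float, ref_rejected_logp: float,
               beta: float = 0.1, sft_weight: float = 1.0) -> float:
    """Joint objective: SFT negative log-likelihood on the chosen critique
    plus the DPO preference term (weighting scheme is an assumption here)."""
    return sft_weight * sft_nll + dpo_loss(
        policy_chosen_logp, policy_rejected_logp,
        ref_chosen_logp, ref_rejected_logp, beta)
```

As expected, the DPO term shrinks when the policy puts more mass on the chosen critique relative to the reference than it does on the rejected one.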
— Nathan Lambert (@natolambert) Sep 30, 2024
from Twitter https://twitter.com/natolambert
September 30, 2024 at 04:21PM