Idk if people noticed, but Mixtral-Instruct was trained with Direct Preference Optimization (DPO). My prediction that a DPO variant will replace RLHF is already coming true https://t.co/tLDIeolSOa
— Nora Belrose (@norabelrose) Jan 14, 2024
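For context on the method the tweet names: DPO fine-tunes a policy directly on preference pairs, without a separate reward model or RL loop, by minimizing a negative log-sigmoid of an implicit reward margin against a frozen reference model. A minimal sketch of that loss for a single preference pair (the function name, inputs, and example numbers here are illustrative, not from any particular implementation):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy being trained and under the frozen
    reference model; beta scales the implicit reward.
    """
    # Implicit reward margin: how much more the policy favors the
    # chosen response over the reference model, minus the same
    # quantity for the rejected response.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin: minimized by widening the
    # policy's gap in favor of the chosen response.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When policy and reference agree, the margin is zero and the loss
# sits at log(2); a positive margin pushes it below that.
print(dpo_loss(-10.0, -14.0, -12.0, -13.0) < math.log(2))  # → True
```

The appeal the tweet alludes to is practical: the whole pipeline reduces to supervised-style gradient steps on this loss, which is far simpler to run than PPO-based RLHF.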
from Twitter https://twitter.com/norabelrose
January 14, 2024 at 06:21PM