Thoughts: Favorite tweets

Sunday, April 28, 2024

Favorite tweets

Direct Nash Optimization (DNO) better than DPO for RLHF/RLAIF? 🤔 DNO is a new RLHF method from @Microsoft Research using a batched on-policy algorithm that conducts self-improvement iteratively via contrastive learning. 👀 Implementation: 0️⃣ Start with an SFT LLM (☸️), e.g.,… https://t.co/2sSe6mTyHI https://t.co/NLSS6YuFOA
— Philipp Schmid (@_philschmid) Apr 28, 2024

from Twitter https://twitter.com/_philschmid

April 28, 2024 at 08:16AM
via IFTTT

No comments:

Post a Comment

Subscribe to: Post Comments (Atom)