Thoughts: Favorite tweets

Tuesday, January 28, 2025

Favorite tweets

reading a deepseek paper and stumbled upon a very beautiful formula where they unify SFT and MOST RL TYPES (DPO, PPO, GRPO, etc.) into ONE FORMULA*

*that requires additional reward functions to be defined. But the fundamental insight - that all these training methods can be… https://t.co/wSO6UHTySc https://t.co/1ZAFF7Sqz2
— N8 Programs (@N8Programs) Jan 28, 2025

from Twitter https://twitter.com/N8Programs

January 28, 2025 at 05:24AM
via IFTTT
