Thoughts: Favorite tweets

Tuesday, January 28, 2025

Favorite tweets

I read the DeepSeek-R1 paper the day it came out, and I don’t think GRPO is the key to its success. Instead, here’s what truly matters (ranked by importance):
1. Iterative RL and SFT
2. A hybrid reward model—mixing rule-based RM and neural RM for deterministic tasks
3.… https://t.co/navYBL6WhC
— Jiao Sun (@sunjiao123sun_) Jan 28, 2025

from Twitter https://twitter.com/sunjiao123sun_

January 28, 2025 at 01:03AM
via IFTTT
