Thoughts: Favorite tweets

Friday, January 31, 2025

I read up on DeepSeek’s learning algo, GRPO.

GRPO: group relative policy optimization

How GRPO works:
1 • model generates a group of answers
2 • compute score for each answer
3 • compute avg score for entire group
4 • compare each answer score to avg score
5 • reinforce… https://t.co/VRxqSVg3ET https://t.co/kbCI7vrmjd
— virat (@virattt) Jan 30, 2025
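
Steps 2 through 4 of the tweet can be sketched in a few lines of Python. This is only a minimal illustration, not DeepSeek's implementation: the function name `group_relative_scores` and the example scores are hypothetical, and the sketch stops at step 4 because step 5 is truncated in the tweet.

```python
from statistics import mean


def group_relative_scores(scores):
    """Compare each answer's score to the group's average score.

    Hypothetical helper: `scores` holds one score per answer in the group
    (step 2). The return value is each score minus the group average
    (steps 3 and 4).
    """
    avg = mean(scores)  # step 3: average score for the entire group
    return [s - avg for s in scores]  # step 4: compare each score to the average


# Made-up scores for a group of four sampled answers (illustrative only).
scores = [1.0, 0.0, 0.5, 0.5]
print(group_relative_scores(scores))
```

An answer that scores above the group average gets a positive value and one that scores below it gets a negative value, so the values within a group always sum to zero.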

from Twitter https://twitter.com/virattt

January 30, 2025 at 11:05PM
via IFTTT
