I read up on DeepSeek’s learning algo, GRPO. GRPO: group relative policy optimization How GRPO works: 1 • model generates a group of answers 2 • compute score for each answer 3 • compute avg score for entire group 4 • compare each answer score to avg score 5 • reinforce… https://t.co/VRxqSVg3ET https://t.co/kbCI7vrmjd
— virat (@virattt) Jan 30, 2025
from Twitter https://twitter.com/virattt
January 30, 2025 at 11:05PM
via IFTTT
No comments:
Post a Comment