GRPO (Group Relative Policy Optimization), the core algorithm behind DeepSeek-R1, explained: https://t.co/qgRkPWWoTJ (λux, @novasarc01, Jan 25, 2025)