Holy shit... Tencent researchers just killed fine-tuning AND reinforcement learning in one shot 😳 They call it Training-Free GRPO (Group Relative Policy Optimization). Instead of updating weights, the model literally learns from 'its own experiences' like an evolving memory https://t.co/58Itm9bndN
— Robert Youssef (@rryssf_) Oct 15, 2025
from Twitter https://twitter.com/rryssf_
October 15, 2025 at 10:25AM