Thoughts: Favorite tweets

Wednesday, October 15, 2025

Favorite tweets

Holy shit... Tencent researchers just killed fine-tuning AND reinforcement learning in one shot 😳 They call it Training-Free GRPO (Group Relative Policy Optimization). Instead of updating weights, the model literally learns from 'its own experiences' like an evolving memory https://t.co/58Itm9bndN
— Robert Youssef (@rryssf_) Oct 15, 2025

from Twitter https://twitter.com/rryssf_

October 15, 2025 at 10:25AM
via IFTTT

No comments:

Post a Comment

Subscribe to: Post Comments (Atom)