Combining the benefits of RL and SFT with on-policy distillation, a promising approach for training small models for domain performance and continual learning. https://t.co/u5tcvw1BG5
— Mira Murati (@miramurati) Oct 27, 2025
from Twitter https://twitter.com/miramurati
October 27, 2025 at 05:06PM