Holy shit they’re doing on-policy RL by just deploying the model to prod lmao that’s so baller. also 2 hrs for a training step makes our 10 minute steps feel lightning fast @hamishivi … they probably have a bigger batch size though 😅 https://t.co/LSAV5OgVXU https://t.co/zjjDCysK7z
— Saurabh Shah (@saurabh_shah2) Sep 12, 2025
from Twitter https://twitter.com/saurabh_shah2
September 12, 2025 at 02:25AM
via IFTTT