This is huge. They just allowed everyone to fine-tune LLMs with RL from your browser. You can even use GRPO, the RL method of DeepSeek. A fine-tuned model "outperformed OpenAI o1 and DeepSeek-R1 with a dozen labeled data points." https://t.co/z6tgze2J6J
— Lior⚡ (@LiorOnAI) Mar 19, 2025
from Twitter https://twitter.com/LiorOnAI
March 19, 2025 at 08:16PM