Wow, someone just released a notebook for training a reasoning LLM with GRPO, the new RL algorithm from DeepSeek. In under 2 hours, you can transform a very small model, Qwen 0.5B (500 million parameters), into a tiny math reasoning machine. https://t.co/Su0cJ6kw9H
— Lior⚡ (@LiorOnAI) Feb 4, 2025
from Twitter https://twitter.com/LiorOnAI
February 04, 2025 at 06:54PM
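For context on what GRPO (Group Relative Policy Optimization) changes: instead of training a separate value model, it samples a group of completions for each prompt, scores them, and normalizes each completion's reward against the group's mean and standard deviation. That normalized score is the advantage used in the policy update. Below is a minimal sketch of only that advantage step. It assumes rewards are already computed as scalars, for example 1.0 for a correct math answer and 0.0 otherwise. The function name `group_relative_advantages` and the `eps` value are hypothetical illustrations and are not taken from the notebook linked above.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """GRPO-style outcome advantage: normalize each reward within its group.

    Hypothetical helper for illustration only; not the notebook's code.
    `rewards` holds one scalar per sampled completion of the SAME prompt.
    """
    n = len(rewards)
    if n == 0:
        return []
    mean = sum(rewards) / n
    # Population standard deviation; real implementations differ
    # (some use the sample std), so treat this choice as an assumption.
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    # eps keeps the division finite when every completion got the same reward,
    # in which case all advantages are zero and the group gives no signal.
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers to one math prompt: two correct, two wrong.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Correct answers end up with positive advantages and wrong ones with negative advantages, so the policy is pushed toward completions that beat their own group's average. No learned critic is needed, which is part of why this kind of training fits on a small model in a short run.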