Tool-using LLMs can learn to reason—without reasoning traces. 🔥 We present Nemotron-Research-Tool-N1, a family of tool-using reasoning LLMs trained entirely via rule-based reinforcement learning—no reasoning supervision, no distillation. 📄 Paper: https://t.co/QGE4QVxXYX 💻 https://t.co/cEl5GyTT1B
— Shaokun Zhang (@ShaokunZhang1) May 13, 2025
from Twitter https://twitter.com/ShaokunZhang1
May 13, 2025 at 01:44AM