Thoughts: Favorite tweets

Monday, May 12, 2025

Tool-using LLMs can learn to reason—without reasoning traces. 🔥 We present Nemotron-Research-Tool-N1, a family of tool-using reasoning LLMs trained entirely via rule-based reinforcement learning—no reasoning supervision, no distillation. 📄 Paper: https://t.co/QGE4QVxXYX 💻 https://t.co/cEl5GyTT1B
— Shaokun Zhang (@ShaokunZhang1) May 13, 2025

from Twitter https://twitter.com/ShaokunZhang1

May 13, 2025 at 01:44AM
via IFTTT
