a banger article on RL and LLM reasoning was just released by @rasbt. hands-down THE best intro that covers:
> reasoning models basics
> RLHF basics and PPO explanation
> how DeepSeek-R1 reasoning models are trained
> top papers about reasoning models
> lessons we learned so far
https://t.co/S9V4uMfiJK
— ℏεsam (@Hesamation) Apr 19, 2025
from Twitter https://twitter.com/Hesamation
April 19, 2025 at 10:56PM