Thoughts: Favorite tweets

Saturday, November 30, 2024

Favorite tweets

Wow, there are a lot of misconceptions in the replies people have to @karpathy sensei. Teaching a policy to an agent is expensive and inefficient, so we came up with a "hack": the reward model, which acts on humans' behalf. That way we only need about 1000 samples, and the RM can extrapolate to something like 1e8 episodes. https://t.co/YrJWyCxWOm https://t.co/KKTLAU7BHa
— Joey (e/λ) (@shxf0072) Nov 30, 2024

from Twitter https://twitter.com/shxf0072

November 30, 2024 at 12:30PM
via IFTTT
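
To make the tweet's point a bit more concrete, here is a minimal sketch of the reward-model idea, under stated assumptions: a small set of human preference comparisons is used to fit a reward model, which can then score as many new episodes as needed without asking a human again. Everything here is hypothetical and chosen for illustration: the hidden `true_reward` function stands in for a human's taste, the two-number feature vectors stand in for episodes, and the linear model trained with a Bradley-Terry (pairwise logistic) loss is one common choice, not necessarily what the tweet's author had in mind. The sketch scores 10,000 episodes rather than 1e8 only to keep it quick to run.

```python
import math
import random


def true_reward(x):
    # Hypothetical stand-in for a human's hidden preference.
    return 2.0 * x[0] - 1.0 * x[1]


def make_preference_pairs(n, rng):
    """Simulate n human comparisons as (preferred, rejected) pairs."""
    pairs = []
    for _ in range(n):
        a = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        b = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        pairs.append((a, b) if true_reward(a) >= true_reward(b) else (b, a))
    return pairs


def train_reward_model(pairs, epochs=50, lr=0.1):
    """Fit a linear reward r(x) = w . x with the Bradley-Terry loss."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for good, bad in pairs:
            diff = [g - b for g, b in zip(good, bad)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Gradient ascent on log(sigmoid(margin)):
            # d/dw = (1 - sigmoid(margin)) * diff
            scale = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            w = [wi + lr * scale * di for wi, di in zip(w, diff)]
    return w


def reward(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


rng = random.Random(0)

# Roughly 1000 human-labelled comparisons, as in the tweet.
pairs = make_preference_pairs(1000, rng)
w = train_reward_model(pairs)

# The learned model now scores new episodes with no further human labels.
new_episodes = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(10_000)]
scores = [reward(w, x) for x in new_episodes]

# Check how often the model ranks unseen pairs the way the "human" would.
held_out = list(zip(new_episodes[::2], new_episodes[1::2]))
agree = sum(
    (reward(w, a) > reward(w, b)) == (true_reward(a) > true_reward(b))
    for a, b in held_out
)
agreement = agree / len(held_out)
print(f"agreement on held-out pairs: {agreement:.3f}")
```

In a real setup the scores would feed a policy-optimization step; the sketch stops at scoring because that is the part the tweet describes.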
