Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, https://t.co/j34dSt4oht
— Andrej Karpathy (@karpathy) Mar 9, 2026
from Twitter https://twitter.com/karpathy
March 9, 2026 at 03:28PM