The transformer architecture of GPT upper-bounds its ability at memorization. It cannot learn many algorithms due to the functional form of its forward pass, and it spends a fixed amount of compute per token, i.e. it can't "think for a while". Progress here is critical and likely, but non-trivial. pic.twitter.com/ULJdf35MJU
— Andrej Karpathy (@karpathy) July 19, 2020
from Twitter https://twitter.com/karpathy
July 19, 2020 at 12:10PM
via IFTTT
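
A minimal sketch of the "fixed compute per token" point in the tweet (my own illustration, not from the tweet): a decoder-only transformer runs every new token through the same stack of layers, so the compute spent on a token does not depend on how hard that prediction is. The layer count, width, and context length below are GPT-2-like assumptions chosen for illustration, and the FLOP formula is a rough approximation.

```python
# Illustrative sketch (assumed sizes, not GPT's actual config): compute per
# generated token in a decoder-only transformer is fixed by the architecture.

def flops_per_token(n_layers: int, d_model: int, n_ctx: int) -> int:
    """Rough FLOP count for one forward pass of a single new token through
    the stack (attention over n_ctx cached keys/values plus the MLP),
    ignoring constants that don't change the point."""
    attn = n_layers * (4 * d_model * d_model + 2 * n_ctx * d_model)  # QKV/output projections + attention
    mlp = n_layers * (2 * 4 * d_model * d_model)                     # two 4x-wide linear layers
    return attn + mlp

# Whether the next token is trivial ("the") or would require multi-step
# reasoning, the model spends exactly the same compute on it:
easy_token_cost = flops_per_token(n_layers=48, d_model=1600, n_ctx=1024)
hard_token_cost = flops_per_token(n_layers=48, d_model=1600, n_ctx=1024)
assert easy_token_cost == hard_token_cost  # no way to "think for a while"
```

The only lever is depth and width chosen at training time; at inference there is no mechanism for allocating extra passes to a difficult token, which is the limitation the tweet points at.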