Amos is a new optimizer that we propose for pre-training large language models. It is more efficient and converges faster than AdamW: ≤ 51% memory for slot variables, and better validation loss within ≤ 70% training time! New from Google Research. Preprint: https://t.co/HjNAZQsyM9 https://t.co/Q1MCjB9jay
— Ran TIAN (@Robin_Tian) Oct 26, 2022
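For context, here is a rough back-of-the-envelope sketch of what the "≤ 51% memory for slot variables" claim means relative to AdamW. This is purely illustrative: AdamW keeps two full-size slot variables per parameter (the first-moment estimate m and the second-moment estimate v), so its optimizer state is roughly 2× the parameter count; the 0.51 ratio below is simply the quoted upper bound from the tweet, not a description of how Amos actually lays out its state (see the preprint for that). The function name and model size are illustrative assumptions.

# Not the Amos algorithm -- just a memory estimate under the quoted ratio.

def slot_memory_bytes(num_params: int, slots_per_param: float, bytes_per_value: int = 4) -> float:
    """Approximate optimizer slot-variable memory in bytes."""
    return num_params * slots_per_param * bytes_per_value

num_params = 1_000_000_000  # e.g. a 1B-parameter model

adamw_bytes = slot_memory_bytes(num_params, slots_per_param=2.0)  # m and v buffers
amos_bytes_upper = 0.51 * adamw_bytes                             # quoted "<= 51%" bound

print(f"AdamW slot memory:         {adamw_bytes / 2**30:.1f} GiB")
print(f"Amos slot memory (<=51%):  {amos_bytes_upper / 2**30:.1f} GiB")

For a 1B-parameter model in 32-bit precision this works out to roughly 7.5 GiB of AdamW slot state versus at most about 3.8 GiB under the quoted bound.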
from Twitter https://twitter.com/Robin_Tian
October 26, 2022 at 09:43AM