3,072 MI250X GPUs used to train 22 billion, 175 billion, and 1 trillion parameter LLMs. They showed tons, from optimizers to distributed parallelism strategies to hyperparameter tuning + search. 32% to 38% FLOPs utilization, which is okay, but much lower than on A100s. https://t.co/kjCRUNdqZb https://t.co/zTz4Fq1cW7
— Dylan Patel (@dylan522p) Jan 9, 2024
from Twitter https://twitter.com/dylan522p
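For context on the 32% to 38% figure: FLOPs utilization (often reported as model FLOPs utilization, MFU) is usually estimated as achieved FLOP/s divided by the cluster's aggregate peak FLOP/s, with achieved FLOP/s approximated by the common ~6N FLOPs-per-token rule for a dense transformer with N parameters. Below is a minimal Python sketch of that calculation; the tokens-per-second throughput and the ~383 TFLOP/s peak BF16 per MI250X are illustrative assumptions for the example, not numbers taken from the tweet or the underlying paper.

# Minimal MFU (model FLOPs utilization) estimate -- illustrative numbers only.
def mfu(params, tokens_per_second, num_gpus, peak_flops_per_gpu):
    """Approximate MFU with the ~6*N FLOPs-per-token rule of thumb
    (forward + backward pass of a dense transformer with N parameters)."""
    achieved = 6 * params * tokens_per_second    # FLOP/s actually spent on the model
    peak = num_gpus * peak_flops_per_gpu         # aggregate peak FLOP/s of the cluster
    return achieved / peak

# Hypothetical example: a 175B-parameter model on 3,072 accelerators, assuming
# ~383e12 peak BF16 FLOP/s per MI250X and ~4e5 training tokens/s end to end.
print(f"MFU ~= {mfu(175e9, 4.0e5, 3072, 383e12):.1%}")

With those assumed inputs the estimate lands in the mid-30% range, which is the same ballpark as the utilization quoted above; plugging in measured throughput and the actual peak spec is what produces the reported figure.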