3,072 MI250X GPUs used to train 22 billion, 175 billion, and 1 trillion parameter LLMs. They showed tons, from optimizers to distributed parallelism strategies to hyperparameter tuning + search. 32% to 38% FLOPs utilization, which is okay, but much lower than on A100s. https://t.co/kjCRUNdqZb https://t.co/zTz4Fq1cW7
— Dylan Patel (@dylan522p) Jan 9, 2024
from Twitter https://twitter.com/dylan522p
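For context on the 32% to 38% figure: FLOPs utilization (often reported as model FLOPs utilization, MFU) is usually estimated as achieved FLOP/s divided by the cluster's aggregate peak FLOP/s, with achieved FLOP/s approximated by the common ~6N FLOPs-per-token rule for a dense transformer with N parameters. Below is a minimal Python sketch of that calculation; the tokens-per-second throughput and the ~383 TFLOP/s peak BF16 per MI250X are illustrative assumptions for the example, not numbers taken from the tweet or the underlying paper.

# Minimal MFU (model FLOPs utilization) estimate -- illustrative numbers only.
def mfu(params, tokens_per_second, num_gpus, peak_flops_per_gpu):
    """Approximate MFU with the ~6*N FLOPs-per-token rule of thumb
    (forward + backward pass of a dense transformer with N parameters)."""
    achieved = 6 * params * tokens_per_second    # FLOP/s actually spent on the model
    peak = num_gpus * peak_flops_per_gpu         # aggregate peak FLOP/s of the cluster
    return achieved / peak

# Hypothetical example: a 175B-parameter model on 3,072 accelerators, assuming
# ~383e12 peak BF16 FLOP/s per MI250X and ~4e5 training tokens/s end to end.
print(f"MFU ~= {mfu(175e9, 4.0e5, 3072, 383e12):.1%}")

With those assumed inputs the estimate lands in the mid-30% range, which is the same ballpark as the utilization quoted above; plugging in measured throughput and the actual peak spec is what produces the reported figure.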