Some notes on the DeepSeek-V3 Technical Report :) The most insane thing to me: the whole training run cost only $5.576 million, roughly 55 days on a 2048xH800 cluster. This is TINY compared to the Llama, GPT or Claude training runs.
- 671B MoE with 37B activated params
- DeepSeek MoE… https://t.co/8SkhYsjf6H https://t.co/kmDDHcIN3X
— Lisan al Gaib (@scaling01) Dec 26, 2024
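A quick back-of-the-envelope check of those figures (a sketch, not from the report verbatim): the ~$2 per H800 GPU-hour rental rate is an assumption here, and the cluster size and parameter counts come from the tweet.

```python
# Sanity-check the tweet's cost/time claim and the MoE sparsity figure.
# ASSUMPTION: ~$2 per H800 GPU-hour rental rate; other numbers are from the tweet.

TOTAL_COST_USD = 5.576e6      # quoted training cost
GPU_HOUR_RATE_USD = 2.0       # assumed rental price per H800 GPU-hour
CLUSTER_GPUS = 2048           # H800s in the training cluster (per the tweet)

gpu_hours = TOTAL_COST_USD / GPU_HOUR_RATE_USD       # ~2.788M GPU-hours
wall_clock_days = gpu_hours / CLUSTER_GPUS / 24      # ~57 days end to end
print(f"{gpu_hours / 1e6:.3f}M GPU-hours, ~{wall_clock_days:.0f} days on {CLUSTER_GPUS} GPUs")

# MoE sparsity: only a small fraction of the 671B total parameters is activated per token.
TOTAL_PARAMS_B = 671
ACTIVATED_PARAMS_B = 37
print(f"activated fraction: {ACTIVATED_PARAMS_B / TOTAL_PARAMS_B:.1%}")  # ~5.5%
```

Working backwards from the cost, this comes out to roughly 2.788M GPU-hours, i.e. on the order of 55-57 days of wall-clock time on 2048 GPUs, with only ~5.5% of the parameters active for each token.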