Some notes on the DeepSeek-V3 Technical Report :) The most insane thing to me: the whole training run cost only $5.576 million, roughly 55 days on a 2048xH800 cluster. This is TINY compared to the Llama, GPT or Claude training runs.
- 671B MoE with 37B activated params
- DeepSeek MoE… https://t.co/8SkhYsjf6H https://t.co/kmDDHcIN3X
— Lisan al Gaib (@scaling01) Dec 26, 2024
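A quick back-of-the-envelope check of those figures (a sketch, not from the report verbatim): the ~$2 per H800 GPU-hour rental rate is an assumption here, and the cluster size and parameter counts come from the tweet.

```python
# Sanity-check the tweet's cost/time claim and the MoE sparsity figure.
# ASSUMPTION: ~$2 per H800 GPU-hour rental rate; other numbers are from the tweet.

TOTAL_COST_USD = 5.576e6      # quoted training cost
GPU_HOUR_RATE_USD = 2.0       # assumed rental price per H800 GPU-hour
CLUSTER_GPUS = 2048           # H800s in the training cluster (per the tweet)

gpu_hours = TOTAL_COST_USD / GPU_HOUR_RATE_USD       # ~2.788M GPU-hours
wall_clock_days = gpu_hours / CLUSTER_GPUS / 24      # ~57 days end to end
print(f"{gpu_hours / 1e6:.3f}M GPU-hours, ~{wall_clock_days:.0f} days on {CLUSTER_GPUS} GPUs")

# MoE sparsity: only a small fraction of the 671B total parameters is activated per token.
TOTAL_PARAMS_B = 671
ACTIVATED_PARAMS_B = 37
print(f"activated fraction: {ACTIVATED_PARAMS_B / TOTAL_PARAMS_B:.1%}")  # ~5.5%
```

Working backwards from the cost, this comes out to roughly 2.788M GPU-hours, i.e. on the order of 55-57 days of wall-clock time on 2048 GPUs, with only ~5.5% of the parameters active for each token.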