How to train a 670B parameter model. Let's talk about the DeepSeek v3 report + some comparisons with what Meta did with Llama 405B https://t.co/Hv8Q5Q37A3
— wh (@nrehiew_) Dec 26, 2024
from Twitter https://twitter.com/nrehiew_
December 26, 2024 at 04:26PM