Lots of hot takes on whether it's possible that DeepSeek made training 45x more efficient, but @doodlestein wrote a very clear explanation of how they did it. Once someone breaks it down, it's not hard to understand. Rough summary: * Use 8-bit instead of 32-bit floating point… https://t.co/svp5bfXpbd
— Jared Friedman (@snowmaker) Jan 26, 2025
from Twitter https://twitter.com/snowmaker
January 26, 2025 at 09:31PM
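The first item in the summary, storing numbers in 8 bits rather than 32, trades precision for a 4x cut in memory and bandwidth per value. Below is a minimal sketch that makes the trade visible. It assumes the E4M3 FP8 layout (1 sign bit, 4 exponent bits, 3 mantissa bits, largest finite value 448), which the tweet does not name. The helpers `quantize_e4m3` and `storage_bytes` are hypothetical and written for illustration only. The sketch simulates rounding to that format in plain Python. It is not how DeepSeek or any training framework implements FP8, and it says nothing about the 45x figure.

```python
import math

# Assumed FP8 layout: E4M3 (1 sign, 4 exponent, 3 mantissa bits, exponent bias 7).
E4M3_MAX = 448.0          # largest finite E4M3 value: 1.75 * 2**8
E4M3_MIN_EXP = -6         # smallest normal exponent; below it, values are subnormal
E4M3_MANTISSA_BITS = 3


def quantize_e4m3(x: float) -> float:
    """Round x to the nearest E4M3 value (ties to even).

    This sketch saturates out-of-range magnitudes to +/-448.
    """
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    # frexp gives a = m * 2**ex with 0.5 <= m < 1, so floor(log2(a)) == ex - 1.
    _, ex = math.frexp(a)
    exponent = max(ex - 1, E4M3_MIN_EXP)  # subnormals share the smallest exponent
    # Gap between neighboring representable values at this exponent.
    step = math.ldexp(1.0, exponent - E4M3_MANTISSA_BITS)
    q = round(a / step) * step             # Python's round() ties to even
    return sign * min(q, E4M3_MAX)


def storage_bytes(n_values: int, bits: int) -> int:
    """Bytes needed to store n_values numbers at the given bit width."""
    return n_values * bits // 8


if __name__ == "__main__":
    for v in [1.0, 0.1, -3.3, 1000.0]:
        print(f"{v!r:>7} -> {quantize_e4m3(v)!r}")
    n = 1_000_000_000  # e.g. one billion parameters
    print("FP32 bytes:", storage_bytes(n, 32))
    print("FP8 bytes: ", storage_bytes(n, 8))
```

Running it shows 0.1 rounding to 0.1015625, -3.3 to -3.25, and 1000.0 saturating at 448.0. It also shows that FP8 needs a quarter of the bytes that FP32 does. Real FP8 training setups commonly pair the format with scaling factors and keep some operations in higher precision, so the rounding error above is a worst-case illustration, not what a full pipeline would see.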