The DeepSeek V3 model file is ~450 lines of code in MLX LM. Includes pipeline-parallelism and all. Good way to see how it all works. https://t.co/Vq3y8QppVs
— Awni Hannun (@awnihannun) Jan 28, 2025
from Twitter https://twitter.com/awnihannun
January 28, 2025 at 03:01AM