Thoughts: Favorite tweets

Monday, September 2, 2024

Favorite tweets

You can now distill pretrained Transformers to Mamba / hybrid architecture to get really strong models with fast inference in just a few billion tokens. Beautiful math as always https://t.co/ZQiazasmwg
— Tri Dao (@tri_dao) Aug 22, 2024

from Twitter https://twitter.com/tri_dao

August 22, 2024 at 08:06PM
