MoBA (Mixture of Block Attention) from @Kimi_Moonshot improves handling of long-context tasks without relying on fixed attention patterns. Applying ideas from Mixture of Experts (MoE) to attention, MoBA lets the model dynamically decide where to focus. This allows MoBA to be 6.5x faster than… https://t.co/gkeXEz5uZa https://t.co/TnkgatoXaP
— TuringPost (@TheTuringPost) Feb 26, 2025
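To make the routing idea in the tweet concrete, here is a minimal PyTorch sketch of block-sparse attention in that spirit: keys and values are split into blocks, each query scores every block via its mean-pooled keys (MoE-style gating), and softmax attention runs only over the top-k selected blocks. The block size, top-k value, and the function name moba_like_attention are illustrative assumptions, and causal masking is omitted for brevity; this is a sketch of the general technique, not Moonshot AI's implementation.

```python
# Hypothetical sketch of MoE-style block routing for attention (not the official MoBA code).
import torch
import torch.nn.functional as F

def moba_like_attention(q, k, v, block_size=64, top_k=2):
    """q, k, v: [seq_len, dim]. Each query attends only to its top-k key/value blocks."""
    seq_len, dim = k.shape
    n_blocks = (seq_len + block_size - 1) // block_size

    # Pad keys/values so the sequence splits evenly into blocks.
    pad = n_blocks * block_size - seq_len
    k_pad = F.pad(k, (0, 0, 0, pad))
    v_pad = F.pad(v, (0, 0, 0, pad))
    k_blocks = k_pad.view(n_blocks, block_size, dim)
    v_blocks = v_pad.view(n_blocks, block_size, dim)

    # Gating: score each block by the query's similarity to the block's
    # mean-pooled key, then keep the top-k blocks per query (MoE-style routing).
    block_centroids = k_blocks.mean(dim=1)                    # [n_blocks, dim]
    gate_scores = q @ block_centroids.T                       # [seq_len, n_blocks]
    top_blocks = gate_scores.topk(min(top_k, n_blocks), dim=-1).indices  # [seq_len, top_k]

    # Gather the selected blocks' keys/values for every query and run
    # standard scaled-dot-product attention over that reduced set only.
    sel_k = k_blocks[top_blocks].reshape(seq_len, -1, dim)    # [seq_len, top_k*block_size, dim]
    sel_v = v_blocks[top_blocks].reshape(seq_len, -1, dim)
    attn = torch.softmax((sel_k @ q.unsqueeze(-1)).squeeze(-1) / dim ** 0.5, dim=-1)
    return (attn.unsqueeze(1) @ sel_v).squeeze(1)             # [seq_len, dim]

# Usage: a 1,024-token toy sequence; each query only ever touches 2 blocks of 64 keys.
q = torch.randn(1024, 64)
k = torch.randn(1024, 64)
v = torch.randn(1024, 64)
out = moba_like_attention(q, k, v)
print(out.shape)  # torch.Size([1024, 64])
```

Mean-pooling each block's keys is one simple choice of block representative for the gate; the point of the sketch is that the attention cost per query scales with top_k * block_size rather than with the full sequence length, which is where the speedup over dense attention comes from.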
from Twitter https://twitter.com/TheTuringPost
February 26, 2025 at 11:31AM
via IFTTT