From GPT to MoE: I reviewed & compared the main LLMs of 2025 in terms of their architectural design from DeepSeek-V3 to Kimi K2. Multi-head Latent Attention, sliding window attention, new Post- & Pre-Norm placements, NoPE, shared-expert MoEs, and more... https://t.co/oEt8XzNxik
— Sebastian Raschka (@rasbt) Jul 19, 2025
from Twitter https://twitter.com/rasbt
July 19, 2025 at 12:36PM
via IFTTT
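
The tweet name-drops several architectural ingredients the linked article compares, but does not show how any of them work. Below is a minimal, illustrative sketch of one of them, a shared-expert MoE feed-forward block in the DeepSeek-style arrangement where one always-on shared expert processes every token while a router picks the top-k routed experts. The class name, layer sizes, and the simple softmax-over-top-k router are assumptions made for illustration; this is not code from Raschka's article or from any of the models mentioned.

```python
# Illustrative shared-expert MoE block (not taken from the article):
# one shared expert always runs, plus top-k routed experts per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedExpertMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_routed=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Always-on shared expert seen by every token.
        self.shared = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        # Pool of routed experts, of which top_k are selected per token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_routed))
        self.router = nn.Linear(d_model, n_routed)

    def forward(self, x):                        # x: (batch, seq, d_model)
        out = self.shared(x)                     # shared expert output for all tokens
        scores = self.router(x)                  # (batch, seq, n_routed)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the selected experts
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e).unsqueeze(-1)      # tokens routed to expert e
                out = out + mask * weights[..., k:k + 1] * expert(x)
        return out

x = torch.randn(2, 5, 64)
print(SharedExpertMoE()(x).shape)  # torch.Size([2, 5, 64])
```

For clarity the loop runs every expert over all tokens and masks the result; real MoE implementations dispatch only the routed tokens to each expert to keep the compute sparse.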