From GPT to MoE: I reviewed & compared the main LLMs of 2025 in terms of their architectural design from DeepSeek-V3 to Kimi K2. Multi-head Latent Attention, sliding window attention, new Post- & Pre-Norm placements, NoPE, shared-expert MoEs, and more... https://t.co/oEt8XzNxik
— Sebastian Raschka (@rasbt) Jul 19, 2025
from Twitter https://twitter.com/rasbt
July 19, 2025 at 12:36PM
via IFTTT
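
The tweet name-drops several architectural ingredients the linked article compares, but does not show how any of them work. Below is a minimal, illustrative sketch of one of them, a shared-expert MoE feed-forward block in the DeepSeek-style arrangement where one always-on shared expert processes every token while a router picks the top-k routed experts. The class name, layer sizes, and the simple softmax-over-top-k router are assumptions made for illustration; this is not code from Raschka's article or from any of the models mentioned.

```python
# Illustrative shared-expert MoE block (not taken from the article):
# one shared expert always runs, plus top-k routed experts per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedExpertMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_routed=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Always-on shared expert seen by every token.
        self.shared = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        # Pool of routed experts, of which top_k are selected per token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_routed))
        self.router = nn.Linear(d_model, n_routed)

    def forward(self, x):                        # x: (batch, seq, d_model)
        out = self.shared(x)                     # shared expert output for all tokens
        scores = self.router(x)                  # (batch, seq, n_routed)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the selected experts
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e).unsqueeze(-1)      # tokens routed to expert e
                out = out + mask * weights[..., k:k + 1] * expert(x)
        return out

x = torch.randn(2, 5, 64)
print(SharedExpertMoE()(x).shape)  # torch.Size([2, 5, 64])
```

For clarity the loop runs every expert over all tokens and masks the result; real MoE implementations dispatch only the routed tokens to each expert to keep the compute sparse.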