Turns out LLMs are paying too much attention: removing the redundant attention layers, roughly half of them in some models, barely affects accuracy while roughly doubling inference speed. Original Problem 🔍: LLMs exhibit… https://t.co/rCHNuImJkp https://t.co/fSeulfDzDi
— Rohan Paul (@rohanpaul_ai) Nov 7, 2024
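To make the idea concrete, here is a minimal sketch (not the referenced work's actual code) of what "removing attention layers" can mean in practice: each decoder block gets a flag that bypasses its attention sub-layer, and the least important half of the blocks have attention disabled. The `Block` class, the `drop_least_important_attention` helper, and the per-layer `importance` scores are all hypothetical stand-ins; a real method would measure each layer's contribution (e.g. by comparing outputs with and without it) rather than using random scores.

```python
# Minimal sketch, assuming a simple pre-norm decoder block.
# Not the paper's implementation; names and scores are placeholders.
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm2 = nn.LayerNorm(d_model)
        self.skip_attn = False  # when True, the attention sub-layer is bypassed

    def forward(self, x):
        if not self.skip_attn:
            h = self.norm1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

def drop_least_important_attention(blocks, importance, keep_ratio=0.5):
    """Bypass attention in the (1 - keep_ratio) fraction of blocks with the
    lowest importance scores (hypothetical scores for illustration)."""
    n_drop = int(len(blocks) * (1 - keep_ratio))
    for i in sorted(range(len(blocks)), key=lambda i: importance[i])[:n_drop]:
        blocks[i].skip_attn = True

blocks = nn.ModuleList(Block() for _ in range(8))
importance = torch.rand(8)          # placeholder per-layer scores
drop_least_important_attention(blocks, importance, keep_ratio=0.5)

x = torch.randn(2, 16, 256)
for blk in blocks:
    x = blk(x)                      # half the blocks now skip attention
```

Since the skipped attention sub-layers are never computed, their FLOPs and KV-cache reads disappear entirely, which is where the speedup in the tweet's claim would come from.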
from Twitter https://twitter.com/rohanpaul_ai
November 07, 2024 at 01:18PM
via IFTTT