The next generation of models seems to mostly target infinite context and adaptive compute per token. Basically, these two papers: Google's Mixture of Depths and Google's Infini-Attention.
— Casper Hansen (@casper_hansen_) Apr 17, 2024
from Twitter https://twitter.com/casper_hansen_
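To make "adaptive compute per token" concrete, here is a minimal sketch of the Mixture-of-Depths idea: a small router scores each token, only the top-capacity tokens pass through the expensive block, and the rest skip it via the residual path. All names here (`mixture_of_depths_layer`, the scaled-residual update, the stand-in `block`) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def mixture_of_depths_layer(x, w_router, block, capacity):
    """Sketch of Mixture-of-Depths routing (assumed simplification):
    route only the top-`capacity` tokens through `block`;
    all other tokens skip the block entirely via the residual path."""
    scores = x @ w_router                 # (seq_len,) router logits, one per token
    top = np.argsort(scores)[-capacity:]  # indices of the tokens that get compute
    out = x.copy()                        # skipped tokens pass through unchanged
    # Routed tokens: residual update, scaled by the router score as in MoD
    out[top] = x[top] + scores[top, None] * block(x[top])
    return out

rng = np.random.default_rng(0)
seq_len, d = 8, 4
x = rng.normal(size=(seq_len, d))
w_router = rng.normal(size=d)
w_block = rng.normal(size=(d, d))
block = lambda h: np.tanh(h @ w_block)  # stand-in for a transformer block
y = mixture_of_depths_layer(x, w_router, block, capacity=2)
```

With `capacity=2`, only two of the eight token rows are transformed; the other six are copied through, which is where the compute saving comes from.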