doesn't speculative decoding basically solve this problem already? (I think we can already run models 3-4x faster in practice by using a draft model and doing speculative decoding) https://t.co/vR0dbEPhtA
— jack morris (@jxmnop) Apr 4, 2024
from Twitter https://twitter.com/jxmnop
April 04, 2024 at 02:03PM
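
For readers unfamiliar with the technique the tweet refers to, here is a minimal sketch of the draft-and-verify loop, assuming greedy decoding and toy stand-in models. The names `speculative_greedy_decode`, `toy_target`, and `toy_draft` are hypothetical and come from no library. Real systems verify all drafted positions in one batched forward pass of the large model and usually use a rejection-sampling rule rather than exact greedy matching. The sketch calls the target once per position only for readability, and it makes no claim about actual speedups.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token context to the next token.
NextToken = Callable[[List[int]], int]


def speculative_greedy_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    k: int = 4,
) -> Tuple[List[int], int]:
    """Greedy speculative decoding sketch.

    Returns the full token list and the number of verification rounds,
    which stands in for the number of large-model forward passes.
    """
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    rounds = 0
    while len(tokens) < goal:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        n = min(k, goal - len(tokens))
        proposal: List[int] = []
        for _ in range(n):
            proposal.append(draft(tokens + proposal))

        # 2. The target model checks the proposal position by position.
        rounds += 1
        accepted: List[int] = []
        for guess in proposal:
            expected = target(tokens + accepted)
            if guess == expected:
                accepted.append(guess)
            else:
                # First disagreement: keep the target's token, drop the rest.
                accepted.append(expected)
                break
        else:
            # Every guess matched, so the same pass yields one extra token.
            if len(tokens) + len(accepted) < goal:
                accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens, rounds


def toy_target(ctx: List[int]) -> int:
    # Hypothetical "large" model: a fixed rule on the last token.
    return (3 * ctx[-1] + 1) % 7


def toy_draft(ctx: List[int]) -> int:
    # Hypothetical "small" model: agrees with the target except after token 5.
    guess = toy_target(ctx)
    return (guess + 1) % 7 if ctx[-1] == 5 else guess
```

Because every emitted token is either confirmed or supplied by the target, the output matches plain greedy decoding with the target alone. Any gain comes from needing fewer target passes when the draft model guesses well.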