FYI to anyone using @MistralAI's Mixtral for long context tasks -- you can get even better performance by disabling sliding window attention (setting it to your max context length)
config.sliding_window = 32768
https://t.co/K53Mwfc519
— emozilla (@theemozilla) Dec 14, 2023
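For readers who want to try this, here is a minimal sketch of how the setting might be applied with the Hugging Face transformers library. The tweet only gives the `config.sliding_window = 32768` assignment. The helper names `disable_sliding_window` and `load_mixtral_full_attention`, the checkpoint id `mistralai/Mixtral-8x7B-v0.1`, and the stand-in value 4096 are assumptions made for illustration, not something the tweet specifies.

```python
from types import SimpleNamespace


def disable_sliding_window(config, max_context_length=32768):
    """Set the sliding window to the full context length.

    With the window as wide as the context, every token can attend to
    the whole sequence, which is what the tweet calls "disabling" it.
    (Hypothetical helper; only the attribute name comes from the tweet.)
    """
    config.sliding_window = max_context_length
    return config


def load_mixtral_full_attention(model_id="mistralai/Mixtral-8x7B-v0.1",
                                max_context_length=32768):
    """Load a model with the widened window (model_id is an assumption)."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoConfig, AutoModelForCausalLM

    config = AutoConfig.from_pretrained(model_id)
    disable_sliding_window(config, max_context_length)
    # Pass the modified config so the model is built with the new window.
    return AutoModelForCausalLM.from_pretrained(model_id, config=config)


# Stand-in config object (4096 is an arbitrary starting value) to show
# the effect of the helper without downloading any weights.
demo = disable_sliding_window(SimpleNamespace(sliding_window=4096))
print(demo.sliding_window)
```

Calling `load_mixtral_full_attention()` downloads the full checkpoint, so it is left uncalled here. If your maximum context length differs, pass it as `max_context_length` instead of 32768.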
from Twitter https://twitter.com/theemozilla
December 14, 2023 at 05:28PM