Efficient Memory Management for Large Language Model Serving with PagedAttention: improves the throughput of popular LLMs by 2-4x at the same level of latency compared to state-of-the-art systems. repo: https://t.co/wJwTJyG1vh abs: https://t.co/PfWAjvX2zn https://t.co/aBIOuC8rLY
— Aran Komatsuzaki (@arankomatsuzaki) Sep 13, 2023
from Twitter https://twitter.com/arankomatsuzaki
September 12, 2023 at 05:55PM
via IFTTT
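For context on the technique the tweet names: PagedAttention stores the KV cache in fixed-size blocks drawn from a shared pool and maps each sequence's tokens to those blocks through a per-sequence block table, analogous to OS paging, which is where the memory savings and throughput gain come from. Below is a minimal sketch of that bookkeeping; the class, method names, and block size are illustrative assumptions, not vLLM's actual API.

```python
# A minimal sketch of paged KV-cache bookkeeping in the spirit of PagedAttention.
# All names and sizes here are illustrative assumptions, not vLLM's real code.

BLOCK_SIZE = 16  # tokens stored per KV-cache block (assumed toy value)


class PagedKVCache:
    """Allocates KV-cache memory in fixed-size blocks from a shared pool and maps
    each sequence's logical token positions to physical blocks on demand,
    much like virtual-memory paging in an operating system."""

    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))    # unused physical block ids
        self.block_tables: dict[int, list[int]] = {}  # seq_id -> physical block ids
        self.seq_lens: dict[int, int] = {}            # seq_id -> tokens stored so far

    def append_token(self, seq_id: int) -> tuple[int, int]:
        """Return (physical_block, offset) for the next token's KV entry,
        taking a new block from the pool only when the current one is full."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        if length % BLOCK_SIZE == 0:  # current block full, or sequence has none yet
            if not self.free_blocks:
                raise MemoryError("KV-cache pool exhausted")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1
        return table[-1], length % BLOCK_SIZE

    def free_sequence(self, seq_id: int) -> None:
        """Return all of a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


# Example: concurrent sequences share one pool without pre-reserving memory
# for their maximum possible lengths.
cache = PagedKVCache(num_blocks=64)
for _ in range(20):
    cache.append_token(seq_id=0)   # spills into a second block after 16 tokens
cache.append_token(seq_id=1)
cache.free_sequence(seq_id=0)      # its blocks become reusable immediately
```

Because blocks are allocated only as tokens arrive and returned as soon as a sequence finishes, far more requests can be batched into the same GPU memory, which is the mechanism behind the reported throughput gain.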