DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale. Reduces latency by up to 7.3x over the SotA for latency-oriented scenarios and increases throughput by over 1.5x for throughput-oriented scenarios. https://t.co/U7Il1KUkPl https://t.co/3wuP7pmJuX
— Aran Komatsuzaki (@arankomatsuzaki) Jul 4, 2022
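For context, DeepSpeed exposes its inference engine through deepspeed.init_inference, which injects optimized transformer kernels and can shard a model across GPUs. Below is a minimal sketch of that entry point; the model name, parallelism degree, and generation settings are illustrative placeholders, not details taken from the tweet or the paper.

import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper targets far larger transformers
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Wrap the model with DeepSpeed's inference engine: fused kernels are
# injected in place of the stock transformer blocks, and mp_size > 1
# would shard the model tensor-parallel across multiple GPUs.
engine = deepspeed.init_inference(
    model,
    mp_size=1,                       # tensor-parallel degree (assumed single GPU here)
    dtype=torch.half,                # fp16 for lower latency
    replace_with_kernel_inject=True, # swap in DeepSpeed's optimized inference kernels
)

inputs = tokenizer("DeepSpeed Inference enables", return_tensors="pt").to(engine.module.device)
outputs = engine.module.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))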
from Twitter https://twitter.com/arankomatsuzaki
July 03, 2022 at 05:41PM
via IFTTT