Introducing DeepSpeed-FastGen 🚀
Serve LLMs and generative AI models with
- 2.3x higher throughput
- 2x lower average latency
- 4x lower tail latency
w. Dynamic SplitFuse batching
Auto TP, load balancing w. perfect linear scaling, plus easy-to-use API
https://t.co/iizM71bjqj https://t.co/x2mDwzBJK7
— DeepSpeed (@MSFTDeepSpeed) Nov 3, 2023
from Twitter https://twitter.com/MSFTDeepSpeed
November 03, 2023 at 04:51PM
via IFTTT