LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Develops a procedure for Int8 matmul for feed-forward and attention projection layers in transformers, which halves the memory for inference while retaining fp32 performance.
https://t.co/D5qgJiycI7 https://t.co/6YIWqwPXkN
— Aran Komatsuzaki (@arankomatsuzaki) Aug 16, 2022
from Twitter https://twitter.com/arankomatsuzaki
August 15, 2022 at 05:50PM
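
As a rough illustration of what an Int8 matmul involves, here is a minimal pure-Python sketch. It assumes plain absmax quantization with one scale per row of the input and one per column of the weight matrix. The helper names (`quantize_rows`, `int8_matmul`, `float_matmul`) are hypothetical, and this is a simplified stand-in, not the paper's full procedure or any library's API.

```python
def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def quantize_rows(matrix):
    """Absmax-quantize each row into the int8 range [-127, 127].

    Returns (quantized_rows, scales) where row ~= scale * quantized_row.
    Assumption: one scale per row; an all-zero row gets absmax 1.0 to avoid dividing by zero.
    """
    q_rows, scales = [], []
    for row in matrix:
        absmax = max(abs(v) for v in row) or 1.0
        scale = absmax / 127.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def int8_matmul(x, w):
    """Approximate x @ w using integer products and float rescaling."""
    xq, x_scales = quantize_rows(x)               # one scale per row of x
    wq_t, w_scales = quantize_rows(transpose(w))  # one scale per column of w
    out = []
    for i, x_row in enumerate(xq):
        out_row = []
        for j, w_col in enumerate(wq_t):
            acc = sum(a * b for a, b in zip(x_row, w_col))  # integer accumulation
            out_row.append(acc * x_scales[i] * w_scales[j])  # dequantize the result
        out.append(out_row)
    return out


def float_matmul(x, w):
    """Reference matmul in ordinary Python floats, for comparison."""
    w_t = transpose(w)
    return [[sum(a * b for a, b in zip(row, col)) for col in w_t] for row in x]


x = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
w = [[1.0, -2.0], [0.5, 0.25], [-1.5, 3.0]]
approx = int8_matmul(x, w)
exact = float_matmul(x, w)
max_error = max(abs(a - e) for ra, re in zip(approx, exact) for a, e in zip(ra, re))
print("max abs error:", max_error)
```

Each stored value fits in one byte, and the only extra state is one float scale per row or column, which is where the memory saving in a scheme like this comes from.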