Tokenization is all the rage today (cc @karpathy's minbpe). In Tokenization and the Noiseless Channel we hypothesized that the higher the unigram Rényi entropy the better the tokenization and thus model performance. (entropy is a much better metric than compression length) https://t.co/02QIHKOxPU
— Vilém Zouhar (@zouharvi) Feb 23, 2024
from Twitter https://twitter.com/zouharvi
February 23, 2024 at 12:32PM
via IFTTT
No comments:
Post a Comment