We are currently working on annotating this, but I felt it was too beautiful not to share—even unlabeled. This is a scatterplot of 1% of the dataset @BigscienceW 🌸BLOOM🌸 was trained on; ~6M documents colored by language. Can you guess the languages? https://t.co/2LXywRfdsQ
— Christopher Akiki (@christopher) Jul 19, 2022
from Twitter https://twitter.com/christopher
July 19, 2022 at 11:52AM