we missed a banger paper in the grok4/k2 drop noise guys. these guys
> look for optimal ways to select data mixes to get max improvement on a model given a target domain.
> do multimodal validation
> show good extrapolation accuracy (testing on 1.4B and predicting on 8B)
https://t.co/6EPjxlRAZJ
— tokenbender (@tokenbender) Jul 17, 2025
from Twitter https://twitter.com/tokenbender
July 17, 2025 at 05:27AM