I am once again shocked at how much better image retrieval performance you can get if you embed highly opinionated summaries of an image, a summary that came out of a visual language model, than using CLIP embeddings themselves. If you tell the LLM that the summary is going to be
— jason liu (@jxnlco) Sep 5, 2025
from Twitter https://twitter.com/jxnlco
September 05, 2025 at 07:36PM
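As a rough illustration of the idea in the tweet, here is a minimal sketch of retrieval over text summaries of images instead of over image embeddings. Everything in it is an assumption for illustration: the file names and summaries are made up (in practice a vision language model would write them), and `embed` is a toy bag-of-words stand-in for a real text embedding model. It shows the shape of the pipeline only, not the author's setup or any measured result.

```python
import math
from collections import Counter

# Hypothetical summaries, standing in for what a vision language model
# might write for each image. Not from the original tweet.
SUMMARIES = {
    "img_001.jpg": "red running shoes on a wooden floor, product photo for a sports store",
    "img_002.jpg": "golden retriever puppy playing in a grassy park on a sunny day",
    "img_003.jpg": "bowl of ramen with egg and pork on a restaurant table",
}


def embed(text):
    """Toy bag-of-words vector. A real system would call a text embedding model here."""
    return Counter(text.lower().replace(",", " ").split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, index, k=1):
    """Return the k image names whose summary vectors are closest to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda name: cosine(q, index[name]), reverse=True)
    return ranked[:k]


# Index the summaries once, then query with plain text.
INDEX = {name: embed(summary) for name, summary in SUMMARIES.items()}
```

Calling `search("puppy in a park", INDEX)` ranks images by how well their summaries match the query text. Swapping `embed` for a real text embedding model keeps the rest of the sketch unchanged.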