It's an open question how much you can eventually replace existing LLM text workflows with multi-modal vision workflows.

Previously: Parse a web page (e.g. bs4) -> index text -> dump into LLM context

Now: Screenshot a web page -> index images -> pass to GPT-4V (??)
— Jerry Liu (@jerryjliu0) Nov 9, 2023
from Twitter https://twitter.com/jerryjliu0
November 09, 2023 at 02:25AM
via IFTTT
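
The "Previously" pipeline in the tweet (parse -> index text -> dump into LLM context) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not how any particular framework does it: the standard library's `html.parser` stands in for bs4 so the block has no dependencies, the "index" is a toy word-to-chunk map rather than an embedding index, and the names `parse_page`, `build_index`, `retrieve`, and `build_prompt` are hypothetical. The screenshot path is not sketched, since it depends on an external vision model.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def parse_page(html):
    """Step 1 (stand-in for bs4): turn HTML into a list of text chunks."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.parts


def build_index(chunks):
    """Step 2: toy inverted index mapping each word to the chunk ids containing it."""
    index = {}
    for i, chunk in enumerate(chunks):
        for word in chunk.lower().split():
            index.setdefault(word.strip(".,:;!?()"), set()).add(i)
    return index


def retrieve(query, chunks, index, k=2):
    """Rank chunks by how many query words they contain; return the top k."""
    scores = {}
    for word in query.lower().split():
        for i in index.get(word.strip(".,:;!?()"), ()):
            scores[i] = scores.get(i, 0) + 1
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return [chunks[i] for i in ranked[:k]]


def build_prompt(query, context_chunks):
    """Step 3: place the retrieved text into the string sent to the LLM."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}"
```

A usage sketch: call `parse_page(html)` on the fetched page, pass the chunks to `build_index`, then feed `retrieve(query, chunks, index)` into `build_prompt`. In the "Now" variant, the chunks would be screenshots and the final step would send images rather than text.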