time for a paper and code review! this work claims to be both a new framework and a new benchmark for AI research agents! for such notable claims, it's worth digging into a bit, starting with Table 1: the columns here are interesting. it seems like "Agentic Harness" was added… https://t.co/vuJ9Ep9jir https://t.co/OY1KbZtSkI https://t.co/zQSdwsUMM0
— Susan Zhang (@suchenzang) Feb 23, 2025
from Twitter https://twitter.com/suchenzang
February 23, 2025 at 12:53PM