OpenAI introduces HealthBench, a new open-source LLM benchmark for health! Across frontier models, o3 is the best-performing model with a score of 60%, followed by Grok 3 (54%) and Gemini 2.5 Pro (52%). A deeper dive: HealthBench consists of 5,000 synthetically generated https://t.co/rcQaEj3deI
— Tanishq Mathew Abraham, Ph.D. (@iScienceLuvr) May 12, 2025
from Twitter https://twitter.com/iScienceLuvr
May 12, 2025 at 07:39PM