Thoughts: Favorite tweets

Friday, February 9, 2024

Is Online AI feedback (OAIF) the next iteration of DPO/RLHF? OAIF utilizes an LLM to provide online feedback, demonstrating superior performance over offline DPO and RLHF methods through human evaluation. 🤔 Online, in RLHF/RLAIF refers to the training data being acquired… https://t.co/a6uCgSL2kb https://t.co/L0Smp6sCbE
— Philipp Schmid (@_philschmid) Feb 9, 2024

from Twitter https://twitter.com/_philschmid

February 09, 2024 at 03:06PM
via IFTTT
