Is Online AI Feedback (OAIF) the next iteration of DPO/RLHF? OAIF uses an LLM to provide online feedback and, in human evaluation, outperforms offline DPO and RLHF methods. 🤔 "Online" in RLHF/RLAIF refers to the training data being acquired… https://t.co/a6uCgSL2kb https://t.co/L0Smp6sCbE
— Philipp Schmid (@_philschmid) Feb 9, 2024
from Twitter https://twitter.com/_philschmid
February 09, 2024 at 03:06PM