New RLHF 7B model:
* Trained with an RLAIF dataset (many sources of prompts + many models) https://t.co/wSu8l7SmHB
* Released RM (fine-tuned from Llama-2-chat) https://t.co/Hxvw5srIUh
* Trained with a different policy optimizer (APA) https://t.co/yqqMFidBKX
* SOTA on MT Bench among 7B models (8.01)… https://t.co/MHE9OzmvDu https://t.co/b93x7W2iRM
— Nathan Lambert (@natolambert) Nov 27, 2023
from Twitter https://twitter.com/natolambert
November 27, 2023 at 04:44PM
via IFTTT
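The tweet doesn't include usage code, but a released reward model fine-tuned from Llama-2-chat would typically be used to score prompt/response pairs. Below is a minimal sketch of that pattern with Hugging Face transformers; the repo id, the plain prompt + response concatenation, and the single-logit scoring head are assumptions for illustration, not details from this release.

```python
# Minimal sketch: scoring a prompt/response pair with a preference reward model.
# "org/rlhf-7b-reward-model" is a hypothetical placeholder, not the released checkpoint;
# the actual RM may expect a specific chat template or expose a different head.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "org/rlhf-7b-reward-model"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=1, torch_dtype=torch.bfloat16
)
reward_model.eval()

prompt = "Explain RLHF in one sentence."
response = "RLHF fine-tunes a language model against a learned reward model of human preferences."

# Many preference RMs score the concatenated prompt + response as one sequence.
inputs = tokenizer(prompt + "\n" + response, return_tensors="pt", truncation=True)
with torch.no_grad():
    score = reward_model(**inputs).logits[0, 0].item()  # scalar reward
print(f"reward: {score:.3f}")
```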