Fine-tune a Mistral-7b model using Direct Preference Optimization (DPO). Just published a tutorial on @TDataScience about using DPO to enhance the performance of SFT models. Funnily enough, I created NeuralHermes-2.5 for this article. https://t.co/XyrcXOZ0Ed
— Maxime Labonne (@maximelabonne) Jan 2, 2024
from Twitter https://twitter.com/maximelabonne
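
The linked tutorial walks through DPO fine-tuning in detail; as a quick orientation, here is a minimal sketch of the general approach using Hugging Face's TRL library (the `DPOTrainer` API as of trl 0.7.x). The base model, dataset, field mapping, and hyperparameters below are illustrative assumptions, not necessarily the exact recipe behind NeuralHermes-2.5; refer to the article for the full walkthrough.

```python
# Minimal sketch of DPO fine-tuning with TRL (trl 0.7.x-era API).
# Model name, dataset, and hyperparameters are illustrative assumptions.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "teknium/OpenHermes-2.5-Mistral-7B"  # assumed SFT starting point; swap in your own

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Policy to optimize and a frozen reference copy for the DPO loss.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
ref_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# A preference dataset; DPOTrainer expects "prompt", "chosen", and "rejected" text fields.
dataset = load_dataset("Intel/orca_dpo_pairs", split="train")

def to_dpo_format(example):
    # Simplified mapping for illustration; a real recipe would apply the model's chat template.
    return {
        "prompt": example["question"],
        "chosen": example["chosen"],
        "rejected": example["rejected"],
    }

dataset = dataset.map(to_dpo_format, remove_columns=dataset.column_names)

training_args = TrainingArguments(
    output_dir="dpo-mistral-7b",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
    logging_steps=10,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,       # frozen reference policy
    args=training_args,
    beta=0.1,                  # strength of the penalty keeping the policy close to the reference
    train_dataset=dataset,
    tokenizer=tokenizer,
    max_length=1024,
    max_prompt_length=512,
)

trainer.train()
```

In short, DPO optimizes the SFT model directly on chosen/rejected response pairs, using the frozen reference model in place of a separately trained reward model.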