Researchers are introducing new preference optimization algorithms, including DiscoPOP and ORPO, that streamline AI training, reduce the data needed for fine-tuning, and mitigate bias in language models; DiscoPOP was itself discovered by using Large Language Models (LLMs) to search over objectives. These methods aim to improve how models learn from preference feedback.
Self-Play Preference Optimization for Language Model Alignment. https://t.co/AdpTIP40r6
[LG] Deep Bayesian Active Learning for Preference Modeling in Large Language Models L. C. Melo, P. Tigas, A. Abate, Y. Gal [University of Oxford] (2024) https://t.co/ShMq3Q2NMP - Preference modeling with human feedback is key for aligning LLMs, but data selection is challenging and… https://t.co/NW99kCLGBr
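For readers who want the mechanics behind the data-selection angle: Bayesian active learning typically scores candidate examples by the mutual information between their labels and the model parameters (BALD). Below is a minimal sketch of a BALD-style acquisition score for preference pairs, a sketch of the general technique rather than the paper's exact method; the ensemble-as-posterior setup and all names are illustrative assumptions.

```python
import torch

def bald_scores(pref_probs: torch.Tensor) -> torch.Tensor:
    """BALD acquisition for preference pairs.

    pref_probs: (n_posterior_samples, n_pairs) tensor of predicted
    P(chosen > rejected) under samples from a posterior over reward
    models (e.g. an ensemble or MC dropout, both stand-ins here).
    Returns one score per pair; high scores mark pairs whose labels
    the posterior disagrees on, i.e. the ones worth annotating.
    """
    eps = 1e-8

    def binary_entropy(p: torch.Tensor) -> torch.Tensor:
        return -(p * torch.log(p + eps) + (1 - p) * torch.log(1 - p + eps))

    mean_p = pref_probs.mean(dim=0)
    # Predictive entropy minus expected conditional entropy (the BALD
    # mutual-information estimate for a binary preference label).
    return binary_entropy(mean_p) - binary_entropy(pref_probs).mean(dim=0)
```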
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback ◼ 🚀 New research dissects preference-based learning in language models, revealing that the quality of preference data significantly boosts performance. Key insight: PPO outshines DPO, with up… https://t.co/CeN1hGXLzv
How well do DPO and PPO work on public preference datasets? Excited to share some work exploring the effects of data, reward models, and prompts! We also find that PPO generally beats DPO, despite being more challenging engineering-wise. 📜: https://t.co/niXEHuPK1S More below 👇 https://t.co/oxuta479tm
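For context on what these threads are comparing: PPO trains the policy with RL against a learned reward model, while DPO skips the reward model and directly optimizes a logistic loss on the policy-vs-reference log-ratio margin between chosen and rejected responses. A minimal sketch of the standard DPO objective (variable names and the beta default are illustrative, not taken from the paper's code):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss over a batch of preference pairs.

    Each *_logps tensor holds per-sequence log-probabilities (summed over
    tokens) for the chosen or rejected completion under the policy or the
    frozen reference model.
    """
    # Policy-to-reference log-ratios for both completions.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # DPO pushes the scaled margin up through a log-sigmoid (logistic) loss.
    margin = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(margin).mean()
```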
DPO is out, DiscoPOP is in 🔥 This paper proposes taking the human-designed equations out of DPO with DiscoPOP: Discovered Preference Optimization 🤯 ✨ All existing SOTA preference optimization algorithms have been developed by human experts. These solutions are inherently… https://t.co/CTNm9yux7D
[CL] Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback H. Ivison, Y. Wang, J. Liu, Z. Wu… [Allen Institute for AI & University of Washington] (2024) https://t.co/Fq3Iw1qUos - Preference-based learning has been widely used to improve… https://t.co/opixCOktQp
Discovering Preference Optimization Algorithms with and for Large Language Models ◼ 🚀 New research transforms Large Language Model outputs! DiscoPOP, a novel algorithm derived from LLM-driven objective discovery, eclipses traditional methods by merging logistic & exponential… https://t.co/RXXl5kUm0k
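The "merging logistic & exponential" phrase refers to the discovered loss blending DPO's logistic loss with an exponential loss, modulated by the margin itself. A minimal sketch of that blended form, assuming a sigmoid gate with temperature tau; the gate direction and the tau value are assumptions, not a verbatim reimplementation of DiscoPOP:

```python
import torch
import torch.nn.functional as F

def discopop_style_loss(margin: torch.Tensor, tau: float = 0.05) -> torch.Tensor:
    """Blend logistic and exponential losses, gated by the margin.

    `margin` is beta * (policy/ref log-ratio of chosen minus rejected),
    the same quantity DPO feeds to its log-sigmoid.
    """
    logistic = -F.logsigmoid(margin)   # DPO's logistic loss
    exponential = torch.exp(-margin)   # exponential loss
    # Soft switch between the two losses as a function of the margin;
    # which branch the gate favors at large margins is part of the
    # assumption in this sketch.
    gate = torch.sigmoid(margin / tau)
    return ((1.0 - gate) * logistic + gate * exponential).mean()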
AI2 presents Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback https://t.co/mMEEGj04FR https://t.co/pdupxr7Mgb
[LG] Discovering Preference Optimization Algorithms with and for Large Language Models https://t.co/PStvT6KPWc - This work proposes using Large Language Models (LLMs) to automatically generate novel offline preference optimization algorithms through an iterative discovery… https://t.co/kG3HzK7NZR
Introducing DiscoPOP, the latest release from the team at @SakanaAILabs. This time, it’s a new SOTA preference optimisation algorithm that was discovered and written by an LLM 😮. The LLM-driven discovery process seems generalizable enough, but here it’s been used to create novel… https://t.co/nnCJm06h7A
🆕💡🎧 LLM Fine-tuning and Preference Alignment with Jiwoo Hong and Noah Lee @kaist_ai: - Streamline AI training with ORPO - Reduce data needs for fine-tuning - Mitigate bias in language models https://t.co/g8avYJJdj2
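ORPO, discussed in the episode above, is the reference-model-free approach: it folds preference alignment into supervised fine-tuning by adding an odds-ratio penalty to the ordinary SFT loss. A minimal sketch under that reading; the use of length-averaged log-probs and the lambda default are assumptions here:

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps: torch.Tensor,
              rejected_logps: torch.Tensor,
              chosen_nll: torch.Tensor,
              lam: float = 0.1) -> torch.Tensor:
    """Odds Ratio Preference Optimization, sketched.

    chosen_logps / rejected_logps: mean per-token log p(y|x) for the
    chosen and rejected responses. chosen_nll: the usual SFT negative
    log-likelihood on the chosen response.
    """
    # log odds(y) = log p - log(1 - p), computed stably from log p.
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))
    # Odds-ratio term rewards the chosen response over the rejected one.
    ratio_loss = -F.logsigmoid(log_odds_chosen - log_odds_rejected)
    # SFT loss plus the weighted preference penalty; no reference model.
    return (chosen_nll + lam * ratio_loss).mean()
```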