Researchers are introducing new preference optimization algorithms, including DiscoPOP and ORPO, that streamline AI training, reduce the data needed for fine-tuning, and mitigate bias in language models; DiscoPOP was itself discovered by using Large Language Models (LLMs) to search over objectives. These methods aim to improve how models learn from preference feedback.
Self-Play Preference Optimization for Language Model Alignment. https://t.co/AdpTIP40r6
[LG] Deep Bayesian Active Learning for Preference Modeling in Large Language Models L. C. Melo, P. Tigas, A. Abate, Y. Gal [University of Oxford] (2024) https://t.co/ShMq3Q2NMP - Preference modeling with human feedback is key for aligning LLMs, but data selection is challenging and… https://t.co/NW99kCLGBr
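For readers who want the mechanics behind the data-selection angle: Bayesian active learning typically scores candidate examples by the mutual information between their labels and the model parameters (BALD). Below is a minimal sketch of a BALD-style acquisition score for preference pairs, a sketch of the general technique rather than the paper's exact method; the ensemble-as-posterior setup and all names are illustrative assumptions.

```python
import torch

def bald_scores(pref_probs: torch.Tensor) -> torch.Tensor:
    """BALD acquisition for preference pairs.

    pref_probs: (n_posterior_samples, n_pairs) tensor of predicted
    P(chosen > rejected) under samples from a posterior over reward
    models (e.g. an ensemble or MC dropout, both stand-ins here).
    Returns one score per pair; high scores mark pairs whose labels
    the posterior disagrees on, i.e. the ones worth annotating.
    """
    eps = 1e-8

    def binary_entropy(p: torch.Tensor) -> torch.Tensor:
        return -(p * torch.log(p + eps) + (1 - p) * torch.log(1 - p + eps))

    mean_p = pref_probs.mean(dim=0)
    # Predictive entropy minus expected conditional entropy (the BALD
    # mutual-information estimate for a binary preference label).
    return binary_entropy(mean_p) - binary_entropy(pref_probs).mean(dim=0)
```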
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback ◼ 🚀 New research dissects preference-based learning in language models, revealing that the quality of preference data significantly boosts performance. Key insight: PPO outshines DPO, with up… https://t.co/CeN1hGXLzv
How well do DPO and PPO work on public preference datasets? Excited to share some work exploring the effects of data, reward models, and prompts! We also find that PPO generally beats DPO, despite being more challenging engineering-wise. 📜: https://t.co/niXEHuPK1S More below 👇 https://t.co/oxuta479tm
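For context on what these threads are comparing: PPO trains the policy with RL against a learned reward model, while DPO skips the reward model and directly optimizes a logistic loss on the policy-vs-reference log-ratio margin between chosen and rejected responses. A minimal sketch of the standard DPO objective (variable names and the beta default are illustrative, not taken from the paper's code):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss over a batch of preference pairs.

    Each *_logps tensor holds per-sequence log-probabilities (summed over
    tokens) for the chosen or rejected completion under the policy or the
    frozen reference model.
    """
    # Policy-to-reference log-ratios for both completions.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # DPO pushes the scaled margin up through a log-sigmoid (logistic) loss.
    margin = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(margin).mean()
```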
DPO is out, DiscoPOP is in 🔥 This paper proposes taking the human-designed equations out of DPO with DiscoPOP: Discovered Preference Optimization 🤯 ✨ All existing SOTA preference optimization algorithms have been developed by human experts. These solutions are inherently… https://t.co/CTNm9yux7D
[CL] Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback H. Ivison, Y. Wang, J. Liu, Z. Wu… [Allen Institute for AI & University of Washington] (2024) https://t.co/Fq3Iw1qUos - Preference-based learning has been widely used to improve… https://t.co/opixCOktQp
Discovering Preference Optimization Algorithms with and for Large Language Models ◼ 🚀 New research transforms Large Language Model outputs! DiscoPOP, a novel algorithm derived from LLM-driven objective discovery, eclipses traditional methods by merging logistic & exponential… https://t.co/RXXl5kUm0k
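The "merging logistic & exponential" phrase refers to the discovered loss blending DPO's logistic loss with an exponential loss, modulated by the margin itself. A minimal sketch of that blended form, assuming a sigmoid gate with temperature tau; the gate direction and the tau value are assumptions, not a verbatim reimplementation of DiscoPOP:

```python
import torch
import torch.nn.functional as F

def discopop_style_loss(margin: torch.Tensor, tau: float = 0.05) -> torch.Tensor:
    """Blend logistic and exponential losses, gated by the margin.

    `margin` is beta * (policy/ref log-ratio of chosen minus rejected),
    the same quantity DPO feeds to its log-sigmoid.
    """
    logistic = -F.logsigmoid(margin)   # DPO's logistic loss
    exponential = torch.exp(-margin)   # exponential loss
    # Soft switch between the two losses as a function of the margin;
    # which branch the gate favors at large margins is part of the
    # assumption in this sketch.
    gate = torch.sigmoid(margin / tau)
    return ((1.0 - gate) * logistic + gate * exponential).mean()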
AI2 presents Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback https://t.co/mMEEGj04FR https://t.co/pdupxr7Mgb
[LG] Discovering Preference Optimization Algorithms with and for Large Language Models https://t.co/PStvT6KPWc - This work proposes using Large Language Models (LLMs) to automatically generate novel offline preference optimization algorithms through an iterative discovery… https://t.co/kG3HzK7NZR
Introducing DiscoPOP, the latest release from the team at @SakanaAILabs. This time, it’s a new SOTA preference optimisation algorithm that was discovered and written by an LLM 😮. The LLM-driven discovery process seems generalizable enough, but here it’s been used to create novel… https://t.co/nnCJm06h7A
🆕💡🎧 LLM Fine-tuning and Preference Alignment with Jiwoo Hong and Noah Lee @kaist_ai: - Streamline AI training with ORPO - Reduce data needs for fine-tuning - Mitigate bias in language models https://t.co/g8avYJJdj2
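ORPO, discussed in the episode above, is the reference-model-free approach: it folds preference alignment into supervised fine-tuning by adding an odds-ratio penalty to the ordinary SFT loss. A minimal sketch under that reading; the use of length-averaged log-probs and the lambda default are assumptions here:

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps: torch.Tensor,
              rejected_logps: torch.Tensor,
              chosen_nll: torch.Tensor,
              lam: float = 0.1) -> torch.Tensor:
    """Odds Ratio Preference Optimization, sketched.

    chosen_logps / rejected_logps: mean per-token log p(y|x) for the
    chosen and rejected responses. chosen_nll: the usual SFT negative
    log-likelihood on the chosen response.
    """
    # log odds(y) = log p - log(1 - p), computed stably from log p.
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))
    # Odds-ratio term rewards the chosen response over the rejected one.
    ratio_loss = -F.logsigmoid(log_odds_chosen - log_odds_rejected)
    # SFT loss plus the weighted preference penalty; no reference model.
    return (chosen_nll + lam * ratio_loss).mean()
```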