Meta (FAIR, with NYU) has introduced Self-Rewarding Language Models: fine-tuning Llama 2 70B on three iterations of the approach yields a model that outperforms many existing systems on the AlpacaEval 2.0 leaderboard, including Claude 2, Gemini Pro, and GPT-4 0613, reaching GPT-4 and Mistral Medium level performance. The method builds on previous work in Self-Instruct, Constitutional AI, and DPO to self-improve an instruction-following model. A separate paper in this roundup studies extending LLaMA models beyond the English language.
🌟 LLaMA Beyond English: An Empirical Study on Language Capability Transfer 🌍 With the increasing popularity of non-English-specific Large Language Models (LLMs), this paper gives interesting insights for extending LLaMA models beyond the English language. 🔍 The usual process… https://t.co/qEjCeo0E6Y
New work from Meta: Self-Rewarding Language Models https://t.co/Ob6KJj9DqN Builds upon the work in Self-Instruct (automatic prompt/instruction generation), Constitutional AI (AI feedback) and DPO (eschews explicit reward models) to self-improve an instruction following model.
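The ingredients named above compose into a simple loop: the model generates its own responses, scores them as an LLM-as-a-Judge, and the resulting preference pairs feed a DPO update. A minimal sketch of one such iteration, with stub helpers standing in for a real LLM (the function names, the deterministic placeholder score, and the pair-selection detail are illustrative assumptions, not the paper's exact pipeline):

```python
def generate_responses(model, prompt, n=4):
    """Stub: return n candidate responses (a real LLM would sample here)."""
    return [f"[{model}] {prompt} :: candidate {i}" for i in range(n)]

def judge_score(model, prompt, response):
    """Stub LLM-as-a-Judge: the same model rates its own response 0-5.
    (The paper prompts the model with an additive scoring rubric.)"""
    return sum(map(ord, response)) % 6  # deterministic placeholder score

def self_rewarding_iteration(model, prompts, n=4):
    """Build DPO preference pairs from self-generated, self-judged responses."""
    pairs = []
    for prompt in prompts:
        scored = sorted(
            ((judge_score(model, prompt, c), c)
             for c in generate_responses(model, prompt, n)),
            reverse=True,
        )
        (hi_score, chosen), (lo_score, rejected) = scored[0], scored[-1]
        if hi_score > lo_score:  # keep only pairs with a clear preference
            pairs.append({"prompt": prompt,
                          "chosen": chosen,
                          "rejected": rejected})
    # Next step (not shown): DPO-train the model on `pairs`, then rerun
    # the whole loop with the improved model -- three iterations total.
    return pairs
```

The closed loop is the point: because the judge improves along with the responder, each DPO round can produce better preference data for the next, with no fixed external reward model in sight.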
New LLM paper from @meta's FAIR and NYU shows how to use a base model (Llama 2 70B) to iteratively train itself and continuously improve performance. This approach was able to reach GPT-4 and Mistral Medium performance on AlpacaEval2.0. Really unintuitive that this closed loop… https://t.co/Qr7Wzr1TbT
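The update step driving each round of this closed loop is standard DPO. For reference (this is the general DPO objective from Rafailov et al., not a formula quoted from the thread), given self-generated preference pairs $(x, y_w, y_l)$ with chosen response $y_w$ and rejected response $y_l$:

```latex
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[
  \log \sigma\!\left(
    \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
  \right)
\right]
```

Here $\pi_\theta$ is the model being trained, $\pi_{\mathrm{ref}}$ is the frozen model from the previous iteration, and $\beta$ controls how far the policy may drift from the reference; no explicit reward model is ever fit.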
Self-Rewarding Language Models - Meta 2024 https://t.co/K5yWy9vxuP
Meta presents Self-Rewarding Language Models and beats GPT-4 0613! “Fine-tuning Llama 2 70B on three iterations of our approach yields a model that outperforms many existing systems on the AlpacaEval 2.0 leaderboard, including Claude 2, Gemini Pro, and GPT-4 0613” https://t.co/ZlUrEkplrH
Let me say it louder for those in the back: FINE-TUNING LLAMA 2 70B ON THREE ITERATIONS OF OUR APPROACH YIELDS A MODEL THAT OUTPERFORMS MANY EXISTING SYSTEMS ON THE ALPACAEVAL 2.0 LEADERBOARD, INCLUDING CLAUDE 2, GEMINI PRO, AND GPT-4 0613 https://t.co/TYBc7xW138
Meta presents Self-Rewarding Language Models paper page: https://t.co/ZAh4ZotyCL Fine-tuning Llama 2 70B on three iterations of our approach yields a model that outperforms many existing systems on the AlpacaEval 2.0 leaderboard, including Claude 2, Gemini Pro, and GPT-4 0613 https://t.co/hdYd6jSAD1