The H4 team at Hugging Face has developed Zephyr, a new large language model (LLM) that distills alignment from larger models into smaller ones. Zephyr-7b-beta, the latest version, has surpassed all 7B models in chat evaluations and even outperformed models 10x its size. The model focuses on efficiency while maintaining high performance, enabling real-world applications. The technique used, called distilled Direct Preference Optimization (dDPO), has shown promising results in aligning smaller LLMs. Zephyr has been released in the Ollama library and has been tested on RAG/agent tasks, proving its capability to handle ReAct agent tasks.
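As a rough illustration of the DPO objective that dDPO builds on, here is a minimal single-pair sketch; the function and variable names are illustrative, not from the Zephyr paper or any library:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Each argument is the summed log-probability of a response under
    either the policy being trained or the frozen reference model.
    """
    # Implicit reward margins relative to the reference model
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # -log sigmoid(beta * (chosen_margin - rejected_margin))
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# Before the policy moves away from the reference model, both margins
# are zero, so training starts at a loss of log(2) ~= 0.6931.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))
```

The key design point of DPO, and why it suits a distillation recipe, is that it needs no separate reward model: preferences (here, AI feedback from stronger models) are optimized directly through this classification-style loss.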
Zephyr: Direct Distillation of LM Alignment Tunstall et al.: https://t.co/qLKildWUwJ #ArtificialIntelligence #DeepLearning #MachineLearning https://t.co/ht4QcZyhAW
Distilling Step-by-Step – is a technique for fine-tuning smaller language models: https://t.co/z7XCrurLsB It requires less training data than standard fine-tuning and results in smaller models that can outperform few-shot prompted #LLMs that have 700x the parameters. #InfoQ #AI https://t.co/tPrjGtVqWy
7B LLMs are getting better and better. Zephyr-7b-beta (@huggingface) was just released. We ran it on @llama_index RAG/agent tasks and found it’s the only 7B LLM that can handle ReAct agent tasks over data 💫 Guide + benchmark 👇: https://t.co/6EgxibbtEY Full credits @Haotianzh https://t.co/g4vNMEKZyb
Zephyr 7B Beta, a new Mistral fine-tune, is out!🦙 https://t.co/MwTPUcFGsZ https://t.co/cvlrzUtWX0
We've updated @huggingface's Zephyr model on Ollama library to Zephyr beta (alpha is still available). Simply run: ollama run zephyr https://t.co/HnpIEJpBdJ
[LG] Zephyr: Direct Distillation of LM Alignment L Tunstall, E Beeching, N Lambert, N Rajani, K Rasul, Y Belkada, S Huang, L v Werra, C Fourrier, N Habib, N Sarrazin, O Sanseviero, A M. Rush, T Wolf [HuggingFace H4 Team] (2023) https://t.co/mwVmsKVpzV - The paper aims to train a… https://t.co/Vo2S8RwjOI https://t.co/8UOV8ibmpF
Zephyr: Direct Distillation of LM Alignment paper page: https://t.co/Pei8TAhsZv We aim to produce a smaller language model that is aligned to user intent. Previous research has shown that applying distilled supervised fine-tuning (dSFT) on larger models significantly improves… https://t.co/u2Um67We3M https://t.co/yNB6BhKZeT
The Zephyr-beta model from @huggingface H4 (led by @_lewtun and @edwardbeeching these days) is a great example of engineering practices and know-how slowly kicking into gear for RLHF. Some takeaways beyond "high MT Bench and AlpacaEval scores": * DPO can work great for smaller… https://t.co/tPQPrv3OhV https://t.co/w8L2GqvxIw
Mistral 7B is one of the most powerful 7B LLMs out there! This model focuses on efficiency while maintaining high performance to enable applications in the real world, enabled with attention mechanisms like grouped-query attention and sliding window attention. To learn more… https://t.co/M2ZyYA84Ik https://t.co/Hq3zitlsDl
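Sliding window attention, mentioned above, restricts each token to attending over a fixed-size window of recent positions instead of the full causal context. A minimal sketch of the boolean attention mask (helper name and window size are illustrative assumptions, not Mistral's implementation):

```python
def sliding_window_mask(seq_len, window):
    """Causal sliding-window mask: position i may attend to position j
    only if j <= i (causality) and i - j < window (itself plus up to
    window - 1 preceding tokens)."""
    return [[j <= i and i - j < window for j in range(seq_len)]
            for i in range(seq_len)]

# With window=2, each token sees itself and one predecessor.
for row in sliding_window_mask(4, 2):
    print(row)
```

Because each layer only looks back `window` positions, cost per token stays constant with sequence length, while stacked layers still let information propagate further than a single window.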
Over the past weeks the H4 team has been busy pushing the Zephyr 7B model to new heights 🗻 The new version is now topping all 7b models on chat evals and even 10x larger models 🤯🔥 Here are the intuitions on it 1/ Start with the strongest pretrained model you can find:… https://t.co/SjZNMgS5Kc https://t.co/fiv5gIPYFJ
Excited to release Zephyr-7b-beta 🪁 ! It pushes our recipe to new heights & tops 10x larger models 💪 📝 Technical report: https://t.co/3R4czrpbu5 🤗Model: https://t.co/8uUkvg4E7j ⚔️Evaluate it against 10+ LLMs in the @lmsysorg arena: https://t.co/2cMZRUvhOc Details in the 🧵 https://t.co/y7mp6A9OTl
What if we could distill the Alignment from models, like @OpenAI GPT-4 and @AnthropicAI Claude 2, into smaller models? 🤔 The @huggingface H4 team explored this idea and developed distilled Direct Preference Optimization (dDPO) 🧐 🧶 https://t.co/3DPBU2FqWC
Zephyr: Direct Distillation of LM Alignment abs: https://t.co/CoY83OO7VJ This paper from @huggingface introduces a recipe to distill alignment from small-scale (7B) LLMs with AI feedback. SOTA on open-source 7b models and comparable to Llama2-70b-chat. https://t.co/FaYUAMbXlS
Zephyr🪁: Distilling LM Alignment (https://t.co/9dcjdrlGgb) A simple recipe for a 7B parameter model that's competitive with 70B-RLHF models. Chat: https://t.co/xYabwfB64h https://t.co/38JGmj54vv