A new language model called Eagle-7B, based on the RWKV-v5 architecture, has been introduced. It outperforms Mistral-7B on multilingual benchmarks and handles 100+ languages. The model, an attention-free transformer, is trained on 1.1 trillion tokens and reaches English performance comparable to the best 7B models trained on roughly 1T tokens. Eagle-7B is open source under the Apache 2.0 license. Because it replaces attention with an RNN-style recurrence, it offers 10-100x lower inference cost, faster generation, and longer usable context. The RWKV architecture aims to balance computational efficiency and model performance in sequence-processing tasks by combining aspects of both Transformers and RNNs.
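To make the inference-cost claim concrete, here is a toy sketch (not RWKV's actual code) of why recurrent decoding stays cheap: the model carries a fixed-size state, so every generated token costs the same regardless of context length, whereas a transformer's KV cache and per-token attention cost grow with the sequence.

```python
import numpy as np

# Illustrative only: an RNN-style decoder carries a fixed-size state,
# so step t costs O(1) in time and memory; a transformer attending over
# a growing KV cache pays O(t) per step instead.

d = 8  # toy hidden size (assumption; real models use thousands)

def rnn_step(state, x, W_h, W_x):
    """One decode step: fixed-size state in, fixed-size state out."""
    return np.tanh(state @ W_h + x @ W_x)

rng = np.random.default_rng(0)
W_h, W_x = rng.normal(size=(d, d)), rng.normal(size=(d, d))
state = np.zeros(d)
for _ in range(1000):          # 1,000 tokens of context...
    x = rng.normal(size=d)     # ...but the state (and memory) never grows
    state = rnn_step(state, x, W_h, W_x)
```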
RWKV-5 "Eagle" 7B is Mistral-7B level for language modeling of unseen arxiv CS & Physics papers, and significantly better than Llama2🐦We are testing more new data. https://t.co/Pm6i6vowUH https://t.co/DEHGgjwKbp
🔥 “Small” LLMs are the ones that have 1-2B parameters (instead of 7-200B). They are still trained with trillions of words. The idea is to push the envelope on “information compression” to develop models that can be much faster and much smaller for specialized use cases, such as… https://t.co/v1b4UFZTeJ
📌 The Receptance Weighted Key Value (RWKV, the architecture behind Eagle-7B) introduced by Peng et al. aims to reconcile the trade-off between computational efficiency and model performance in sequence processing tasks. 📌 RWKV combines aspects of both Transformers and RNNs… https://t.co/XOYvi3wDhK https://t.co/KncPEkfLmO
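For a feel of what "combining Transformers and RNNs" means, below is a minimal per-channel sketch of the WKV recurrence from the RWKV paper (v4-style; the per-token u bonus and the log-space numerical stabilization of real implementations are omitted, and Eagle's v5 generalizes this to multi-headed, matrix-valued states). It computes an exponentially decayed weighted average of past values, which can be trained in parallel like attention but run as an O(1)-per-token recurrence at inference.

```python
import numpy as np

def wkv_recurrence(k, v, w):
    """Simplified RWKV (v4-style) WKV recurrence for one channel.

    k, v : float arrays of shape (T,) -- key and value streams
    w    : positive decay rate for this channel
    Returns the WKV outputs of shape (T,). Illustrative only:
    omits the 'u' bonus term and numerical stabilization.
    """
    num, den = 0.0, 0.0            # fixed-size recurrent state
    decay = np.exp(-w)             # in (0, 1): older tokens fade out
    out = np.zeros(len(k))
    for t in range(len(k)):
        num = decay * num + np.exp(k[t]) * v[t]
        den = decay * den + np.exp(k[t])
        out[t] = num / den         # decayed softmax-like average of past v
    return out

# Toy usage with random keys/values for a 16-token sequence.
rng = np.random.default_rng(0)
print(wkv_recurrence(rng.normal(size=16), rng.normal(size=16), w=0.5))
```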
Big. An RNN-based LLM just outperformed transformers. Eagle-7B is a new attention-free LLM trained on 1 trillion tokens across 100+ languages. Its RWKV-v5 architecture uses RNNs instead of the transformer architecture, allowing 10-100x lower inference cost, higher speed, and longer context… https://t.co/BL1UPsH4k3
Eagle 7B: Soaring past Transformers with 1 Trillion Tokens Across 100+ Languages. A brand new era for the RWKV-v5 architecture and linear transformers has arrived - with the strongest multi-lingual model in open source today https://t.co/ByYjec1VhM
RWKV-v5 Eagle 7B is out 🔥 ✨ Trained on 1.1 Trillion Tokens across 100+ languages 📄 Apache 2.0 🚀 Outperforms all 7B class models Model: https://t.co/3MREbmukgj Demo: https://t.co/u0WJhAprwv 💡 Check the blog; their response to the question about multi-lingual performance is really cool… https://t.co/zwrWHW3cIn
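For anyone who wants to try the weights rather than the hosted demo, a hedged usage sketch follows. The model id `RWKV/v5-Eagle-7B-HF` is an assumption (check the model link above), and loading uses `trust_remote_code=True` since RWKV-v5 ships custom modeling code rather than a built-in transformers class.

```python
# Sketch: loading Eagle 7B through Hugging Face transformers.
# "RWKV/v5-Eagle-7B-HF" is an assumed repo id -- verify against the
# official model link before running.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RWKV/v5-Eagle-7B-HF"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "The eagle soared over"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```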
🦅 Eagle 7B: RWKV (RNNs) Soaring past Transformers with 1 Trillion Tokens Across 100+ Languages 🚀 Outperforms all 7B class models in multi-lingual benchmarks. Perhaps a good dataset + scalable architecture is all you need? 🤓 👏 Licensed under Apache 2.0. Demo on Spaces! https://t.co/AcTfRDcsUk
A new RWKV (non-transformer-architecture) LLM called Eagle 7B has just been released. This model stands out by being competitive with Mistral 7B and excels, in particular, at handling multilingual tasks. Model and demo links 👇 https://t.co/MnlcEmgmYs
Introducing Eagle-7B. Based on the RWKV-v5 architecture, bringing into the open-source space the strongest multi-lingual model (beating even Mistral) and the strongest attention-free transformer today (10-100x+ lower inference cost), with English performance comparable to the best 1T-token 7B models https://t.co/hWtEMC1264
RWKV-5 "Eagle" 7B: beats Mistral-7B at multilingual, reaches Llama2-7B level at English, while being 100% attention-free RNN and only trained 1.1T tokens. Gradio Demo: https://t.co/k0AivnxCwP RWKV-6 "Finch" 1B5 in ~10days, 3B in ~30days. https://t.co/c6dByjF976