The Beijing Academy of Artificial Intelligence (BAAI) has released Bunny, a new family of multimodal models that includes the Bunny-3B model. Built on SigLIP and Phi-2, the model is notable for pairing a lightweight design with strong performance, achieving results on par with 13-billion-parameter models. Bunny-3B is now available on the Hugging Face platform. In related developments, Amazon has presented a paper on the Question Aware Vision Transformer for Multimodal Reasoning, highlighting the growing research focus on vision-language models. Additionally, a new multimodal benchmark has been introduced, which could be useful for AI art, visual language modeling, multimodal retrieval, and interpretability studies.
Bunny🐰is on the Hub! A family of lightweight but powerful multimodal models released by Chinese research lab @BAAIBeijing 🔥 https://t.co/ZcX1I5UNGE ✨Bunny-3B (SigLIP + Phi-2) outperforms even 13B models!
Amazon presents Question Aware Vision Transformer for Multimodal Reasoning paper page: https://t.co/FZTlpkWUyJ Vision-Language (VL) models have gained significant research focus, enabling remarkable advances in multimodal reasoning. These architectures typically comprise a… https://t.co/i1blmKRQAL
📢 New multimodal benchmark and results on a simple (yet not trivial!) task. We hope that this dataset will be useful in AI art, visual language modeling, multimodal retrieval, and possibly even mechanistic interpretability. https://t.co/0YVrArla4n
Welcome Bunny! A family of lightweight but powerful multimodal models from @BAAIBeijing With detailed work on dataset curation, the Bunny-3B model built upon SigLIP and Phi-2 achieves performance on par with 13B models. Model on @huggingface: https://t.co/w47ClwaJED https://t.co/WhQKMf9Tk4