LLaVA-1.6, an open-source multimodal model, has been released with improvements in reasoning, OCR, and world knowledge. It outperforms Gemini Pro on several benchmarks, supports higher-resolution inputs, and maintains the data efficiency of LLaVA-1.5, with training taking approximately one day on 32 A100s. Compared to LLaVA-1.5, it adds higher-resolution input, better OCR capability, and stronger conversational ability, and it integrates SGLang for efficient inference and deployment with Vicuna 1.5 as the base language model. LLaVA-1.6 has received positive feedback and is seen as a significant advancement in open-source models.
Welcome to the era of open-source multimodal models, indeed! This is using: @ollama v0.1.23 (https://t.co/XZ1f5okUVg) Ollama Web UI v1.0.0-alpha.61 (https://t.co/IaG96VAhkZ) LLaVA 1.6 (https://t.co/4uO7Qx3FtI) Running on my Ubuntu "Server" with an old GTX 1650 Nice work… https://t.co/taiFuRReFR
The open-source LLaVA project just released LLaVA-1.6-34B. As claimed, LLaVA-1.6 even surpasses Gemini Pro on several benchmarks. Here is the complete breakdown (plus 2 live tests) 🧵 https://t.co/xOUg1oKS5Q
Very cool… I love how easy @ollama makes it to post images to these multimodal models https://t.co/kwcn9hxTh5
LLaVA 1.6 from @imhaotian has been released with improved resolution support, visual reasoning, and OCR capabilities, all while maintaining minimalist design and data efficiency. https://t.co/396xMviPP9
LLaVA v1.6 is out, pushing the limits of open multimodal models! We're glad to see two of our projects contribute to LLaVA: - SGLang for efficient inference and deployment - Vicuna 1.5 as the base language model Check out the demo at https://t.co/YmhKjWmOTi, served with SGLang!… https://t.co/FvughNOl88
LLaVA-1.6 is out. Open-source models are making great progress https://t.co/DpqQ8gGvDp
Congrats to @imhaotian + @yong_jae_lee and team!!🥳 LLaVA-1.6 (an open source model!) beats Gemini Pro and comes close to GPT-4V on several benchmarks. https://t.co/mxFiSQHDcC
Boom! LLaVA-1.6, with improved reasoning, OCR, and world knowledge. It supports higher-res inputs, more tasks, and exceeds Gemini Pro on several benchmarks! It is trained in ~1 day on 32 A100s. Model: https://t.co/pqUjcp3A92
LLaVA 1.6 is out! 🥳 - Outperforms Gemini PRO on some benchmarks - Higher resolution than LLaVA 1.5 (up to 4x more pixels!) - Better OCR capability and instruction-following - More conversational Models: https://t.co/200Qffi6fM Blog: https://t.co/nh5TaTHH3W https://t.co/kYrE7O2V1O
🚀We are thrilled to release LLaVA-1.6, with improved reasoning, OCR, and world knowledge. It supports higher-res inputs, more tasks, and exceeds Gemini Pro on several benchmarks! 🤯 It maintains the data efficiency of LLaVA-1.5, and LLaVA-1.6-34B is trained ~1 day with 32 A100s.… https://t.co/nGRpLX8FQv