OpenVLA, a new open-source vision-language-action (VLA) model, has been released. Built on Llama 2 with a visual encoder incorporating DINOv2 features, OpenVLA has 7 billion parameters and is trained on 970,000 robot episodes from the Open X-Embodiment dataset. It outperforms existing models such as RT-2-X and Octo in zero-shot evaluations while being nearly 10 times smaller than RT-2-X. The model is designed for efficient inference and fine-tuning on a single GPU using quantization and LoRA. OpenVLA's code, data, and weights are fully available online, including a PyTorch codebase and models on HuggingFace, making it a significant step toward accessible large-scale robotic learning. The project is expected to drive advances in both academic and industry settings.
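For readers who want to try the released checkpoints, below is a minimal inference sketch using the standard HuggingFace transformers loading pattern. The model id `openvla/openvla-7b`, the prompt template, and the `predict_action`/`unnorm_key` helper follow the project's published HuggingFace usage as best I can tell; treat the details as illustrative rather than authoritative.

```python
# Minimal inference sketch (assumes the HuggingFace release at "openvla/openvla-7b"
# and its predict_action helper, which is exposed via trust_remote_code).
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda:0")

# One camera frame plus a language instruction (placeholder inputs).
image = Image.open("wrist_camera.png")
prompt = "In: What action should the robot take to pick up the cup?\nOut:"
inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)

# Decodes a 7-DoF end-effector action; unnorm_key selects the dataset statistics
# used to un-normalize the prediction (e.g. the Bridge data split).
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
print(action)
```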
[RO] OpenVLA: An Open-Source Vision-Language-Action Model https://t.co/eCfjgsTeqB - OpenVLA is a 7B parameter open-source vision-language-action model (VLA) trained on 970k robot episodes from the Open X-Embodiment dataset. - It sets a new state-of-the-art for generalist robot… https://t.co/79S0R0eJx4
The OpenVLA project is finally out! Robotics has also been revolutionized by foundation models, but until now, the field did not have open access to any high-quality ones to build on top of. I believe this project will open the door for academic and industry advances in robotics. https://t.co/RM68Ck8Svg
Really excited to share OpenVLA! - state-of-the-art robotic foundation model - outperforms RT-2-X in our evals, despite being nearly 10x smaller - code + data + weights open-source Webpage: https://t.co/Y0XU6kX3hl https://t.co/wqQbgG5z8I
Here's our latest work on robotic foundation models. I'm very eager to see what foundation models can do for robotics. We are still in the BERT era. Also let's see if we can keep the models open-source as they become stronger (unlike what happened in language)! https://t.co/rmynk9eNrF
New state-of-the-art visual-language-action model, based on Llama2 and Dino features. Open, general-purpose policy for turning images and language instructions into robot behaviors. Excited to see what people can do with this, and always glad to see more cool open models! https://t.co/3jCHwYZS2G
Very excited to release OpenVLA today, a 7B parameter open-source vision-language-action model (VLA). 🦾 SoTA generalist policy (better than Octo & RT-2-X) ⚡️ Easy to run & fine-tune on 1 GPU with quantization and LoRA 💻 Open-source PyTorch codebase 🤗 Models on HuggingFace 1/ https://t.co/KwKX8NPMVr
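The single-GPU claim rests on 4-bit quantization plus LoRA adapters. A rough sketch of that setup using generic transformers, bitsandbytes, and peft APIs (not necessarily the repository's own fine-tuning script) might look like:

```python
# Rough single-GPU fine-tuning setup: 4-bit quantization (bitsandbytes) + LoRA (peft).
# The model id and the LoRA hyperparameters here are illustrative assumptions.
import torch
from transformers import AutoModelForVision2Seq, AutoProcessor, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "openvla/openvla-7b"  # assumed HuggingFace model id

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    model_id,
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Freeze the quantized base weights and train only low-rank LoRA adapters.
vla = prepare_model_for_kbit_training(vla)
lora_config = LoraConfig(r=32, lora_alpha=16, lora_dropout=0.0, target_modules="all-linear")
vla = get_peft_model(vla, lora_config)
vla.print_trainable_parameters()
# ...a standard training loop or Trainer over robot demonstration batches goes here...
```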
OpenVLA is a VLM for robot control, open-source & available for the community: https://t.co/5jmeX2pqN5 Awesome collaboration led by @moo_jin_kim, @KarlPertsch, @siddkaramcheti W.r.t. large-scale robotic learning, this is an important step in making VLAs accessible. A thread 👇 https://t.co/03HChp0j7a
🚨New fully open multi-robot generalist VLA🚨 OpenVLA makes accessible one of the most important paradigms in robotics + AI today, VLAs. - works 0-shot on many robot embodiments - focuses on finetuning and efficient inference - RT-2-X performance (!) at 7x fewer params (!!) https://t.co/ieT2BbHsuy
✨ Introducing OpenVLA - an open-source vision-language-action model for robotics! - SOTA generalist policy - 7B params - outperforms Octo, RT-2-X on zero-shot evals 🦾 - trained on 970k episodes from OpenX dataset 🤖 - fully open: model/code/data all online 🤗 🧵👇 https://t.co/z0l2bJOqBi
OpenVLA: An Open-Source Vision-Language-Action Model - Presents a 7B open-source vision-language-action model, pretrained on 970k robot episodes from the Open X-Embodiment dataset - Outperforms RT-2-X and Octo proj: https://t.co/wdTFFhAyIK abs: https://t.co/alDZtgK6dQ https://t.co/T3DBlG55QH
OpenVLA: An Open-Source Vision-Language-Action Model abs: https://t.co/5seAq9xBk7 project page: https://t.co/vr7u1mdY5w code: https://t.co/tAHz15bVrC Presents OpenVLA, a 7B param open-source vision-language-action model finetuned from Llama-2 combined with a visual encoder that… https://t.co/t3pZFfNnkv
vLLM now supports OpenAI Vision API compatible inference with open source models! [1/3] This change will be included in the upcoming v0.5.0 release, but you can get a preview on the latest documentation! https://t.co/lq9lFWnjjE
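For context, an OpenAI Vision API-style request against a local vLLM server looks like the sketch below; the base URL, the served model id (`llava-hf/llava-1.5-7b-hf` here), and the image URL are placeholders, not part of the announcement.

```python
# Sketch of an OpenAI Vision API-compatible request sent to a locally running vLLM server.
from openai import OpenAI

# Point the standard OpenAI client at the vLLM OpenAI-compatible endpoint.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="llava-hf/llava-1.5-7b-hf",  # whichever vision-language model the server was launched with
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/frame.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```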