xAI's new Grok-1.5 Vision model is competitive with, or outperforms, leading AI models such as GPT-4V, Claude, and Gemini, according to benchmarks recently published by xAI. With vision capabilities, Grok can process a wide range of visual information, including diagrams, charts, documents, and text in images, and handle tasks such as multi-disciplinary reasoning and math. This marks Grok's move into multimodal functionality, and early results look promising. The vision model is expected to be integrated into the Grok chat in the medium term, with a few other features planned for release before then.
Vision coming to Grok https://t.co/uoPUsUOf3K
NEWS: Grok can now process a variety of visual information in addition to text! It is competitive with or outperforms existing multimodal models across benchmarks in areas like multi-disciplinary reasoning, math, diagrams, text reading, charts, and documents https://t.co/2hDR8qRlT8 https://t.co/mX4Zm0G7Y6
Grok 1.5 Vision's capabilities are competitive with GPT-4V, Claude and Gemini, according to benchmarks just published by xAI https://t.co/xZ8Aeu6oGJ
Some early results of our first vision model. It'll be integrated into the Grok chat in the medium term. A few other features will ship before that (likely very soon). Props to {@tingchenai, @gabriel_ilharco}. https://t.co/WXGz5HzEhU
Grok is going multimodal! It's incredible to see how fast a small, focused team can move. Kudos to the amazing team @xAI that made this possible https://t.co/YKwketuk3s https://t.co/jBD9kUVlWH