Mistral AI Launches Open-Source Pixtral 12B Multimodal Text-Image Model

Authors
  • Computerworld
  • Guillaume Lample @ ICLR 2024
  • Rohan Paul

18 posts
ChatGPT (GPT-4o)

Mistral AI has launched Pixtral 12B, its first open-source multimodal model, which handles both text and images. The 12-billion-parameter model is a drop-in replacement for Mistral Nemo 12B and adds a new 400M-parameter vision encoder, also referred to as a vision adapter. Pixtral 12B accepts variable image sizes and multiple images per input, and it maintains strong performance on both text-only and multimodal benchmarks. It uses GeLU activations and 2D rotary position embeddings (2D RoPE), has a vocabulary of more than 131,000 tokens, and supports images up to 1024x1024 pixels with a patch size of 16x16 pixels. Mistral AI has also made Pixtral 12B available on Le Chat and la Plateforme, introduced a free tier, and reduced prices across all its models to improve accessibility. The model is released under the Apache 2.0 license.
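
The image-size and patch-size figures reported above imply a rough upper bound on how many patches a single image can produce. The Python sketch below works through that arithmetic. It is an illustration only, not Mistral's preprocessing code: the function names are hypothetical, rounding partial patches up is an assumption, and any special tokens the real tokenizer may add (for example row or image separators) are not counted.

```python
import math

PATCH_SIZE = 16        # patch edge in pixels, as reported in the article
MAX_IMAGE_SIZE = 1024  # maximum image edge in pixels, as reported in the article


def patch_grid(width, height, patch_size=PATCH_SIZE):
    """Return (columns, rows) of patches needed to cover a width x height image.

    Assumption: a partially covered patch still counts as a full patch,
    so each dimension is rounded up.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    if max(width, height) > MAX_IMAGE_SIZE:
        raise ValueError("image exceeds the reported 1024x1024 limit")
    cols = math.ceil(width / patch_size)
    rows = math.ceil(height / patch_size)
    return cols, rows


def patch_count(width, height, patch_size=PATCH_SIZE):
    """Return the total number of patches for one image."""
    cols, rows = patch_grid(width, height, patch_size)
    return cols * rows


if __name__ == "__main__":
    print(patch_count(1024, 1024))  # 64 x 64 grid -> 4096
    print(patch_count(512, 256))    # 32 x 16 grid -> 512
```

Under these assumptions, a full-size 1024x1024 image maps to a 64 x 64 grid, or 4,096 patches, while smaller images produce proportionally fewer, which is one reason variable image sizes matter for the cost of multi-image inputs.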
