Mistral AI Launches Open-Source Pixtral 12B Multimodal Text-Image Model
18 posts • ChatGPT (GPT-4o)
Mistral AI has launched Pixtral 12B, its first open-source multimodal model for both text and image processing. The 12-billion-parameter model serves as a drop-in replacement for Mistral Nemo 12B and adds a new 400M-parameter vision encoder. It supports variable image sizes and multi-image input, and maintains strong performance on both text-only and multimodal benchmarks. The model uses GeLU activations and 2D rotary position embeddings (RoPE), has a vocabulary of over 131,000 tokens, and supports image sizes up to 1024x1024 pixels with a patch size of 16x16 pixels. Pixtral 12B is released under the Apache 2.0 license and is available on Le Chat and la Plateforme. Alongside the release, Mistral AI introduced a free tier and reduced prices across all its models to improve accessibility.
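To make the image-size and patch-size figures concrete, the short sketch below counts how many 16x16 patches tile an image of a given size. The function name `patch_grid` is hypothetical, and rounding partial patches up with a ceiling is an assumption, since the summary does not say how images that are not multiples of 16 are handled. The count covers image patches only and may not equal the number of tokens the model actually consumes per image.

```python
import math


def patch_grid(width: int, height: int, patch: int = 16) -> tuple[int, int, int]:
    """Return (rows, cols, total) for a grid of square patches over an image.

    Assumption: a partial patch at the right or bottom edge counts as a
    full patch (ceiling division). This is illustrative, not Pixtral's
    documented preprocessing.
    """
    cols = math.ceil(width / patch)
    rows = math.ceil(height / patch)
    return rows, cols, rows * cols


# The maximum size quoted above: 1024x1024 pixels with 16x16 patches.
rows, cols, total = patch_grid(1024, 1024)
print(rows, cols, total)  # 64 64 4096
```

Under these assumptions a full-size 1024x1024 image yields a 64x64 grid, or 4096 patches, while smaller images yield proportionally fewer, which is one reason variable image sizes matter for context usage.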