MistralAI has announced Mistral 7B v0.2, a new base model with three headline changes: the context window grows from 8K to 32K tokens, the RoPE theta is raised to 1e6, and sliding-window attention is dropped. The announcement, made at SHACK15sf, has sparked excitement across the tech community, with collaborations and integrations already underway, including with llama_index and on platforms such as argilla_io. Performance is improving in parallel: the MLX 0.7 → 0.8 update lifts tokens per second on the M2 Ultra from 64.3 to 74 for 4-bit Starcoder 7B and from 73.1 to 83.2 for 4-bit Mistral 7B, thanks to Nanobind (which removes some Python overheads) and faster RMS and layer norms, with 90+ tokens per second promised in future updates. In addition, an ORPO fine-tune of Mistral 7B v0.1 trained on the argilla_io DPO Mix 7K dataset is now available on the Hub. Together, these developments underscore MistralAI's position at the forefront of AI model innovation.
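To see why raising the RoPE theta matters for long context, here is a minimal sketch of the standard rotary-embedding frequency computation. The head dimension of 128 and the v0.1 theta of 1e4 are assumptions for illustration, not values stated in the announcement:

```python
import math

def rope_inv_freq(dim: int, theta: float) -> list:
    """Standard RoPE inverse frequencies: theta^(-2i/dim) for i in [0, dim/2)."""
    return [theta ** (-2.0 * i / dim) for i in range(dim // 2)]

# Assumed head dim of 128; v0.2 sets theta = 1e6 (v0.1's 1e4 is an assumption).
old = rope_inv_freq(128, 1e4)
new = rope_inv_freq(128, 1e6)

# A larger theta stretches the longest rotation period, so positional phases
# stay distinguishable over many more tokens -- one way to support a 32K
# context without a sliding window.
longest_period_old = 2 * math.pi / old[-1]
longest_period_new = 2 * math.pi / new[-1]
```

The slowest-rotating dimension's period grows with theta, which is why long-context variants of RoPE-based models typically ship with a larger base.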
4-bit Mistral 7B 90+ toks-per-sec coming soon to MLX LM (on M2 Ultra) https://t.co/PRpafO0zQ8 https://t.co/ybzcVLd5I1
This is huge news. Mistral 7B was already the best model in its size class, and these improvements are a huge step up. I’ll be re-training many of my current fine-tunes over this model ASAP. https://t.co/NXqcKjQzHP
B R E A K I N G Mistral just announced Mistral-7B-v0.2 - New base Model - 32K context window (instead of 8k) - Theta (RoPE) = 1e6 - No sliding window https://t.co/tEnZOD3xtL
yo @MistralAI dropping a new model today!! https://t.co/KHYnwASqNM
Mistral just announced at @SHACK15sf that they will release a new model today: Mistral 7B v0.2 Base Model - 32k instead of 8k context window - Rope Theta = 1e6 - No sliding window https://t.co/iAuEUEOw5K
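The changes listed in the tweet above can be summarized as a config diff. This is illustrative only: the field names follow the Hugging Face `MistralConfig` convention, and the v0.1 values marked as assumptions are not part of the announcement:

```python
# Illustrative sketch, not the official release config.
mistral_7b_v01 = {
    "max_position_embeddings": 8 * 1024,   # 8K context, per the announcement
    "rope_theta": 1e4,                     # assumed v0.1 default
    "sliding_window": 4096,                # assumed v0.1 window size
}
mistral_7b_v02 = {
    "max_position_embeddings": 32 * 1024,  # 32K context
    "rope_theta": 1e6,                     # as announced
    "sliding_window": None,                # no sliding window
}
```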
Mistral casually dropping a new model at the @cerebral_valley hackathon https://t.co/UI2ypNmfdl
Mistral-7B-v0.2 just dropped with a full house 🔥 https://t.co/dT09nvTfra
Excited for @MistralAI + @llama_index collabs (and Colabs) 🦙🔥 Thanks @sophiamyang for dropping by! https://t.co/YxjtI3xMbN
⚡️ ORPO fine-tune of Mistral 7B v0.1 from @MistralAI using @argilla_io DPO Mix 7K! 🤗 Available in the Hub at https://t.co/yzAMAypjuW https://t.co/dGNmeuKliS
MLX 0.7 → 0.8 Tokens per second on M2 Ultra - 4-bit Starcoder 7B, 64.3 → 74 - 4-bit Mistral 7B, 73.1 → 83.2 Thanks to - Nanobind got rid of some Python overheads - Fast RMS norm and layer norm Look at that Mistral go: https://t.co/wOT8id8aiz
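For context, the MLX benchmark numbers quoted above work out to roughly 15% and 14% relative speedups. A quick back-of-the-envelope calculation (not from the announcement):

```python
def pct_speedup(before: float, after: float) -> float:
    """Relative throughput gain in percent."""
    return (after - before) / before * 100

# MLX 0.7 -> 0.8 tokens/sec on M2 Ultra, from the numbers quoted above
starcoder_gain = pct_speedup(64.3, 74.0)   # ~15%
mistral_gain = pct_speedup(73.1, 83.2)     # ~14%
```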