Tech companies such as Clarifai, Google Devs, and OpenRouterAI are introducing new models and APIs that make Large Language Models (LLMs) faster and more efficient. These advancements include 1-bit LLMs, the MediaPipe LLM Inference API, Nitro models, and updated models in MLX LM with faster attention mechanisms. The new releases promise significant speed improvements, reduced GPU memory usage, and better energy efficiency.
Announcing AQLM v1.1! Featuring:
1. New model collection with SOTA accuracy: https://t.co/xHiCxr2t2S
2. Gemma-2B support, running within 1.5GB
3. LoRA integration for training Mixtral-8x7B on Colab
4. Faster generation (3x) via CUDA graphs
Check it out: https://t.co/T4fYggSEBm
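For readers who want to try an AQLM checkpoint, the models load through Hugging Face transformers once the aqlm package is installed. A minimal sketch follows; the hub repo name is an assumption for illustration, so check the model collection linked above for the actual checkpoints:

```python
# Sketch: loading a 2-bit AQLM-quantized Gemma-2B via Hugging Face transformers.
# Requires: pip install aqlm[gpu] transformers accelerate
# The repo name below is an assumed placeholder, not a confirmed checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "ISTA-DASLab/gemma-2b-AQLM-2Bit-1x16-hf"  # assumption
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```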
Updated some models in MLX LM to use the new fast attention (h/t @argmaxinc). Upgrade with "pip install -U mlx-lm". 4-bit Mixtral (~45B) is quite fast now on an M2 Ultra, even for thousands of tokens: https://t.co/xDC095e8M0
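After upgrading, generation takes only a few lines with mlx-lm's load/generate helpers. A minimal sketch, assuming a 4-bit Mixtral checkpoint from the mlx-community hub (the exact repo name is a placeholder):

```python
# Minimal sketch of text generation with mlx-lm (Apple MLX).
# The model repo name is an assumption; substitute any quantized
# checkpoint from the mlx-community hub.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mixtral-8x7B-Instruct-v0.1-4bit")

text = generate(
    model,
    tokenizer,
    prompt="Explain fused attention in one paragraph.",
    max_tokens=200,
    verbose=True,  # prints tokens-per-second stats as it generates
)
print(text)
```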
LLMs are faster and more memory efficient in MLX! - All quantized models 30%+ faster h/t @angeloskath - Fused attention for longer context can be 2x+ faster and use way less memory h/t @bpkeene @atiorh @argmaxinc Some tokens-per-second benchmarks for 7B Mistral: https://t.co/co1wii9fY9
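To reproduce a rough tokens-per-second number like the benchmarks above on your own machine, you can time a generation call directly. A sketch, with the model name and prompt as placeholder assumptions:

```python
# Rough tokens-per-second measurement sketch for an MLX LM model.
# The model repo name is an assumed placeholder for a quantized Mistral-7B build.
import time
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")

prompt = "Write a haiku about fast attention."
start = time.perf_counter()
text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
elapsed = time.perf_counter() - start

n_tokens = len(tokenizer.encode(text))
print(f"{n_tokens} tokens in {elapsed:.2f}s -> {n_tokens / elapsed:.1f} tok/s")
```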
1/ Excited to introduce Nitro models - available for Mixtral, MythoMax, and Llama 70B - powered by @GroqInc and @FireworksAI_HQ - up to 10x faster (see below) - chart performance over time. And for devs: control your providers, and JSON mode! https://t.co/UR3pj9AXAf
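OpenRouter's API is OpenAI-compatible, so a Nitro model is just a different model slug in an ordinary chat-completions request. A sketch of a call using JSON mode and provider control; the ":nitro" suffix and the "provider" routing block are assumptions based on OpenRouter's conventions at the time, so verify them against the current docs:

```python
# Sketch of calling an OpenRouter Nitro model with JSON mode and
# provider routing. Slug suffix and "provider" block are assumptions.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "mistralai/mixtral-8x7b-instruct:nitro",  # throughput-optimized variant (assumption)
        "messages": [{"role": "user", "content": 'Reply with a JSON object {"ok": true}.'}],
        "response_format": {"type": "json_object"},  # JSON mode
        "provider": {"order": ["Groq", "Fireworks"]},  # provider control (assumption)
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```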
Introducing 1-bit LLMs! This represents a new paradigm for LLM quantization, where every single parameter is ternary {-1, 0, 1} instead of 16-bit. It makes the models 2.7x faster, uses 3.5x less GPU memory, and is 71x more energy efficient. https://t.co/bR9mTnde5t https://t.co/iB85qKyJ6d
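The core idea (in the style of the BitNet b1.58 "absmean" scheme this announcement appears to describe) is to scale each weight tensor by its mean absolute value and round every entry to the nearest value in {-1, 0, 1}. An illustrative sketch, not the authors' implementation:

```python
# Sketch of ternary ("1.58-bit") weight quantization, absmean-style:
# scale by the mean absolute weight, then round to {-1, 0, 1}.
# Illustrative only; not the authors' code.
import torch

def absmean_ternarize(w: torch.Tensor, eps: float = 1e-5):
    gamma = w.abs().mean()                         # per-tensor scale
    w_q = (w / (gamma + eps)).round().clamp_(-1, 1)
    return w_q, gamma                              # dequantize as w_q * gamma

w = torch.randn(4, 4)
w_q, gamma = absmean_ternarize(w)
print(w_q)          # every entry is in {-1, 0, 1}
print(w_q * gamma)  # coarse reconstruction of w
```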
Announcing the MediaPipe LLM Inference API: https://t.co/TvaugBiCBn Learn to run Gemma & other on-device LLMs with MediaPipe & @TensorFlow Lite, & get updates on: performance & optimizations, supported model architectures, the experimental LLM Inference API, & more! https://t.co/duyTfhb71Q