Tech companies such as Clarifai, Google Devs, and OpenRouterAI are introducing new models and APIs that make Large Language Models (LLMs) faster and more efficient. These advancements include 1-bit LLMs, the MediaPipe LLM Inference API, Nitro models, and updated models in MLX LM with faster attention mechanisms. The new releases promise significant speed improvements, reduced GPU memory usage, and better energy efficiency.
Announcing AQLM v1.1! Featuring:
1. New model collection with SOTA accuracy: https://t.co/xHiCxr2t2S
2. Gemma-2B support, running within 1.5GB
3. LoRA integration for training Mixtral-8x7B on Colab
4. Faster generation (3x) via CUDA graphs
Check it out: https://t.co/T4fYggSEBm
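For readers who want to try an AQLM checkpoint, the models load through Hugging Face transformers once the aqlm package is installed. A minimal sketch follows; the hub repo name is an assumption for illustration, so check the model collection linked above for the actual checkpoints:

```python
# Sketch: loading a 2-bit AQLM-quantized Gemma-2B via Hugging Face transformers.
# Requires: pip install aqlm[gpu] transformers accelerate
# The repo name below is an assumed placeholder, not a confirmed checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "ISTA-DASLab/gemma-2b-AQLM-2Bit-1x16-hf"  # assumption
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```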
Updated some models in MLX LM to use the new fast attention (h/t @argmaxinc). Upgrade with "pip install -U mlx-lm". 4-bit Mixtral (~45B) is quite fast now on an M2 Ultra, even for thousands of tokens: https://t.co/xDC095e8M0
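After upgrading, generation takes only a few lines with mlx-lm's load/generate helpers. A minimal sketch, assuming a 4-bit Mixtral checkpoint from the mlx-community hub (the exact repo name is a placeholder):

```python
# Minimal sketch of text generation with mlx-lm (Apple MLX).
# The model repo name is an assumption; substitute any quantized
# checkpoint from the mlx-community hub.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mixtral-8x7B-Instruct-v0.1-4bit")

text = generate(
    model,
    tokenizer,
    prompt="Explain fused attention in one paragraph.",
    max_tokens=200,
    verbose=True,  # prints tokens-per-second stats as it generates
)
print(text)
```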
LLMs are faster and more memory efficient in MLX! - All quantized models 30%+ faster h/t @angeloskath - Fused attention for longer context can be 2x+ faster and use way less memory h/t @bpkeene @atiorh @argmaxinc Some tokens-per-second benchmarks for 7B Mistral: https://t.co/co1wii9fY9
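To reproduce a rough tokens-per-second number like the benchmarks above on your own machine, you can time a generation call directly. A sketch, with the model name and prompt as placeholder assumptions:

```python
# Rough tokens-per-second measurement sketch for an MLX LM model.
# The model repo name is an assumed placeholder for a quantized Mistral-7B build.
import time
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")

prompt = "Write a haiku about fast attention."
start = time.perf_counter()
text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
elapsed = time.perf_counter() - start

n_tokens = len(tokenizer.encode(text))
print(f"{n_tokens} tokens in {elapsed:.2f}s -> {n_tokens / elapsed:.1f} tok/s")
```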
1/ Excited to introduce Nitro models - available for Mixtral, MythoMax, and Llama 70B - powered by @GroqInc and @FireworksAI_HQ - up to 10x faster (see below) - chart performance over time. And for devs: control your providers, and JSON mode! https://t.co/UR3pj9AXAf
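OpenRouter's API is OpenAI-compatible, so a Nitro model is just a different model slug in an ordinary chat-completions request. A sketch of a call using JSON mode and provider control; the ":nitro" suffix and the "provider" routing block are assumptions based on OpenRouter's conventions at the time, so verify them against the current docs:

```python
# Sketch of calling an OpenRouter Nitro model with JSON mode and
# provider routing. Slug suffix and "provider" block are assumptions.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "mistralai/mixtral-8x7b-instruct:nitro",  # throughput-optimized variant (assumption)
        "messages": [{"role": "user", "content": 'Reply with a JSON object {"ok": true}.'}],
        "response_format": {"type": "json_object"},  # JSON mode
        "provider": {"order": ["Groq", "Fireworks"]},  # provider control (assumption)
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```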
Introducing 1-bit LLMs! This represents a new paradigm for LLM quantization, where every single parameter is ternary {-1, 0, 1} instead of 16-bit. It makes the models 2.7x faster, uses 3.5x less GPU memory, and is 71x more energy efficient. https://t.co/bR9mTnde5t https://t.co/iB85qKyJ6d
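The core idea (in the style of the BitNet b1.58 "absmean" scheme this announcement appears to describe) is to scale each weight tensor by its mean absolute value and round every entry to the nearest value in {-1, 0, 1}. An illustrative sketch, not the authors' implementation:

```python
# Sketch of ternary ("1.58-bit") weight quantization, absmean-style:
# scale by the mean absolute weight, then round to {-1, 0, 1}.
# Illustrative only; not the authors' code.
import torch

def absmean_ternarize(w: torch.Tensor, eps: float = 1e-5):
    gamma = w.abs().mean()                         # per-tensor scale
    w_q = (w / (gamma + eps)).round().clamp_(-1, 1)
    return w_q, gamma                              # dequantize as w_q * gamma

w = torch.randn(4, 4)
w_q, gamma = absmean_ternarize(w)
print(w_q)          # every entry is in {-1, 0, 1}
print(w_q * gamma)  # coarse reconstruction of w
```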
Announcing the MediaPipe LLM Inference API: https://t.co/TvaugBiCBn Learn to run Gemma & other on-device LLMs with MediaPipe & @TensorFlow Lite, & get updates on: performance & optimizations, supported model architectures, the experimental LLM Inference API, & more! https://t.co/duyTfhb71Q