Recent advances in quantization methods for 2-bit and 1-bit LLMs have shown promising results, yielding large reductions in model size and memory footprint. Companies like Databricks, Mistral, and Apple have introduced new models and tooling with improved performance and efficiency. These developments mark a significant shift in the large language model landscape, with implications for both training and inference speed.
Potentially the biggest paradigm shift in LLMs: two independent studies managed to pre-train 1.58-bit LLMs that match the performance of FP16 models. Need to see how it scales (~30B), but super curious about 1.58-bit Mamba and MoE models. https://t.co/56EepNqIgP https://t.co/xybyVHBgTi https://t.co/QpCrlu4oJu
so last month msft published a paper showing a 1-bit-parameter LLM with minimal performance loss. someone on huggingface just replicated the results today. this is at least a 10x reduction in memory footprint and opens up a path for even more gains in training / inference speed https://t.co/ApHeGZDrFA
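The ~10x figure follows from the bit widths alone: 16 / 1.58 ≈ 10. The 1.58-bit scheme constrains each weight to {-1, 0, +1} via absmean quantization. A minimal sketch of that rounding step, assuming NumPy and the formula described in the BitNet b1.58 paper (scale by the mean absolute weight, then round-and-clip):

```python
# Sketch of 1.58-bit (ternary) absmean quantization: each weight is
# scaled by the mean absolute value of the matrix, then rounded and
# clipped into {-1, 0, +1}. log2(3) ~= 1.58 bits per weight.
import numpy as np

def absmean_ternary(W: np.ndarray, eps: float = 1e-5):
    gamma = np.abs(W).mean()                        # absmean scale
    W_q = np.clip(np.round(W / (gamma + eps)), -1, 1)
    return W_q.astype(np.int8), gamma               # ternary weights + scale

W = np.random.randn(4, 4).astype(np.float32)
W_q, gamma = absmean_ternary(W)
print(np.unique(W_q))  # subset of [-1, 0, 1]
```

With ternary weights, the matmul at inference reduces to additions and subtractions rescaled by gamma, which is where the speed and energy gains the paper reports come from.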
#ElonMusk's AI leaps forward with Grok-1.5, boasting superior math skills. #Databricks debuts its model, setting a new benchmark. #AI21Labs introduces Jamba, merging Mamba with Transformer architecture. Read more: https://t.co/rOSYi59xTY https://t.co/Zm2upLfVcr
Apple MLX 0.1 vs 0.9: 100% performance boost! Tests using Nexusflow/Starling-LM-7B-beta on a Mac Studio Ultra (192GB, 76 GPU cores), tokens/sec:
- 4-bit: 40 vs 86
- 8-bit: 32 vs 60
- fp16: 25 vs 30
Thanks @awnihannun @angeloskath and all the contributors!
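For anyone wanting to reproduce numbers like these, here is a rough sketch of a tokens/sec measurement, assuming the mlx-lm load/generate API; the 4-bit model path below is illustrative, not necessarily a published conversion:

```python
# Rough throughput measurement with mlx-lm (timing includes prompt
# processing, so treat the result as an approximate tokens/sec).
import time
from mlx_lm import load, generate

# Hypothetical 4-bit conversion of the model from the tweet above.
model, tokenizer = load("mlx-community/Starling-LM-7B-beta-4bit")

prompt = "Explain quantization in one paragraph."
start = time.perf_counter()
text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
elapsed = time.perf_counter() - start

n_tokens = len(tokenizer.encode(text))  # count generated tokens
print(f"{n_tokens / elapsed:.1f} tokens/sec")
```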
Open ML is going brrr. In just 5 days:
- Databricks releases DBRX
- Mistral releases 7B v2
- Qwen1.5 MoE-A2.7B
- Jamba, a MoE SSM LLM
- Wild 1-bit and 2-bit quantization with HQQ+
3 big pre-trained MoEs, a new Mistral base, and crazy updates for on-device. Let's goo
Promising quantization method for 2-bit and 1-bit LLMs. Less useful for models that are already *small*, but applying it to a larger model is very interesting. E.g., Mixtral can be brought down to 14GB of VRAM (from 94GB), the equivalent of mistral-7b running at fp16 but a… https://t.co/0ymQvxPpa9 https://t.co/AvEzkwZbyl
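The 94GB to 14GB figures check out as back-of-the-envelope arithmetic against Mixtral 8x7B's published ~46.7B total parameters; a quick sketch (the ~2.4 effective bits/param is inferred from the quoted numbers, not stated in the thread):

```python
# Memory-footprint arithmetic for the Mixtral figures above.
params = 46.7e9  # published total parameter count for Mixtral 8x7B

def gb(bits_per_param: float) -> float:
    return params * bits_per_param / 8 / 1e9

print(f"fp16      : {gb(16):.0f} GB")   # ~93 GB, matching the 94GB figure
print(f"~2.4 bits : {gb(2.4):.0f} GB")  # ~14 GB after mixed 2-bit quantization
```

The effective rate lands above a flat 2 bits/param because quantization schemes also store per-group scales and zero-points alongside the packed weights.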