Researchers are exploring the benefits of quantizing large language models (LLMs) to 1-bit and 2-bit precision, finding that a heavily quantized larger model can outperform a smaller full-precision one at a comparable memory footprint. Techniques like HQQ+ and layer pruning are being used to scale down inference and pre-training, offering efficiency breakthroughs in LLMs.
so last month msft published a paper showing a 1-bit parameter LLM with minimal performance loss. someone on huggingface just replicated the results today. this is at least a 10x reduction in memory footprint and opens up a path for even more gains in training / inference speeds https://t.co/ApHeGZDrFA
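For intuition, here is a minimal sketch of what "1-bit" weights mean in practice, assuming the tweet refers to Microsoft's BitNet b1.58 line of work, which actually uses ternary {-1, 0, +1} weights scaled by the mean absolute value (the "absmean" scheme described in that paper). This is an illustration of the quantization step only, not the replication itself.

```python
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a weight tensor to ternary values {-1, 0, +1}.

    Absmean scheme: scale by the mean absolute value, round,
    and clip to [-1, 1]. Returns the ternary weights plus the
    scale needed to dequantize.
    """
    scale = w.abs().mean().clamp(min=eps)          # per-tensor absmean scale
    w_ternary = (w / scale).round().clamp_(-1, 1)  # values in {-1, 0, +1}
    return w_ternary, scale

# Toy example with a random weight matrix
w = torch.randn(4096, 4096)
w_q, s = absmean_ternary_quantize(w)
w_hat = w_q * s                    # dequantized approximation used at inference
print(w_q.unique())                # tensor([-1., 0., 1.])
print((w - w_hat).abs().mean())    # mean quantization error
```

Storing each weight in ~1.58 bits instead of 16 is where the roughly 10x memory reduction comes from; the remaining gains come from replacing multiplications with additions in the matmuls.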
Efficiency Breakthroughs in LLMs: Combining Quantization, LoRA, and Pruning for Scaled-down Inference and Pre-training Quick read: https://t.co/SKDFQJZeNo Researchers from Meta FAIR, UMD, Cisco, Zyphra, MIT, and Sequoia Capital examine a layer-pruning approach for popular…
Promising quantization method for 2-bit and 1-bit LLMs. Less useful for models that are already *small*, but doing this on a larger model is very interesting. E.g., the Mixtral model can be brought down to 14GB of VRAM (from 94GB), the equivalent of mistral-7b running at fp16 but a… https://t.co/0ymQvxPpa9 https://t.co/AvEzkwZbyl
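A quick back-of-the-envelope check of those VRAM numbers, assuming Mixtral 8x7B has roughly 47B parameters and ignoring activations, KV cache, and quantization metadata (the figures are approximate):

```python
# Rough sanity check of the fp16 vs 2-bit memory figures quoted above.
params = 47e9  # approximate parameter count for Mixtral 8x7B

fp16_gb = params * 16 / 8 / 1e9    # 2 bytes per weight   -> ~94 GB
two_bit_gb = params * 2 / 8 / 1e9  # 0.25 bytes per weight -> ~12 GB

print(f"fp16:  {fp16_gb:.0f} GB")    # ~94 GB, matching the tweet
print(f"2-bit: {two_bit_gb:.0f} GB") # ~12 GB; per-group scales/zero-points and
                                     # any layers kept at higher precision push
                                     # the total toward the ~14 GB quoted
```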
Very cool result that points towards existing LLMs being “too deep,” paying costs in compute w/o getting much back for performance! Similar to our conclusion in https://t.co/Gbsa80GUSy, but here they focus on pruning an existing model rather than training from scratch! https://t.co/NId0F1nY5j
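The layer-pruning approach discussed in the two tweets above drops a contiguous block of decoder layers from an existing model and then lightly finetunes to recover quality. A minimal sketch, assuming a Llama-style Hugging Face model where the decoder blocks live in `model.model.layers`; the layer range here is hard-coded for illustration, whereas the paper selects it by measuring similarity between layer inputs and outputs:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)

# Hypothetical contiguous block of layers to remove
drop_start, drop_end = 24, 28
kept = [
    layer for i, layer in enumerate(model.model.layers)
    if not (drop_start <= i < drop_end)
]
model.model.layers = torch.nn.ModuleList(kept)
model.config.num_hidden_layers = len(kept)

# The paper then "heals" the pruned model with a small amount of
# parameter-efficient finetuning (e.g. QLoRA) to recover performance.
```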
The new era of 1-bit and 2-bit quantization. Their "findings indicate that heavily quantizing larger models using techniques like HQQ+ can yield superior performance while still maintaining a relatively small memory footprint." https://t.co/maQCvColOf
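For context on what a 2-bit format looks like: the sketch below is plain round-to-nearest, group-wise 2-bit quantization, not HQQ/HQQ+ itself (HQQ refines scales and zero-points with a half-quadratic solver, and HQQ+ additionally trains low-rank adapters on top); `groupwise_2bit_quantize` is a hypothetical helper illustrating the storage format such methods target.

```python
import torch

def groupwise_2bit_quantize(w: torch.Tensor, group_size: int = 64):
    """Naive round-to-nearest 2-bit affine quantization with a
    per-group scale and zero-point (4 levels per weight)."""
    orig_shape = w.shape
    groups = w.reshape(-1, group_size)                  # one scale/zero per group
    w_min = groups.min(dim=1, keepdim=True).values
    w_max = groups.max(dim=1, keepdim=True).values
    scale = (w_max - w_min).clamp(min=1e-8) / 3.0       # 2 bits -> codes in {0..3}
    zero = w_min
    q = ((groups - zero) / scale).round().clamp_(0, 3)  # quantized codes
    w_hat = (q * scale + zero).reshape(orig_shape)      # dequantized approximation
    return q.to(torch.uint8), scale, zero, w_hat

w = torch.randn(4096, 4096)
q, scale, zero, w_hat = groupwise_2bit_quantize(w)
print((w - w_hat).abs().mean())  # reconstruction error of naive 2-bit RTN
```

The quoted finding is essentially that, at a fixed memory budget, spending it on a heavily quantized larger model (with better scales/zero-points than this naive version, plus adapters in HQQ+) can beat a smaller model kept at full precision.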