Recent developments in large language model (LLM) quantization mark significant progress, with Intel's Neural Compressor team leading the charge. Their new state-of-the-art (SOTA) low-bit quantization approach, AutoRound, has been highlighted for enabling full 4-bit matrix multiplications (matmuls), which significantly speed up large-batch inference. This advancement is seen as a major step toward making LLM deployment at scale more efficient and cost-effective. The team has also shared work on FP8 inference and efficient post-training quantization, contributing further to the discussion of LLM optimization across hardware platforms. Community interest extends to a meetup in San Francisco, paired with a virtual session on "The Era of 1-bit LLMs" that includes training a 1.58-bit model named Bessie, indicating growing investment in refining LLM deployment techniques.
🔥Want to use FP8 inference easily? Intel Neural Compressor is your best choice: https://t.co/XklzQFSYdz 🎯Sharing our MLSys'24 camera-ready paper: Efficient Post-Training Quantization with FP8 Formats 🤗https://t.co/CHJyvQZhA2 @_akhaliq @navikm @huggingface #IAmIntel https://t.co/v7HO1bq8Ed
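To make the FP8 discussion concrete, the sketch below simulates per-tensor scaled E4M3 quantization in plain PyTorch. It illustrates the numerics only and is not the Neural Compressor API; the `E4M3_MAX` constant and the round-trip test are our own scaffolding, and the `torch.float8_e4m3fn` dtype assumes PyTorch 2.1 or later.

```python
# Minimal sketch of per-tensor scaled FP8 (E4M3) quantization, as used in
# FP8 post-training quantization schemes. NOT the Neural Compressor API;
# just an illustration of the numerics with PyTorch's float8 dtype.
import torch

E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def fp8_quantize(x: torch.Tensor):
    """Scale a tensor into E4M3 range, cast to FP8, and return (fp8, scale)."""
    scale = x.abs().max().clamp(min=1e-12) / E4M3_MAX
    x_fp8 = (x / scale).to(torch.float8_e4m3fn)
    return x_fp8, scale

def fp8_dequantize(x_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Map the FP8 tensor back to float32 for reference computation."""
    return x_fp8.to(torch.float32) * scale

x = torch.randn(4, 8)
x_fp8, s = fp8_quantize(x)
err = (x - fp8_dequantize(x_fp8, s)).abs().max()
print(f"max abs error after FP8 round-trip: {err.item():.5f}")
```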
AutoRound: Nice work from the Intel® Neural Compressor team. ✨ 📌 SOTA Weight-Only Quantization Algorithm for LLMs Across Hardware Platforms 📌 Designed specifically for low-bit LLM inference, approaching near-lossless compression for a range of popular models 📌 Only tuning… https://t.co/OhcK7uve3y
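As a rough intuition for what "tuning the rounding" means here, the toy sketch below learns a small bounded per-weight rounding offset that minimizes a layer's output reconstruction error on calibration data. This is not Intel's implementation (the AutoRound paper uses signed gradient descent and also tunes clipping ranges); the function name, hyperparameters, and per-tensor scaling are illustrative choices.

```python
# Toy sketch of AutoRound-style weight-only quantization: instead of always
# rounding-to-nearest, learn a per-weight rounding offset (here via plain
# SGD; the paper uses signed gradient descent) that minimizes the layer's
# output reconstruction error. Illustration only, not Intel's implementation.
import torch

def autoround_sketch(w, x, bits=4, steps=200, lr=5e-3):
    """Quantize weight `w` to signed ints with a tuned rounding offset."""
    qmax = 2 ** (bits - 1) - 1                   # e.g. 7 for 4-bit signed
    scale = w.abs().max() / qmax                 # per-tensor scale (toy choice)
    v = torch.zeros_like(w, requires_grad=True)  # learnable rounding offset
    opt = torch.optim.SGD([v], lr=lr)
    y_ref = x @ w.T                              # full-precision layer output
    for _ in range(steps):
        q = w / scale + 0.5 * torch.tanh(v)      # offset bounded in (-0.5, 0.5)
        q = (q.round() - q).detach() + q         # straight-through estimator
        w_q = q.clamp(-qmax - 1, qmax) * scale
        loss = ((x @ w_q.T) - y_ref).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        q = (w / scale + 0.5 * torch.tanh(v)).round().clamp(-qmax - 1, qmax)
    return q * scale  # dequantized weight, for evaluating quantization error

w = torch.randn(64, 64)
x = torch.randn(256, 64)  # small calibration batch
w_q = autoround_sketch(w, x)
print("reconstruction MSE:", ((x @ w_q.T) - (x @ w.T)).pow(2).mean().item())
```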
This week our Arxiv Dive is both a MEETUP in SF and virtual. Paper is on "The Era of 1-bit LLMs"... both high-performing AND cost-effective🤯. We're training our own 1.58-bit model, Bessie, too! We welcome the @Microsoft team to nerd out with us. @ma_shuming @realHongyu_Wang @donglixp https://t.co/t4Cd4AmmQ4
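For context on the "1.58-bit" figure: BitNet b1.58 constrains each weight to the ternary set {-1, 0, +1}, which carries log2(3) ≈ 1.58 bits of information per weight. A minimal sketch of the paper's absmean quantization scheme:

```python
# Minimal sketch of the 1.58-bit (ternary) weight quantization from
# "The Era of 1-bit LLMs" (BitNet b1.58): scale weights by their mean
# absolute value, then round and clip to {-1, 0, +1}. Illustration only,
# assuming the paper's absmean scheme.
import torch

def ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Return ternary weights in {-1, 0, 1} plus the absmean scale."""
    gamma = w.abs().mean()                          # absmean scaling factor
    w_t = (w / (gamma + eps)).round().clamp(-1, 1)  # RoundClip to {-1, 0, 1}
    return w_t, gamma

w = torch.randn(4, 8)
w_t, gamma = ternary_quantize(w)
print(w_t)                        # entries are -1.0, 0.0, or 1.0
print("dequantized:", w_t * gamma)
```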
⚡️AutoRound, a new SOTA low-bit LLM quantization approach developed by the Intel Neural Compressor team (https://t.co/XklzQFSYdz) 🎯Lots of interesting comparisons with GPTQ, AWQ, HQQ, etc. Check out the blog for more details: https://t.co/1fdyEs8Khx @huggingface #IAmIntel
This is excellent work, a big step forward in quantization! It enables full 4-bit matmuls, which can substantially speed up large-batch inference. Anyone deploying LLMs at scale will soon use this or similar techniques. https://t.co/3Q0RCbFES2
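Why do full 4-bit matmuls matter for large batches? With weight-only quantization, weights are dequantized to fp16 before the matmul, so the compute itself stays at high precision; when both activations and weights are int4, the multiply-accumulate can run on low-precision integer units. The sketch below simulates the numerics only (real speedups require int4 hardware kernels; the function names and per-tensor scheme are illustrative assumptions):

```python
# Numerics-only simulation of a W4A4 matmul: quantize BOTH activations and
# weights to int4 range, do the matmul with int32 accumulation, and rescale.
# On real hardware this maps to low-precision integer units; here PyTorch
# just emulates the arithmetic.
import torch

def quant_int4(x):
    """Symmetric per-tensor int4 quantization: values in [-8, 7]."""
    scale = x.abs().max().clamp(min=1e-12) / 7.0
    q = (x / scale).round().clamp(-8, 7).to(torch.int32)
    return q, scale

a = torch.randn(32, 64)        # activations (e.g. a large batch of tokens)
w = torch.randn(128, 64)       # weight matrix

qa, sa = quant_int4(a)
qw, sw = quant_int4(w)
y_int = qa @ qw.T              # integer matmul, int32 accumulation
y = y_int.to(torch.float32) * (sa * sw)   # rescale back to float

ref = a @ w.T
print("relative error:", ((y - ref).norm() / ref.norm()).item())
```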