Recent tweets highlight new ways to run Mixtral 8x7B, a model said to be on par with GPT 3.5, on modest hardware. The paper 'Fast Inference of Mixture-of-Experts Language Models with Offloading' describes the under-the-hood technique that makes it possible to run Mixtral-8x7B on free Google Colab or smallish GPUs such as a 3060. A team optimized inference using mixed quantization with HQQ and an MoE offloading strategy, so the model fits within combined GPU and CPU memory and runs on free Colab or consumer desktops. gpt-fast now supports Mixtral-8x7B in addition to gpt/llama, in about 1000 lines of PyTorch code. One user also praised the progress @akashnet_ has been making and flagged open source AI + DePIN as something to keep an eye on.
Finally. You can run Mixtral-8x7B models on free Colab or consumer desktops. A team was able to optimize inference using Mixed quantization with HQQ and MoE offloading strategy. It now fits models within combined GPU and CPU memory. Demo: https://t.co/jTl8u4569r https://t.co/Ow5oi93LDR
gpt-fast now supports mixtral-8x7B, in addition to gpt/llama. 1000 lines of simple pytorch code blazing it out! https://t.co/crXdcNy0uv https://t.co/W1HHn0DeWM
Run Mixtral 8x7b on Google Colab Free via #TowardsAI → https://t.co/5DCQMHrbjp
Under-the-hood technique in this paper that made possible running the huge Mixtral-8x7B models in Free colab or smallish GPUs like a 3060. 🔥 Paper - "Fast Inference of Mixture-of-Experts Language Models with Offloading" 🚀 Quite a big achievement for low-resource Inferencing… https://t.co/hoGz1rqabq https://t.co/Y7tq4zeLKx
Pretty insane to see the progress @akashnet_ has been making. Mixtral 8x7B is supposed to be on par with GPT 3.5. Would be interesting to see how crypto could bootstrap and incentivize some of this development. Also open source AI + DePIN is something to keep an eye on. https://t.co/LGWY7R1Zga
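To make the MoE offloading idea from the tweets above more concrete, here is a minimal sketch in plain Python. It assumes the general approach of keeping only a few experts in fast (GPU) memory and fetching the rest from slower (CPU) memory on demand, with a least-recently-used eviction policy. It is not the paper's or any library's implementation: `ExpertCache`, `load_expert`, and `run_layer` are hypothetical names, the "experts" are toy functions rather than neural network layers, and outputs are simply averaged instead of being weighted by router scores.

```python
from collections import OrderedDict


class ExpertCache:
    """Toy LRU cache standing in for the few GPU slots that can hold experts."""

    def __init__(self, capacity, load_fn):
        self.capacity = capacity    # how many experts fit in fast memory at once
        self.load_fn = load_fn      # hypothetical loader: fetches an expert from slow memory
        self.slots = OrderedDict()  # expert_id -> expert, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, expert_id):
        if expert_id in self.slots:
            # Cache hit: mark the expert as most recently used.
            self.slots.move_to_end(expert_id)
            self.hits += 1
            return self.slots[expert_id]
        # Cache miss: evict the least recently used expert if the cache is full.
        self.misses += 1
        if len(self.slots) >= self.capacity:
            self.slots.popitem(last=False)
        expert = self.load_fn(expert_id)
        self.slots[expert_id] = expert
        return expert


def load_expert(expert_id):
    """Hypothetical loader: toy expert i multiplies its input by i + 1."""
    return lambda x: x * (expert_id + 1)


def run_layer(cache, routed_expert_ids, x):
    """Apply the routed experts and average their outputs (toy combine step)."""
    outputs = [cache.get(e)(x) for e in routed_expert_ids]
    return sum(outputs) / len(outputs)


cache = ExpertCache(capacity=2, load_fn=load_expert)

# Assumed router choices for three consecutive tokens, two experts per token.
for chosen in [(0, 3), (0, 3), (3, 5)]:
    y = run_layer(cache, chosen, 1.0)

print(f"hits={cache.hits} misses={cache.misses} resident={list(cache.slots)}")
```

When consecutive tokens are routed to the same experts, as in the first two steps here, the cache serves them without another transfer. That reuse is what makes offloading workable on small GPUs.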