Ollama announces support for Google Gemma with 2B and 7B models. Gemma also runs well on phones: the 2B model hits roughly 20 tok/sec on iPhone, and a 4-bit quantized build was demoed on a Samsung S23, both via MLC LLM. On TPUs, Gemma 2B reaches over 475 tok/sec powered by JAX and Transformers, with up to 650 tok/sec possible depending on the prompt.
Google's Gemma model is now supported on Android using MLC LLM. Here is a demo of the 4-bit quantized Gemma-2b model running on a Samsung S23. Thanks to @ruihanglai and many others for bringing Gemma support to MLC! https://t.co/KXW7nse8OV https://t.co/8jNvd71cBV https://t.co/xuIEL7eJHW
Gemma 2B at > 475 tok/sec! 🫡 Powered by JAX, Transformers, and TPUs. Up to 4x faster than PyTorch (on an A100). Depending on the prompt and conditions, it can go up to 650 tok/sec. ⚡ Kudos to @sanchitgandhi99 for integrating Gemma into JAX Transformers! Note: This is on TPU v2;… https://t.co/zB1Csc7da1
Run the Gemma model locally on iPhone - we get a blazing-fast 20 tok/s for the 2B model. This shows amazing potential ahead for Gemma fine-tunes on phones, made possible by the new MLC SLM compilation flow by @junrushao from @OctoAICloud and many other contributors. https://t.co/1L0mvG1bWq https://t.co/c4kCO9DRos
Boom. Ollama now supports Google Gemma.
2B model: ollama run gemma
7B model: ollama run gemma:7b
You can run non-quantized versions via Ollama (if video memory is sufficient). Link: https://t.co/P1vu4plYEF
Ollama now supports Google Gemma! (Please update to v0.1.26)
2B model: ollama run gemma
7B model: ollama run gemma:7b
Learn more: https://t.co/ULq9L3PHTJ
You can run non-quantized versions via Ollama (if video memory is sufficient): ollama run gemma:2b-instruct-fp16… https://t.co/Hti1kpc8Sz
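Beyond the `ollama run` CLI shown above, Ollama also exposes a local HTTP API (by default on port 11434) that other programs can call. A minimal sketch of querying a pulled Gemma model from Python, assuming an Ollama server is already running locally with the `gemma` model downloaded:

```python
import json
import urllib.request

# Ollama's default local generate endpoint
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt, model="gemma"):
    """Build the JSON body for Ollama's /api/generate endpoint.

    stream=False asks the server for one complete JSON response
    instead of a stream of partial chunks.
    """
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt, model="gemma"):
    """Send a prompt to a locally running Ollama server and return the reply text."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server up, `generate("Why is the sky blue?")` returns Gemma's answer as a string; passing `model="gemma:7b"` (or `gemma:2b-instruct-fp16`, if memory allows) targets the other variants from the tweets above.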