Ollama announces support for Google Gemma with 2B and 7B models. Gemma also runs well on phones: the 2B model hits roughly 20 tok/sec on iPhone, and a 4-bit quantized build was demoed on a Samsung S23, both via MLC LLM. On TPUs, Gemma 2B reaches over 475 tok/sec powered by JAX and Transformers, with up to 650 tok/sec possible depending on the prompt.
Google's Gemma model is now supported on Android using MLC LLM. Here is a demo of the 4-bit quantized Gemma-2b model running on a Samsung S23. Thanks to @ruihanglai and many others for bringing Gemma support to MLC! https://t.co/KXW7nse8OV https://t.co/8jNvd71cBV https://t.co/xuIEL7eJHW
Gemma 2B at > 475 tok/sec! 🫡 Powered by JAX, Transformers, and TPUs. Up to 4x faster than PyTorch (on an A100). Depending on the prompt and conditions, it can go up to 650 tok/sec. ⚡ Kudos to @sanchitgandhi99 for integrating Gemma into JAX Transformers! Note: This is on TPU v2;… https://t.co/zB1Csc7da1
Run the Gemma model locally on iPhone - we get a blazing-fast 20 tok/s for the 2B model. This shows amazing potential ahead for Gemma fine-tunes on phones, made possible by the new MLC SLM compilation flow by @junrushao from @OctoAICloud and many other contributors. https://t.co/1L0mvG1bWq https://t.co/c4kCO9DRos
Boom. Ollama now supports Google Gemma.
2B model: ollama run gemma
7B model: ollama run gemma:7b
You can run non-quantized versions via Ollama (if video memory is sufficient). Link: https://t.co/P1vu4plYEF
Ollama now supports Google Gemma! (Please update to v0.1.26)
2B model: ollama run gemma
7B model: ollama run gemma:7b
Learn more: https://t.co/ULq9L3PHTJ
You can run non-quantized versions via Ollama (if video memory is sufficient): ollama run gemma:2b-instruct-fp16… https://t.co/Hti1kpc8Sz
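Beyond the `ollama run` CLI shown above, Ollama also exposes a local HTTP API (by default on port 11434) that other programs can call. A minimal sketch of querying a pulled Gemma model from Python, assuming an Ollama server is already running locally with the `gemma` model downloaded:

```python
import json
import urllib.request

# Ollama's default local generate endpoint
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt, model="gemma"):
    """Build the JSON body for Ollama's /api/generate endpoint.

    stream=False asks the server for one complete JSON response
    instead of a stream of partial chunks.
    """
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt, model="gemma"):
    """Send a prompt to a locally running Ollama server and return the reply text."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server up, `generate("Why is the sky blue?")` returns Gemma's answer as a string; passing `model="gemma:7b"` (or `gemma:2b-instruct-fp16`, if memory allows) targets the other variants from the tweets above.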