The AI community is seeing rapid progress with Google's Gemma models. The 2B and 7B variants now run on a wide range of devices, including iPhones, Android phones such as the Samsung S23, and even five-year-old Windows laptops without GPU acceleration, thanks to updates and contributions from developers around the world. Performance is notable: Gemma 7B exceeds 2,500 tokens per second on an H100, while Gemma 2B reaches over 475 tokens per second via JAX, Transformers, and TPUs (up to 650 tokens per second depending on prompt and conditions) and 22+ tokens per second on an iPhone. The models have also been integrated into several platforms and APIs, including MLC LLM for Android and web browsers, KerasNLP, and Anyscale Endpoints, where Gemma 7B's MMLU results are reported as comparable to Mistral-7B's. Open-source contributions and optimizations, such as quantized variants from the MLX community and the Ollama v0.1.27 release (described as "100x more usable" on a five-year-old Windows laptop with no GPU), have further improved Gemma's accessibility and efficiency, demonstrating the model's potential for practical machine learning applications.
Besides Android, iOS, and web browsers, Gemma is also supported in MLC LLM on various GPUs! A single model definition does it all -- thanks to the ML compiler infra led by @junrushao and many others! Try it in Google Colab: https://t.co/Xh25ZFAXch https://t.co/mPAzahVb3a
https://t.co/in4MFlYKaW now adds Gemma from @GoogleDeepMind! The 2b model is perfect for building in-browser agents with @WebGPU acceleration -- everything local! Here is a 1x speed demo of 4-bit quantized gemma-2b-it on @GooglePixel_US 7 Pro with @googlechrome. https://t.co/7gKTZYD1FH
thrilled to find that @ollama 0.1.27 is now 100x more usable on a 5-year old Windows laptop with no GPU. https://t.co/RpJpfuDJjU
Ollama v0.1.27 is released!
🐎 Performance improvements (up to 2x) when running Gemma models
🚄 Fixed performance issues on Windows without GPU acceleration. Systems with AVX and AVX2 instruction sets should be 2-4x faster.
💊 Reduced likelihood of false positive Windows…
We’re now serving Gemma 7B Instruct on the Fireworks platform! Try out Google’s latest model on Fireworks to enjoy fast inference speeds, token-based pricing, and an OpenAI-compatible, user-friendly API. Get started on our playground at https://t.co/k2thudqXg0 or through our API… https://t.co/A2OrLrO5aI
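Several of these hosts (Fireworks, Together, Anyscale, Lepton) advertise OpenAI-compatible endpoints, which means the same chat-completions request shape works across them with only the base URL and model name swapped. A minimal sketch of assembling such a request — the base URL is a placeholder and the payload follows the generic OpenAI chat schema, not any one provider's documented example:

```python
import json
import os

# Hypothetical endpoint -- substitute the provider's actual base URL.
BASE_URL = "https://api.example-host.com/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Assemble an OpenAI-compatible chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

body = build_chat_request("gemma-7b-it", "Explain quantization in one sentence.")
headers = {
    # Most OpenAI-compatible hosts use a bearer token for auth.
    "Authorization": f"Bearer {os.environ.get('API_KEY', 'sk-...')}",
    "Content-Type": "application/json",
}
print(json.dumps(body, indent=2))
```

Sending `body` to the provider's URL with any HTTP client (or the official `openai` Python package pointed at a custom `base_url`) is all that changes between hosts.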
Gemma models from @GoogleAI are now available on Together API! Check out the models on our playground 👇
Gemma-7b-it: https://t.co/OaW7rGFiLO
Gemma-2b-it: https://t.co/OaW7rGFiLO
Gemma-7b: https://t.co/zQM20Oc4sa
Gemma-2b: https://t.co/2mQQSXKSbS
💪Gemma-7b from @GoogleAI, one of the best 7b models available, is now supported on Anyscale Endpoints! 📈Results on MMLU are comparable to Mistral-7b and surpass Llama2-13b Try it out: https://t.co/MStW5BG3lD https://t.co/D0Wdh603NO
Wow! Expo is a response to Flutter, in TypeScript. I'm sold! Hmmm, I wonder if I can get a 2B model (Gemma maybe?) running locally on a phone. https://t.co/BBYLroGL8i
Sneak peek, hey, pssst... @ollama 0.1.27 is out! Performance improvements (up to 2x) when running Gemma models. Let's try it! https://t.co/hy050gpYl4
🚀 Excited to introduce the future of machine learning on Lepton AI! We've teamed up with Google's latest marvel, Gemma, to bring you an API that's as powerful as it is user-friendly. 🌟 Start exploring with Gemma today at https://t.co/81I1zjBuny. #MachineLearning #AI #Gemma
The power of open-source AI. Mistral Pro from Tencent is now on par with Gemma on math and coding tasks. It only took 3 days for them to get there. 7B models will become really interesting when they reach GPT 3.5 performance. That's an MMLU of around 71-72. We will get there… https://t.co/Xr5wmOhZFZ
Just found that the @MistralAI 7B model is available in KerasNLP with #Keras 3.0. Now we have two leading open LLMs in the Keras ecosystem: #Gemma and Mistral-7B! This is what we have been waiting for, for a long time :) Personally, I will build an end-to-end MLOps pipeline for them! https://t.co/PpNHECUFvc
This is how easy it is to run a model locally. 2 commands in the terminal to get Gemma running. 👏 @ollama It's awesome to see more open source models being released. Thank you @sundarpichai @demishassabis @JeffDean @ZoubinGhahrama1 @OriolVinyalsML @asoroken @alexanderchen… https://t.co/qozcd6IJ1g
💫 Google Gemma model updated to no longer output undesired text locally! It's really good! To update the models if you have previously downloaded: ollama pull gemma (default 7B) ollama pull gemma:2b (2B model) Run the pulled models with `ollama run` A big ❤️ thank you… https://t.co/j2qEnO7wMY
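Beyond `ollama run` in the terminal, Ollama also exposes a local REST API (by default on port 11434) once the server is up, so a pulled model can be queried from code. A minimal sketch — the prompt is illustrative, and the call is wrapped so the snippet still runs (with a fallback message) when no Ollama server is listening:

```python
import json
import urllib.error
import urllib.request

# Ollama's local generate endpoint; it takes a model tag and a prompt.
OLLAMA_URL = "http://localhost:11434/api/generate"

payload = {
    "model": "gemma:2b",           # tag from `ollama pull gemma:2b`
    "prompt": "Why is the sky blue?",
    "stream": False,               # single JSON object instead of a stream
}

req = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        print(json.load(resp)["response"])
except (urllib.error.URLError, OSError):
    print("Ollama server not reachable -- start it with `ollama serve`.")
```

This is the same endpoint the `ollama run` REPL talks to, which is what makes the two-command local setup praised above so easy to build on.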
Google's Gemma model is now supported on Android using MLC LLM. Here is a demo of 4-bit quantized Gemma-2b model running on Samsung S23. Thanks to @ruihanglai and many others for bringing Gemma support to MLC! https://t.co/KXW7nse8OV https://t.co/8jNvd71cBV https://t.co/xuIEL7eJHW
Gemma 2B at > 475 tok/sec! 🫡 Powered by JAX, Transformers, and TPUs. Up to 4x faster than PyTorch (on A100). Depending on the prompt and conditions, it can go up to 650 tok/sec. ⚡ Kudos to @sanchitgandhi99 for integrating Gemma in JAX transformers! Note: This is on TPU v2;… https://t.co/zB1Csc7da1
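To put those throughput numbers in perspective, wall-clock generation time is just token count divided by rate. A quick back-of-the-envelope calculator using the figures quoted in these posts (rates are as reported; actual conditions vary):

```python
# Wall-clock time to generate N tokens at the reported throughputs.
def generation_seconds(tokens: int, tokens_per_sec: float) -> float:
    return tokens / tokens_per_sec

reported_rates = [
    ("TPU v2 (JAX)", 475),   # "> 475 tok/sec"
    ("TPU v2 peak", 650),    # "up to 650 tok/sec"
    ("iPhone (MLC)", 22),    # "22+ tok/sec"
]
for label, rate in reported_rates:
    secs = generation_seconds(1000, rate)
    print(f"{label:>13}: 1000 tokens in {secs:.1f} s")
```

So a 1,000-token completion takes roughly two seconds on the TPU setup versus about three-quarters of a minute on the phone — slow for batch work, but comfortably interactive for chat.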
Gemma 2B running on iPhone at 22+ tok/sec. 2B really hits a sweet spot for running a local model on a phone. Try it out via https://t.co/emAhXUXZew 👉 https://t.co/BvfBXbtCS9
Run Gemma model locally on iPhone - we get blazing fast 20 tok/s for 2B model. This shows amazing potential ahead for Gemma fine-tunes on phones, made possible by the new MLC SLM compilation flow by @junrushao from @OctoAICloud and many other contributors. https://t.co/1L0mvG1bWq https://t.co/c4kCO9DRos
The 🤗 MLX community is fast. Already quantized and uploaded all the Gemma model variants: Available here: https://t.co/dUgErUXnM3 Thanks @Prince_Canuma and @lazarustda ! https://t.co/fbEyBIy9GC
For my first official contribution to the @modal_labs examples: running Gemma 7B on an H100 at >2500 tok/s 🚀 With very little effort, that's already just ~75¢ per megatoken -- and you have full "tensors-and-a-shell" control over the execution environment https://t.co/D6ls1m8MAE
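The ~75¢-per-megatoken figure follows directly from throughput and hourly GPU price. A sketch of the arithmetic — the $6.75/hr H100 rate below is an assumption chosen to reproduce the quoted figure, since the post does not state an hourly price:

```python
# Dollars per million generated tokens, from throughput and hourly GPU cost.
def usd_per_megatoken(tokens_per_sec: float, usd_per_hour: float) -> float:
    tokens_per_hour = tokens_per_sec * 3600
    return usd_per_hour / (tokens_per_hour / 1e6)

# 2500 tok/s on an H100 -> 9M tokens/hour; the hourly rate is assumed,
# picked only to illustrate how ~75 cents per megatoken falls out.
cost = usd_per_megatoken(2500, 6.75)
print(f"${cost:.2f} per megatoken")
```

At 2,500 tok/s the GPU produces 9 million tokens per hour, so any hourly price in the $6–7 range lands in the quoted neighborhood.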