Google's Gemma model, released in 2B and 7B versions, has emerged as a significant entrant in the domain of open-source, generative Large Language Models (LLMs). The 2B version impresses with its performance: it reaches 22+ tokens per second on an iPhone and over 475 tokens per second on TPU v2 via the JAX/Transformers integration, with up to 650 tokens per second possible depending on the prompt and conditions; that JAX pipeline is up to 4x faster than PyTorch on an A100. Gemma's compatibility also extends to Android through MLC LLM, as demonstrated by a 4-bit quantized Gemma-2b model running on a Samsung S23. Platform support keeps expanding, with Gemma-7b-it, Gemma-2b-it, Gemma-7b, and Gemma-2b available on the Together API and Anyscale Endpoints. Integration efforts by various developers, along with architectural choices such as Gemma's unusually large vocabulary, have been highlighted as key factors in its performance. Finally, Gemma's use in building in-browser agents with WebGPU acceleration, demonstrated on a Google Pixel 7 Pro running Google Chrome, showcases its versatility, with everything running locally.
Google's new Gemma LLM is outperforming Mistral-7B in various benchmarks, including:
- Question Answering
- Math/Science
- Reasoning
- Coding

Live comparison of Google's Gemma vs. Mistral-7B: https://t.co/UyeC0K290Q
Gemma is now available for use and fine-tuning on Lightning Studios. Shout out to Google for joining the open-source AI effort. Great explanation by @rasbt https://t.co/upGWrLl8UF
Google's Gemma has been the topic of the week for both LLM researchers and users. My colleagues and I just ported the code to LitGPT, and we discovered some interesting surprises and model architecture details along the way: 1) Gemma uses a really large vocabulary and… https://t.co/PI5IXqYZh0
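To see the vocabulary point for yourself, the tokenizer can be inspected with Hugging Face transformers. A minimal sketch, assuming you have accepted the Gemma license on the Hub and are authenticated:

```python
# Inspect Gemma's tokenizer with Hugging Face transformers.
# Requires accepting the Gemma license on the Hub and `huggingface-cli login`.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
print(tokenizer.vocab_size)  # 256,000 entries, vs. 32,000 for Llama 2
```

The oversized vocabulary shifts a large share of the 2B model's parameter count into the embedding and output layers, which is part of what made the LitGPT port's findings surprising.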
Off the shelf is good, fine-tuned is better. Here's everything you need to #efficiently fine-tune @Google's #Gemma7B for your task. 🚀 Best of all, you can do it with @ludwig_ai, the open-source declarative framework for building custom #LLMs.🔥 https://t.co/nN5BM2dTbA
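For illustration, here is a hedged sketch of what such a declarative Ludwig config might look like in Python, assuming an instruction-tuning CSV with "instruction" and "output" columns; the column names, LoRA/4-bit settings, and hyperparameters are illustrative, not Ludwig's official recipe:

```python
# Illustrative Ludwig fine-tuning config for Gemma-7B (LoRA + 4-bit, QLoRA-style).
# Column names and hyperparameters below are assumptions for this sketch.
from ludwig.api import LudwigModel

config = {
    "model_type": "llm",
    "base_model": "google/gemma-7b",
    "input_features": [{"name": "instruction", "type": "text"}],
    "output_features": [{"name": "output", "type": "text"}],
    "adapter": {"type": "lora"},      # parameter-efficient fine-tuning
    "quantization": {"bits": 4},      # 4-bit base weights to fit on one GPU
    "trainer": {"type": "finetune", "epochs": 3, "learning_rate": 2e-4},
}

model = LudwigModel(config)
model.train(dataset="train.csv")
```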
https://t.co/in4MFlYKaW now adds Gemma from @GoogleDeepMind! The 2b model is perfect for building in-browser agents with @WebGPU acceleration -- everything local! Here is a 1x speed demo of 4-bit quantized gemma-2b-it on @GooglePixel_US 7 Pro with @googlechrome. https://t.co/7gKTZYD1FH
Gemma models from @GoogleAI are now available on Together API! Check out the models on our playground 👇
Gemma-7b-it: https://t.co/OaW7rGFiLO
Gemma-2b-it: https://t.co/OaW7rGFiLO
Gemma-7b: https://t.co/zQM20Oc4sa
Gemma-2b: https://t.co/2mQQSXKSbS
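For programmatic access, Together ships a Python SDK. A minimal sketch, assuming a TOGETHER_API_KEY environment variable and the model identifier "google/gemma-7b-it" (check the playground for the exact name):

```python
# Call Gemma-7b-it via Together's Python SDK (pip install together).
# The model identifier is an assumption; verify it in the Together playground.
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment
response = client.chat.completions.create(
    model="google/gemma-7b-it",
    messages=[{"role": "user", "content": "Explain RLHF in one sentence."}],
)
print(response.choices[0].message.content)
```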
Gemma-7b from @GoogleAI, one of the best 7b models available, is now supported on Anyscale Endpoints! Try it out: https://t.co/MStW5BG3lD https://t.co/bVtlEMRDdI
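Anyscale Endpoints exposes an OpenAI-compatible API, so the standard openai client can be pointed at it. A sketch under that assumption; the base URL and model identifier should be verified against the Anyscale docs:

```python
# Query Gemma-7b on Anyscale Endpoints via the OpenAI-compatible API.
# Base URL and model id are assumptions; confirm them in the Anyscale docs.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.endpoints.anyscale.com/v1",
    api_key="<ANYSCALE_API_KEY>",
)
response = client.chat.completions.create(
    model="google/gemma-7b-it",
    messages=[{"role": "user", "content": "Write a haiku about TPUs."}],
)
print(response.choices[0].message.content)
```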
Discover the transformative power of fine-tuning LLMs for scalable, cost-effective AI solutions. 🚀 In our latest blog post, we delve into the intersection of AI innovation and affordability with GPT-4 and @predibase. 📘 Read the full post! https://t.co/EKBvAQqlKt
Google's Gemma model is now supported on Android using MLC LLM. Here is a demo of 4-bit quantized Gemma-2b model running on Samsung S23. Thanks to @ruihanglai and many others for bringing Gemma support to MLC! https://t.co/KXW7nse8OV https://t.co/8jNvd71cBV https://t.co/xuIEL7eJHW
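On Android the demo runs through the MLCChat app, but the same compiled 4-bit models can also be driven from Python via MLC's ChatModule. A rough sketch; the package and the quantized model name ("gemma-2b-it-q4f16_1") are assumptions that have shifted across MLC releases:

```python
# Run a 4-bit quantized Gemma-2b through MLC LLM's Python API.
# Package name and model id are assumptions; see the MLC LLM docs.
from mlc_chat import ChatModule

cm = ChatModule(model="gemma-2b-it-q4f16_1")
print(cm.generate(prompt="Why are 2B models a good fit for phones?"))
```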
The release of Gemma by Google is an interesting move that immediately made Google a key player in the domain of open-source, generative LLMs. Here are five key takeaways/properties that make Gemma different from other alternatives… (1) Architecture. Gemma uses a decoder-only… https://t.co/ILwUFlppXn
Gemma 2B at > 475 tok/sec! 🫡 Powered by JAX, Transformers and TPUs. Up to 4x faster than PyTorch (on A100). Based on the prompt and conditions, it can go up to 650 tok/sec. ⚡ Kudos to @sanchitgandhi99 for integrating Gemma in JAX transformers! Note: This is on TPU v2;… https://t.co/zB1Csc7da1
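A minimal sketch of generation with the Flax/JAX Gemma port in transformers; throughput figures like those above depend heavily on hardware, batch size, and sequence length:

```python
# Greedy generation with the Flax (JAX) Gemma integration in transformers.
# Add from_pt=True to from_pretrained() if the checkpoint lacks Flax weights.
import jax.numpy as jnp
from transformers import AutoTokenizer, FlaxGemmaForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = FlaxGemmaForCausalLM.from_pretrained("google/gemma-2b", dtype=jnp.bfloat16)

inputs = tokenizer("The best thing about TPUs is", return_tensors="np")
outputs = model.generate(inputs.input_ids, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs.sequences[0], skip_special_tokens=True))
```

Note that on TPU the first call triggers XLA compilation, so benchmark throughput only after a warmup run.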
Gemma 2b running on iPhone at 22+ tok/sec. 2B really hits a sweet spot for running a local model on a phone. Try it out via https://t.co/emAhXUXZew 👉 https://t.co/BvfBXbtCS9