Google has open-sourced PaliGemma, a Vision-Language Model (VLM) that grew out of the long-running PaLI research effort at Google Research. Part of the Gemma family, PaliGemma builds on PaLI's work on scalability, pre-training methods, and data mixtures. The release includes PaliGemma-3B, which pairs a SigLIP image encoder with a Gemma decoder and is available at 224, 448, and 896 px input resolutions. It can be fine-tuned on platforms such as Google Colab using the Hugging Face Transformers library.
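To illustrate what using the model looks like, here is a minimal inference sketch with the Transformers API; the checkpoint name and example image URL are illustrative assumptions rather than details from the announcement.

```python
# Minimal PaliGemma inference sketch with Hugging Face Transformers (>= 4.41).
import requests
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-mix-224"  # 224 px variant; 448/896 variants also exist
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

# Example image URL is an assumption; substitute any local or remote image.
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# PaliGemma is prompted with short task prefixes such as "caption en".
prompt = "caption en"
inputs = processor(text=prompt, images=image, return_tensors="pt")
input_len = inputs["input_ids"].shape[-1]

output = model.generate(**inputs, max_new_tokens=30)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(output[0][input_len:], skip_special_tokens=True))
```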
PaliGemma is out and you can finetune it on Google Colab in a matter of minutes. https://t.co/lBmwYsPXwF
Google's newly announced PaliGemma in @huggingface https://t.co/LrdfyPA1A8
A very very nice release by @Google : PaliGemma! It's a pure VLM, SigLip encoder + Gemma decoder, 3b, comes in 3 resolutions including 224, 448 and 896 px. And it's in @huggingface transformers to try it: https://t.co/Ou2NQuRfgW
PaLI is finally open-sourced as part of the Gemma family! PaLI has been our long-running VLM research project with colleagues in Google Research, where we explored scalability, pre-training methods, data mixtures, and more. See [1,2,3] for background. Super cool to see it go… https://t.co/dvbQWaXnyN
We just released PaliGemma-3B, a very capable Vision-Language Model. Do not waste any time, finetune it for your task: Code: https://t.co/V9wQU7jtmv Colab: https://t.co/aDGJd7Iz8z Kaggle: https://t.co/A5ZrnjDZni HF: https://t.co/Du52eHcXNh Vertex AI: https://t.co/qxK9Irgera
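For context on what such a fine-tune looks like in Transformers, here is a minimal sketch. The checkpoint name, the frozen vision tower, the learning rate, and the single-example training step are illustrative assumptions, not the official recipe from the linked Colab.

```python
# Minimal PaliGemma fine-tuning sketch with Hugging Face Transformers.
# Freezing the SigLIP vision tower is a common lightweight setup (assumption).
import torch
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-pt-224"  # "pt" = pretrained base intended for fine-tuning
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

# Freeze the SigLIP vision tower; train only the remaining parameters.
for param in model.vision_tower.parameters():
    param.requires_grad = False

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=2e-5
)

def train_step(image, prefix, suffix):
    """One gradient step on a single (image, prompt, target) example.

    The processor's `suffix` argument appends the target text and builds
    `labels` that mask out the prompt, so the loss is computed only on
    the target tokens.
    """
    inputs = processor(text=prefix, images=image, suffix=suffix,
                       return_tensors="pt")
    loss = model(**inputs).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

In practice one would batch examples with a collate function and move model and inputs to an accelerator, but the core mechanics, prefix plus suffix in, loss over the target tokens out, are as sketched above.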