Microsoft has released Florence-2, a vision foundation model that uses a unified, prompt-based representation for a variety of computer vision and vision-language tasks. It supports image captioning, optical character recognition, object detection, grounding, and segmentation, and shows strong zero-shot capabilities. The small Florence-2 models are reported to be on par with or better than all LLaVA-1.6 variants in Visual Question Answering (VQA) and to perform well in Referring Expression Comprehension, although the model is not state-of-the-art in object detection. Florence-2 can be fine-tuned on custom datasets: fine-tuning scripts, a Colab notebook, a demo Space, and a walkthrough blog post covering the DocVQA dataset are available, along with a separate post on fine-tuning for custom object detection with LoRA. Florence-2 can also run entirely locally in the browser using WebGPU and Transformers.js, and a hosted version is available on Replicate.
Florence-2 WebGPU: The vision foundation model from @Microsoft - running locally in your browser w/ Transformers.js https://t.co/Z7L5Dp5PAs
Florence-2 WebGPU: vision foundation model running locally in your browser w/ Transformers.js https://t.co/NrP3S0VSsP
Florence-2: a new vision foundation model by Microsoft It supports tasks like image captioning, optical character recognition, object detection and more Try it out on @replicate 👇 https://t.co/4wMiLNOkYD
OMG. I was waiting for this to happen. Florence-2 is running in browser. @xenovacom can I run any fine-tuned model or only pre-trained ones? https://t.co/7PSc6MHhvI
Florence-2: a new vision foundation model by Microsoft It supports tasks like image captioning, optical character recognition, object detection, and more! Try it out on @replicate 👇 https://t.co/08HmFL6Lfr
Florence-2, the new vision foundation model by Microsoft, can now run 100% locally in your browser on WebGPU, thanks to Transformers.js! 🤗🤯 It supports tasks like image captioning, optical character recognition, object detection, and many more! 😍 WOW! Demo (+ source code) 👇 https://t.co/TyKnp8XUdN
Nice progress on the training of CapPa 🥳 The model now performs nice OCR! The prediction is often better than the actual caption 🤯 Really excited about the results!!! https://t.co/IYOGrwRKke
fine-tune Florence-2 on custom object detection dataset (super technical blog post; video tutorial coming this week) - format dataset - configure LoRA for optimized training - train and benchmark fine-tuned model link: https://t.co/p6VUROUw6t ↓ key takeaways https://t.co/uBlXyQhuWK
Object Detection Using Florence-2 🔥 The recently released Florence-2 model demonstrates strong zero-shot capabilities across tasks such as captioning, object detection, grounding, and segmentation. Below is an example of using the model for an object detection task and getting… https://t.co/PROC1h7T3M
Yes, the Florence-2 vision is very good. When I tested it, I thought it accessed the image's metadata because the description was so close to the original prompt used for its creation. But as I continued testing, I realized it's just that good! Experiment with the settings, tho… https://t.co/awgADF6tpf
New blog post regarding how to fine-tune Florence-2, the small and powerful VLM by @Microsoft, on a custom dataset (DocVQA) in plain @PyTorch: https://t.co/V7mwr53CYv
🚀 Fine-tune Florence-2 on any task! We are releasing fine-tuning scripts for Microsoft's Florence-2, along with a walkthrough blog post, a Space demo, and a Colab notebook. @mervenoyann @skalskip92 🧵 https://t.co/iZH86DekSE
Fine-tune Florence-2 on any task 🔥 Today we release a notebook and a walkthrough blog on fine-tuning Florence-2 on the DocVQA dataset @andi_marafioti @skalskip92 Keep reading ⇓ https://t.co/vv28Efaf4g
Microsoft's small Florence-2 models are excellent for Visual Question Answering (VQA): On-par and beating all LLaVA-1.6 variants. While Florence-2 isn't SOTA in object detection, it's remarkably good in Visual Question Answering (VQA) and Referring Expression Comprehension… https://t.co/FZmEXsLskR
Microsoft Releases Florence-2: A Novel Vision Foundation Model with a Unified, Prompt-based Representation for a Variety of Computer Vision and Vision-Language Tasks #DL #AI #ML #DeepLearning #ArtificialIntelligence #MachineLearning #ComputerVision https://t.co/5cxQH65QjZ
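
Several of the posts above mention zero-shot object detection through Florence-2's prompt-based interface. The following is a minimal Python sketch of that workflow, not the code from any of the linked posts. It assumes the Hugging Face `transformers` library, the `microsoft/Florence-2-base` checkpoint, and the `<OD>` task prompt with its `bboxes`/`labels` output layout as described on the model card at release; newer library versions may differ. The helper names `run_object_detection` and `to_detections` are hypothetical and exist only for this illustration.

```python
from __future__ import annotations

from typing import Any

# Task prompt for object detection, as documented on the Florence-2 model card.
TASK_OD = "<OD>"


def run_object_detection(image, model_id: str = "microsoft/Florence-2-base") -> dict[str, Any]:
    """Run Florence-2 object detection on a PIL image and return the parsed result.

    Assumption: the checkpoint ships custom modeling code, hence trust_remote_code=True.
    """
    # Heavy imports stay local so the pure helper below works without them installed.
    from transformers import AutoModelForCausalLM, AutoProcessor

    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    # The task is selected purely by the text prompt.
    inputs = processor(text=TASK_OD, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
        do_sample=False,
    )
    # Special tokens are kept because they encode the box coordinates.
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        generated_text, task=TASK_OD, image_size=(image.width, image.height)
    )


def to_detections(parsed: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten the parsed output into one record per detected object.

    Assumed layout: {"<OD>": {"bboxes": [[x1, y1, x2, y2], ...], "labels": [...]}}.
    """
    result = parsed[TASK_OD]
    return [
        {"label": label, "box": tuple(box)}
        for label, box in zip(result["labels"], result["bboxes"])
    ]
```

Under these assumptions, calling `to_detections(run_object_detection(Image.open("photo.jpg")))` with a PIL image would yield a list of label and box records that can be drawn or filtered. Changing the task prompt is how the same model is switched to captioning or OCR.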