A new study introduces Recap-DataComp-1B, a dataset of roughly 1.3 billion web images recaptioned with a LLaMA-3-powered LLaVA model. The initiative, led by researchers including X Li, H Tu, M Hui, and Z Wang of UC Santa Cruz, aims to improve text-image datasets for training vision-language models. To generate the captions, the team finetuned a LLaVA-1.5 model; the resulting dataset, derived from DataComp-1B, is open-sourced and yields substantial gains when training advanced vision-language models such as CLIP and Diffusion Transformers. The research was published in 2024.
[CV] What If We Recaption Billions of Web Images with LLaMA-3? X Li, H Tu, M Hui, Z Wang... [UC Santa Cruz] (2024) https://t.co/xk69SkzjGt - The paper presents Recap-DataComp-1B, a dataset with 1.3 billion web images recaptioned using a LLaMA-3-powered LLaVA model. - Original… https://t.co/oXvcQjzIau
"What If We Recaption Billions of Web Images with LLaMA-3?"π€― And the results confirm that this enhanced dataset, Recap-DataComp-1B generated this way, offers substantial benefits in training advanced vision-language models. For discriminative models like CLIP, we observeβ¦ https://t.co/QCCZil11bW
What If We Recaption Billions of Web Images with LLaMA-3? A new study enhances text-image datasets using LLaMA-3, improving model training for visual-language tasks. With the open-source Recap-DataComp-1B dataset, models like CLIP & Diffusion Transformers show better… https://t.co/DqlrY5pkYa
Big thanks to @_akhaliq for the retweet! We are very excited about presenting Recap-DataComp-1B, where we use a LLaMA-3-powered LLaVA model to recaption the entire 1.3 billion images from DataComp-1B. Compared to the original textual descriptions,… https://t.co/k6UGX7Lwdx
What If We Recaption Billions of Web Images with LLaMA-3? - Finetunes a LLaVA-1.5 model and recaptions ~1.3B images from the DataComp-1B dataset - Open-sources the resulting dataset data: https://t.co/9lrJj45ADI proj: https://t.co/SEgbRqOyZd abs: https://t.co/l9JX3Sh0Vb https://t.co/xaiJ3jEZil
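The recaptioning pipeline described above (a finetuned LLaVA-1.5 model generating a new caption per image) can be sketched as follows. This is a minimal illustration, not the authors' released code: the checkpoint name, the instruction text, and the single-turn "USER: ... ASSISTANT:" prompt format are assumptions based on the common Hugging Face LLaVA-1.5 setup.

```python
def build_recaption_prompt(instruction: str = "Describe this image in detail.") -> str:
    """Format a single-turn LLaVA-1.5-style prompt; <image> marks where the
    image features are spliced in by the processor (assumed prompt template)."""
    return f"USER: <image>\n{instruction} ASSISTANT:"


def recaption(image, model, processor, max_new_tokens: int = 128) -> str:
    """Generate a fresh caption for one PIL image with a loaded LLaVA model."""
    inputs = processor(text=build_recaption_prompt(), images=image,
                       return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    text = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
    # The decoded string echoes the prompt; keep only the assistant's reply.
    return text.split("ASSISTANT:")[-1].strip()
```

A hypothetical invocation would load a LLaVA checkpoint (e.g. `llava-hf/llava-1.5-7b-hf` via `transformers`' `AutoProcessor` and `LlavaForConditionalGeneration`) and map `recaption` over each image in the corpus; at DataComp-1B scale this would of course require batched, distributed inference rather than the per-image loop shown here.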