A new study introduces Recap-DataComp-1B, a dataset of roughly 1.3 billion web images recaptioned with a LLaMA-3-powered LLaVA model. The initiative, led by researchers including X Li, H Tu, M Hui, and Z Wang of UC Santa Cruz, aims to improve text-image datasets for training vision-language models. To generate the captions, the team finetuned a LLaVA-1.5 model; the resulting dataset, derived from DataComp-1B, is open-sourced and yields substantial gains when training advanced vision-language models such as CLIP and Diffusion Transformers. The research was published in 2024.
[CV] What If We Recaption Billions of Web Images with LLaMA-3? X Li, H Tu, M Hui, Z Wang... [UC Santa Cruz] (2024) https://t.co/xk69SkzjGt - The paper presents Recap-DataComp-1B, a dataset with 1.3 billion web images recaptioned using a LLaMA-3-powered LLaVA model. - Original… https://t.co/oXvcQjzIau
"What If We Recaption Billions of Web Images with LLaMA-3?"π€― And the results confirm that this enhanced dataset, Recap-DataComp-1B generated this way, offers substantial benefits in training advanced vision-language models. For discriminative models like CLIP, we observeβ¦ https://t.co/QCCZil11bW
What If We Recaption Billions of Web Images with LLaMA-3? A new study enhances text-image datasets using LLaMA-3, improving model training for visual-language tasks. With the open-source Recap-DataComp-1B dataset, models like CLIP & Diffusion Transformers show better… https://t.co/DqlrY5pkYa
Big thanks to @_akhaliq for the retweet! We are very excited about presenting Recap-DataComp-1B, where we use a LLaMA-3-powered LLaVA model to recaption the entire 1.3 billion images from DataComp-1B. Compared to the original textual descriptions,… https://t.co/k6UGX7Lwdx
What If We Recaption Billions of Web Images with LLaMA-3? - Finetunes a LLaVA-1.5 model and recaptions ~1.3B images from the DataComp-1B dataset - Open-sources the resulting dataset data: https://t.co/9lrJj45ADI proj: https://t.co/SEgbRqOyZd abs: https://t.co/l9JX3Sh0Vb https://t.co/xaiJ3jEZil
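The recaptioning pipeline described above (a finetuned LLaVA-1.5 model generating a new caption per image) can be sketched as follows. This is a minimal illustration, not the authors' released code: the checkpoint name, the instruction text, and the single-turn "USER: ... ASSISTANT:" prompt format are assumptions based on the common Hugging Face LLaVA-1.5 setup.

```python
def build_recaption_prompt(instruction: str = "Describe this image in detail.") -> str:
    """Format a single-turn LLaVA-1.5-style prompt; <image> marks where the
    image features are spliced in by the processor (assumed prompt template)."""
    return f"USER: <image>\n{instruction} ASSISTANT:"


def recaption(image, model, processor, max_new_tokens: int = 128) -> str:
    """Generate a fresh caption for one PIL image with a loaded LLaVA model."""
    inputs = processor(text=build_recaption_prompt(), images=image,
                       return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    text = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
    # The decoded string echoes the prompt; keep only the assistant's reply.
    return text.split("ASSISTANT:")[-1].strip()
```

A hypothetical invocation would load a LLaVA checkpoint (e.g. `llava-hf/llava-1.5-7b-hf` via `transformers`' `AutoProcessor` and `LlavaForConditionalGeneration`) and map `recaption` over each image in the corpus; at DataComp-1B scale this would of course require batched, distributed inference rather than the per-image loop shown here.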