InternLM-XComposer2, a vision-language large model (VLLM) built on InternLM2-7B, targets free-form interleaved text-image composition and accurate vision-language problem-solving. It is reported to match, and in some cases surpass, GPT-4V and Gemini Pro on six benchmarks. The paper page states that experimental results demonstrate the model's superiority, and a demo video shows it writing a full article on a topic with interleaved images, all of them generated. Separately, Tencent presented Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding, a paper on fine-tuning text-to-image diffusion models.
InternLM-XComposer2: a VLLM based on InternLM2-7B. ✨Free-form Interleaved Text-Image Composition ✨Accurate Vision-language Problem-solving ✨Awesome performance: matches GPT-4V and Gemini Pro in 6 benchmarks Paper: https://t.co/1kzf79ycUo https://t.co/dOtFdXa8TG
InternLM-XComposer2: a VLLM based on InternLM2-7B. ✨Free-form Interleaved Text-Image Composition ✨Accurate Vision-language Problem-solving ✨Awesome performance: matches GPT-4V and Gemini Pro in 6 benchmarks Paper: https://t.co/1kzf79yKJW Model: https://t.co/dOtFdXaGJe
🤯Mind-blowing research & AI demo alert: InternLM-XComposer2 🚀A SOTA LVLM excelling in free-form text-image composition that matches (even surpasses) GPT-4V & Gemini Pro 🔥🔥Attached video showcases the model writing a full article on a topic with interleaved images (all generated) https://t.co/k0dXZxfmXa https://t.co/eRuCzsjBue
Tencent presents Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding paper page: https://t.co/o4cQkbADYE As large-scale text-to-image generation models have made remarkable progress in the field of text-to-image generation, many fine-tuning… https://t.co/QdNownXKoa
Thanks @_akhaliq for sharing our InternLM-XComposer2. Tech report: https://t.co/ANPFH0dYjE https://t.co/ys4yqRBd9g
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model paper page: https://t.co/UQMP0v4yEK Experimental results demonstrate the superiority of InternLM-XComposer2 based on InternLM2-7B in producing high-quality long-text… https://t.co/Z9xYWBPp0n