Alibaba, Bytedance, Tencent, and other research groups have unveiled diffusion models for tasks such as text-to-image generation, 3D reconstruction, and image inpainting. These models, including UniSDF, Emu2, MoSAR, SpecNeRF, and DiffPortrait3D, demonstrate strong performance in their respective applications. The advances include methods for 3D reconstruction of scenes with reflections, high-fidelity 3D scene reconstruction, and controllable diffusion for zero-shot portrait view synthesis.
[CV] HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models https://t.co/Nh5rtdDLrW This paper introduces HD-Painter, a high-resolution and prompt-faithful text-guided image inpainting method. By designing the Prompt-Aware… https://t.co/Mysa0aqUYz
[CV] DreamTuner: Single Image is Enough for Subject-Driven Generation https://t.co/YXpperJlJJ DreamTuner is a novel method for subject-driven image generation. Unlike existing methods that rely on fine-tuning or additional image encoders, DreamTuner introduces a… https://t.co/U3Q7fuIRQ9
Bytedance announces DiffPortrait3D, a controllable diffusion model for zero-shot portrait view synthesis. Link: https://t.co/CAiUeP2H4w https://t.co/vrIj2zKM8i
Bytedance announces DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis paper page: https://t.co/lOpCdNXgUi given an unposed portrait image, DiffPortrait3D can synthesize plausible yet consistent facial details while retaining both identity and facial… https://t.co/AODei632q9
We're thrilled to introduce DiffPortrait3D🚀🔥🫡, a diffusion model-based approach specially designed for single portrait novel view synthesis! Paper: https://t.co/DBg2O9b91k Website: https://t.co/WxTfu3qCZY Code: https://t.co/dr6LegIJM5 https://t.co/9uM6vesiz4
ShowRoom3D: Text to High-Quality 3D Room Generation Using 3D Priors paper page: https://t.co/eJR1z54cTc introduce ShowRoom3D, a three-stage approach for generating high-quality 3D room-scale scenes from texts. Previous methods using 2D diffusion priors to optimize neural… https://t.co/rsxXGfbJ5e
Daily AI News in 60 Seconds 1/8: Alibaba Research announces DreaMoving, a human video generation framework based on diffusion models. Specifically, given a target identity and posture sequences, DreaMoving can generate a video of the target identity dancing anywhere, driven by the… https://t.co/GO3lRr805D
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models paper page: https://t.co/yAgfiCx0yi Recent progress in text-guided image inpainting, based on the unprecedented success of text-to-image diffusion models, has led to exceptionally… https://t.co/EdSqDjDmX0
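HD-Painter's own pipeline is more involved, but a generic trick many diffusion inpainters share (keeping the known region faithful by re-noising the original image and blending it back in at every denoising step) can be sketched in a few lines of NumPy. This is a toy illustration of that blending step, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)

def blend_step(x_t, x0_known, mask, noise_level):
    """One blending step used by many diffusion inpainters:
    inside the hole (mask == 1) keep the model's current estimate,
    outside it overwrite with the known image re-noised to the
    current noise level, so the visible context stays faithful."""
    noised_known = x0_known + noise_level * rng.standard_normal(x0_known.shape)
    return mask * x_t + (1.0 - mask) * noised_known

# Toy 4x4 "image" with a hole in the top-left 2x2 corner.
x0 = np.ones((4, 4))               # known clean image
mask = np.zeros((4, 4))
mask[:2, :2] = 1.0                 # 1 = region to inpaint
x_t = rng.standard_normal((4, 4))  # model's current noisy estimate

out = blend_step(x_t, x0, mask, noise_level=0.0)
# At noise_level 0 (the final step), known pixels are restored exactly.
assert np.allclose(out[2:, 2:], 1.0)
```

In a real sampler this blend is applied at each timestep with the scheduler's matching noise level, so the generated hole content stays consistent with the untouched surroundings.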
Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning paper page: https://t.co/9De3Ee7UXp Recent advancements in the text-to-3D task leverage finetuned text-to-image diffusion models to generate multi-view images, followed by NeRF… https://t.co/DeVXwGMVZJ
Tencent announces Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models paper page: https://t.co/3YiTIaId8V Paint3D is a novel coarse-to-fine generative framework that is capable of producing high-resolution, lighting-less, and diverse 2K UV texture maps for… https://t.co/I17MU0ic4B
Bytedance announces DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation paper page: https://t.co/iPel0Nj8Wa The generation of emotional talking faces from a single portrait image remains a significant challenge. The… https://t.co/hvykB1uvLY
[CV] UniSDF: Unifying Neural Representations for High-Fidelity 3D Reconstruction of Complex Scenes with Reflections https://t.co/jW77SqAvrK UniSDF is a neural representation method that enables high-fidelity 3D reconstruction of complex scenes with reflections. It accurately… https://t.co/A9qCgjsm7b
[CV] SpecNeRF: Gaussian Directional Encoding for Specular Reflections https://t.co/NnIlIEjF4r This paper introduces a model called SpecNeRF for modeling specular reflections in 3D scenes. Unlike existing methods, SpecNeRF uses a learnable Gaussian directional encoding to… https://t.co/iyCqE1KzX2
Single-image 3D gaussian reconstruction at 38 FPS 🤯 On the left is the input image, and on the right is a prediction of novel-view, rendered in real time. Impressive results by @StanSzymanowicz, @chrirupp and Andrea Vedaldi https://t.co/sqRY9QeauF https://t.co/F7V1yEcdIs
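The real-time rendering behind Gaussian-based reconstruction comes down to sorting the Gaussians along each ray and alpha-compositing them front to back. A minimal sketch of that accumulation rule, with made-up colors and opacities for one ray (not the authors' implementation):

```python
import numpy as np

def composite(colors, alphas):
    """Front-to-back alpha compositing, the accumulation rule at the
    heart of Gaussian splatting renderers: each depth-sorted Gaussian
    contributes its color weighted by its opacity and the remaining
    transmittance T along the ray."""
    T = 1.0                 # transmittance accumulated so far
    out = np.zeros(3)
    for c, a in zip(colors, alphas):
        out += T * a * c    # this Gaussian's visible contribution
        T *= (1.0 - a)      # light remaining for Gaussians behind it
    return out, T

# Two Gaussians on one ray: semi-transparent red in front, opaque green behind.
colors = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
pixel, T = composite(colors, [0.5, 1.0])
# pixel = 0.5*red + 0.5*green = [0.5, 0.5, 0.0]; T = 0 (ray fully absorbed)
```

Because the loop is just a weighted sum over sorted primitives, it parallelizes trivially per pixel, which is what makes frame rates like 38 FPS attainable.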
📢📢 Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models https://t.co/MZIXSQlq1q We generate dynamic 4D assets and scenes with score distillation! w/ the amazing @HuanLing6*, @seungkim0123*, Antonio Torralba, @FidlerSanja (1/n) https://t.co/dGkBdD0M7b
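Score distillation sampling (SDS), the loss family these text-to-4D methods build on, noises the current rendering, asks a pretrained diffusion model to predict that noise, and descends along the residual, dropping the diffusion model's Jacobian. A toy sketch with a hypothetical stand-in "diffusion model" whose noise prediction pulls samples toward zero (nothing here is from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def sds_grad(x, eps_pred_fn, sigma, w=1.0):
    """Score distillation sampling gradient: noise the rendering x,
    predict the noise with a frozen diffusion model, and return the
    weighted residual (eps_pred - eps). The model's Jacobian is
    omitted, as in the standard SDS approximation."""
    eps = rng.standard_normal(x.shape)
    x_noisy = x + sigma * eps
    return w * (eps_pred_fn(x_noisy, sigma) - eps)

# Hypothetical toy prior: its noise prediction pulls x toward 0,
# so gradient descent with SDS shrinks x over iterations.
eps_pred = lambda x_noisy, sigma: x_noisy / max(sigma, 1e-8)
x = np.full(4, 10.0)
for _ in range(200):
    x -= 0.01 * sds_grad(x, eps_pred, sigma=1.0)
```

In the actual methods, `x` would be a differentiable rendering of the 3D/4D representation (here, dynamic Gaussians) and `eps_pred_fn` a text-conditioned diffusion model, so the gradient flows back into the scene parameters.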
🔥Introducing Emu2🔥 An *open-weight* multimodal model surpassing Flamingo-80B in image-to-text in-context performance. But that's not all: as a versatile generalist, Emu2 can also create images and videos from in-context prompts. 🔗Project: https://t.co/B4Zgrvnts6 📽️2-min intro video: https://t.co/iUbOttusyT
MoSAR turns a portrait image into a relightable 3D avatar. It estimates detailed geometry and rich reflectance maps (diffuse, specular, normals, ambient occlusion, translucency) at 4K resolution. Paper: MoSAR: Monocular Semi-Supervised Model For Avatar Reconstruction Using… https://t.co/B6Ek8m9sts
Existing generic 3D reconstruction methods often struggle to represent fine geometric details and do not adequately model reflective surfaces of large-scale scenes. UniSDF is a general-purpose 3D reconstruction method that can reconstruct large complex scenes with reflections.… https://t.co/tc5r36e7f0
Emu2 is a generative multimodal model with 37 billion parameters, trained on large-scale multimodal sequences with a unified autoregressive objective. Emu2 exhibits strong multimodal in-context learning abilities, even showing emergent ability to solve tasks that require on-the-fly reasoning,… https://t.co/GNXQAkKT0k
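A "unified autoregressive objective" means text tokens and (quantized) visual tokens share one vocabulary and one next-token loss. A toy NumPy sketch of that generic objective (not Emu2's implementation, which predicts continuous visual embeddings rather than discrete ids):

```python
import numpy as np

def next_token_nll(logits, targets):
    """Unified autoregressive objective: whether a position holds a
    text token or a visual token, training minimizes the same
    next-token negative log-likelihood over one shared vocabulary."""
    # numerically stable log-softmax over the vocabulary at each position
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# Toy sequence interleaving "text" ids (0-4) and "image" ids (5-9).
vocab, T = 10, 6
rng = np.random.default_rng(0)
logits = rng.standard_normal((T, vocab))     # model outputs per position
targets = np.array([1, 7, 3, 9, 0, 5])       # mixed-modality next tokens
loss = next_token_nll(logits, targets)
```

With uninformative (all-zero) logits the loss is exactly `log(vocab)`, which is a handy sanity check when wiring up such an objective.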
This AI Paper from Alibaba Unveils SCEdit: Revolutionizing Image Diffusion Models with Skip Connection Tuning for Enhanced Text-to-Image Generation Quick read: https://t.co/rr9FBO18LP Paper: https://t.co/BcDu9RRBe9 Project: https://t.co/OgaJ4iS6IR #ArtificialIntelligence… https://t.co/SAayeFLWGV
Introduction of the "Versatile Diffusion" model, which handles both text and images in one framework. The model accommodates various tasks, including text-to-image, and has demonstrated high performance. https://t.co/NqFLUd3roy