Recent research in image and text generation shows autoregressive models gaining ground on diffusion models. A Llama-based autoregressive image generator reports an FID of 2.18 on the ImageNet 256×256 benchmark, outperforming popular diffusion models such as LDM and DiT. A unified framework within multimodal LLMs has also been proposed that performs both text-to-image generation and retrieval autoregressively, along with a new benchmark. On the diffusion side, two concurrent papers report that simple masked (absorbing) discrete diffusion reaches a new state of the art among diffusion language models and approaches autoregressive perplexity, and a CVPR 2024 paper on block caching reports speedups of up to 1.8x for models like LDM without sacrificing quality.
Is absorbing discrete diffusion making a comeback? Two concurrent papers with promising results: Simple and Effective Masked Diffusion Language Models https://t.co/9LEwNkj4YU Simplified and Generalized Masked Diffusion for Discrete Data https://t.co/tYE9JMR5VU (1/2)
Discrete diffusion models made simple & competitive on both language and pixel-level image modeling! https://t.co/G3AnUVQzOD ✅New variational objective (integrate cross-entropy!) ✅Beating prior diffusion language models & matching best AR on pixel-level image modeling ...(1/n) https://t.co/Aw8R9mgJ5l
[CL] Simple and Effective Masked Diffusion Language Models S S Sahoo, M Arriola, Y Schiff, A Gokaslan… [Cornell Tech] (2024) https://t.co/E2QD1swyju - The paper shows that simple masked discrete diffusion can achieve strong performance for language modeling, approaching… https://t.co/62HnBWMYXg
Image Neural Field Diffusion Models abs: https://t.co/qKHSZ5TBaB project page: https://t.co/XWepguHnoN New paper from Adobe Research: Training latent diffusion models on neural image fields, which can render images at any resolution. First an autoencoder is trained to take in… https://t.co/2GI5OASXlt
Simple and Effective Masked Diffusion Language Models While diffusion models excel at generating high-quality images, prior work reports a significant performance gap between diffusion and autoregressive (AR) methods in language modeling. In this work, we show that simple https://t.co/rQjzwzS4Ql
Simple and Effective Masked Diffusion Language Models abs: https://t.co/rdn2FhdMxO code: https://t.co/7Klk6bcNdY Uses a well-engineered implementation of a simple masked diffusion language model framework to achieve a new state-of-the-art among diffusion models on language… https://t.co/cXWRfHJJxA
Simple and Effective Masked Diffusion Language Models Achieves a new SotA among diffusion models on a range of LM tasks and approaches AR perplexity repo: https://t.co/97uIi2my8I abs: https://t.co/R82RQBmLEI https://t.co/19VV5iuMCY
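For readers unfamiliar with the "absorbing" or "masked" forward process that the tweets above refer to, here is a minimal sketch of the general idea, not the implementation from either paper. It assumes a linear schedule in which a token survives unmasked with probability 1 - t; the names `MASK`, `keep_probability`, and `mask_tokens` are hypothetical and chosen only for illustration.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, not taken from either paper


def keep_probability(t):
    """Assumed linear schedule: chance a token is still unmasked at time t in [0, 1]."""
    return 1.0 - t


def mask_tokens(tokens, t, rng):
    """Sample a noised sequence: each token is independently absorbed into MASK."""
    p_keep = keep_probability(t)
    return [tok if rng.random() < p_keep else MASK for tok in tokens]


rng = random.Random(0)
sentence = "diffusion models can also generate text".split()
for t in (0.0, 0.5, 1.0):
    # More tokens are replaced by MASK as t grows.
    print(t, mask_tokens(sentence, t, rng))
```

At t = 0 the sequence is returned unchanged, and at t = 1 every token is masked. A denoising model is then trained to predict the original tokens at the masked positions; the schedule and training objective actually used in the papers may differ from this sketch.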
The Future of Image Models Is Multimodal https://t.co/U8fK7vsChu < a good discussion w/ @mo_norouzi and @JenniferHli about the genesis of LLMs and image models, and where it's all headed
Diffusion models are often prohibitively slow. 😢 In our #CVPR2024 paper "Accelerating Diffusion Models through Block Caching", we present a technique to make SOTA models like LDM up to 1.8x faster without sacrificing quality.🚀 Check out our video here: https://t.co/3Ee2bn9Rm7 https://t.co/yiHr9XOTwh
This is really cool. Basically an image generator model that replaces diffusion-based models with an LLM. Like, using a single model to generate both text and images, and potentially even things like audio. https://t.co/kqHw8KXvdp
Unified Text-to-Image Generation and Retrieval Proposes a unified framework within multimodal LLMs that performs both text-to-image generation and retrieval in an autoregressive manner, along with a new benchmark. 📝https://t.co/rEXVde9aQd 👨🏽‍💻https://t.co/uXdsb72aFn https://t.co/4i7IJ0QILS
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Achieves 2.18 FID on ImageNet 256×256 benchmarks, outperforming the popular diffusion models such as LDM, DiT proj: https://t.co/nhPzojQDR6 abs: https://t.co/ePiLSBUzbV https://t.co/lockz1ZOvK