Several advances in video generation and video understanding were announced recently. Video-Infinity is a distributed framework that generates long videos of up to 2,300 frames in about 5 minutes, while Long Video Assistant (LongVA) processes 2,000+ frames (over 200K visual tokens) and reports state-of-the-art performance on Video-MME among 7B-scale models. Tencent and Huawei introduced Text-Animator for controllable visual text video generation, reporting more accurate visual text than existing video generation methods. MotionBooth animates customized subjects with control over both object and camera movements. A training-free method, Pyramid Attention Broadcast, is also reported to reach real-time DiT video generation (over 20 fps) with almost no quality loss.
Text-Animator: Controllable Visual Text Video Generation Proj: https://t.co/w0IN7cZ9ak Abs: https://t.co/4epuSFo8r8 Improve the stability of generated visual text by controlling the camera movement as well as the motion of visualized text https://t.co/fCQuqfae0M
MotionBooth: Motion-Aware Customized Text-to-Video Generation Proj: https://t.co/hPNkvQBgGP Abs: https://t.co/wl7UONW0HE Animating customized subjects with precise control over both object and camera movements. https://t.co/D1DGJjXqa2
MotionBooth is an innovative framework designed for animating customized subjects with precise control over both object and camera movements. Paper: MotionBooth: Motion-Aware Customized Text-to-Video Generation Link: https://t.co/dZcHTg0ifh Project: https://t.co/bGOuIVyki6 #AI… https://t.co/B7Rc671tyJ
While recent advances in text-to-image (T2I) visual text generation show promise, transitioning these techniques into the video domain faces problems, notably in preserving textual fidelity and motion coherence. This paper proposes an innovative approach termed Text-Animator for… https://t.co/Vz19GR2gZI
simple but smart way to achieve real-time video generation!!! Training-free * almost no performance drop -> real-time DiT video gen (over 20 fps) I do like this analyzing process to invent this tech: 1) Found the attention patterns are extremely similar in nearby diffusion… https://t.co/3Obw2AJr1I
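The observation in the tweet above (attention patterns are very similar in nearby diffusion steps) suggests reusing a computed attention result for a few consecutive steps instead of recomputing it every time. The following is only a minimal sketch of that caching idea under stated assumptions, not the authors' implementation: `BroadcastCache`, `fake_attention`, `run`, and the `reuse_interval` parameter are all hypothetical names, and the "attention" here is a scalar stand-in rather than a real DiT block.

```python
from typing import Callable, List, Optional, Tuple


class BroadcastCache:
    """Reuse an expensive result for a fixed window of consecutive steps."""

    def __init__(self, compute: Callable[[int], float], reuse_interval: int) -> None:
        if reuse_interval < 1:
            raise ValueError("reuse_interval must be >= 1")
        self.compute = compute
        self.reuse_interval = reuse_interval
        self.cached: Optional[float] = None
        self.calls = 0  # number of times the expensive function actually ran

    def __call__(self, step: int) -> float:
        # Recompute on the first step of each window; on the other steps,
        # return ("broadcast") the cached result unchanged.
        if self.cached is None or step % self.reuse_interval == 0:
            self.cached = self.compute(step)
            self.calls += 1
        return self.cached


def fake_attention(step: int) -> float:
    # Hypothetical stand-in for an attention block whose output drifts
    # slowly from one diffusion step to the next.
    return 1.0 / (1.0 + 0.01 * step)


def run(num_steps: int, reuse_interval: int) -> Tuple[List[float], int]:
    # Run a toy denoising loop and report how often attention was recomputed.
    cache = BroadcastCache(fake_attention, reuse_interval)
    outputs = [cache(step) for step in range(num_steps)]
    return outputs, cache.calls


if __name__ == "__main__":
    outputs, calls = run(num_steps=12, reuse_interval=3)
    print(f"attention evaluated {calls} times over {len(outputs)} steps")
```

With a reuse interval of 3, the stand-in attention is evaluated on only one step in three. Whether that is safe in a real model depends on how similar the attention outputs actually are across those steps, which this toy example does not measure.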
Our #LongVA is now the **best** open-source video LLM among the 7B-scale models * Long-context capability of processing 2000+ frames or over 200K visual tokens - Code: https://t.co/Jb7P5F59Bf - Blog: https://t.co/FMxFY4dIEx - Demo @huggingface : https://t.co/BYyiXj8pc2 https://t.co/V1wGg50iZp https://t.co/fUuRMiFqbo
Real-Time Video Generation: Achieved. Share our latest work with @JxlDragon, @VictorKaiWang1, and @YangYou1991: "Real-Time Video Generation with Pyramid Attention Broadcast." 3 features: real-time, lossless quality, and training-free! Blog: https://t.co/e6nTwd5J0L (1/6) https://t.co/tPvBvSvcMp
Tencent and Huawei present Text-Animator: Controllable Visual Text Video Generation. Demonstrates the superiority of their approach over state-of-the-art video generation methods in the accuracy of generated visual text. abs: https://t.co/hXR7QRC2xz proj: https://t.co/QDqF9xKhnS https://t.co/r4gTmy7X3i
Video-Infinity: Distributed Long Video Generation Can generate super long videos, up to 2300 frames within 5 mins by Clip parallelism and Dual-scope attention proj: https://t.co/7ywX3pY4h9 abs: https://t.co/CUkAGG8RSn repo: https://t.co/PdR05xhyrj https://t.co/qWrREC09vL
Long Context from Language to Vision: #LongVA can process 2000 frames or over 200K visual tokens with SoTA performance on Video-MME among 7B models - Paper: https://t.co/iCVi2EISeB - Code: https://t.co/Jb7P5F59Bf - Demo @Gradio: https://t.co/BYyiXj8pc2 . Thanks to @_akhaliq! https://t.co/ZBdWx4HrlG
Long Video Assistant (LongVA): Breakthrough in long video understanding! - Transfers long context capability from language to vision - Only open-source model supporting 384 input frames - Handles 2000+ frames (200K+ visual tokens) - SoTA on Video-MME among 7B models -… https://t.co/GH4g0q9hhV
Long Context Transfer from Language to Vision - Can process 2000 frames or over 200K visual tokens - SotA perf on VideoMME among 7B-scale models abs: https://t.co/JlXz5TPbVP repo: https://t.co/Nyi6fTS5qh https://t.co/ehRr5V0syo
Introducing Video-Infinity! Our new distributed framework revolutionizes long video generation. Generate videos up to 2,300 frames in just 5 minutes, 100x faster than previous methods! #AI #Video #AIGC Project Page: https://t.co/QTN0uoxv8f Paper: https://t.co/R7X9QADFzN https://t.co/CzoZiYupj5