Researchers from Princeton University and Carnegie Mellon University have introduced Mamba-2, a new state-space model (SSM) that advances the theory of sequence models through the State Space Duality (SSD) framework. Mamba-2, developed by Tri Dao and Albert Gu, outperforms its predecessor, Mamba, and Transformer++ in both perplexity and wall-clock time. The model supports states 8x larger than Mamba's while training 50% faster, a substantial improvement over the original architecture. The SSD framework establishes theoretical connections between SSMs and linear attention mechanisms, showing that many linear attention variants and SSMs compute closely related (and in some cases identical) sequence transformations. These results have implications for efficient language modeling, generalized sequence-model design, efficient algorithms, and downstream learning tasks.
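To make the "duality" concrete, here is a minimal NumPy sketch (not the authors' implementation; the variable names a, B, C, x and the toy sizes are illustrative) of the core identity SSD rests on: a selective SSM whose per-step state matrix is a scalar times the identity can be evaluated either as a linear-time recurrence over a hidden state, or as a quadratic, attention-like matrix product y = (L ∘ C Bᵀ) x, where L is a lower-triangular matrix of cumulative decay products (a 1-semiseparable mask).

```python
# Minimal sketch of the state space duality idea, assuming a scalar-times-identity
# state matrix A_t = a_t * I. Two computations of the same output:
#   (1) linear-time recurrence:  h_t = a_t * h_{t-1} + B_t * x_t,   y_t = C_t . h_t
#   (2) quadratic "attention" form:  y = (L ∘ (C Bᵀ)) x,
#       with L[t, s] = a_{s+1} * ... * a_t for t >= s, else 0.
import numpy as np

rng = np.random.default_rng(0)
T, N = 6, 4                        # toy sequence length and state dimension
a = rng.uniform(0.5, 1.0, T)       # per-step scalar decay (A_t = a_t * I)
B = rng.standard_normal((T, N))    # input projections B_t
C = rng.standard_normal((T, N))    # output projections C_t
x = rng.standard_normal(T)         # a single scalar input channel

# (1) Recurrent (linear-time) form.
h = np.zeros(N)
y_recurrent = np.zeros(T)
for t in range(T):
    h = a[t] * h + B[t] * x[t]
    y_recurrent[t] = C[t] @ h

# (2) Dual quadratic (masked-attention-like) form.
L = np.zeros((T, T))
for t in range(T):
    for s in range(t + 1):
        L[t, s] = np.prod(a[s + 1:t + 1])   # empty product = 1 when s == t
M = L * (C @ B.T)                            # elementwise mask times "attention" scores
y_quadratic = M @ x

assert np.allclose(y_recurrent, y_quadratic)
print("recurrent:", y_recurrent)
print("quadratic:", y_quadratic)
```

The linear form is what gives SSMs constant-memory, O(T) inference, while the quadratic form is the masked-attention view; the paper's SSD algorithm mixes block-wise versions of the two to get hardware-efficient training.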
STARTING NOW! List of papers we'll be covering: Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark (special appearance by authors)… https://t.co/FCJzxemzhi
Beyond Quadratic Bottlenecks: Mamba-2 and the State Space Duality Framework for Efficient Language Modeling https://t.co/P0lOuUBlBc #LanguageModeling #Mamba2 #SSDframework #AIbusinesssolutions #AIsalesbot #ai #news #llm #ml #research #ainews #innovation #artificialintelligenc… https://t.co/8ZjgGcIfrb
Mamba got a sequel: Mamba-2. Six months ago, the Mamba research team, Tri Dao and Albert Gu, introduced their new model architecture. The community loved it. It has been examined for recall abilities, in-context learning, and formal language expressivity. So what's new? 🧵 https://t.co/9AhOfshdTF
Mamba-2 and State Space Models https://t.co/4BZ95eBBhb
Beyond Quadratic Bottlenecks: Mamba-2 and the State Space Duality Framework for Efficient Language Modeling Researchers from Princeton University and Carnegie Mellon University have introduced the State Space Duality (SSD) framework, which connects SSMs and attention mechanisms.… https://t.co/ccf5FRfxjz
Mamba State-Space Models Can Be Strong Downstream Learners. https://t.co/pzXO7ZjMZd
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality https://t.co/4qDhFboeLY
mamba-2 is here 👀 if you want to work on bleeding edge ssms with a world class research team led by @_albertgu, we're hiring @cartesia_ai https://t.co/bJlwCsnTAH
excited to finally release Mamba-2!! 8x larger states, 50% faster training, and even more S's 🐍🐍 Mamba-2 aims to advance the theory of sequence models, developing a framework of connections between SSMs and (linear) attention that we call state space duality (SSD) w/@tri_dao https://t.co/xbNMzMeYL8
With @_albertgu, we’ve built a rich theoretical framework of state-space duality, showing that many linear attn variants and SSMs are equivalent! The resulting model, Mamba-2, is better & faster than Mamba-1, and still matches strong Transformer archs on language modeling. 1/ https://t.co/mqDwiYeSAl
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Tri Dao, Albert Gu https://t.co/cjgo9q65UN
[LG] Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality T Dao, A Gu [Princeton University & CMU] (2024) https://t.co/vkSNhzVgeq - This paper shows theoretical connections between structured state space models (SSMs),… https://t.co/1BwiroVH3l
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown… https://t.co/9400BttLNR
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Presents Mamba-2, which outperforms Mamba and Transformer++ in both perplexity and wall-clock time https://t.co/Sd3J3kPG5W https://t.co/C2nAisXcoN