Recent research in natural language processing has introduced new architectures such as State Space Models (SSMs) and hybrids that combine attention mechanisms with SSM layers like Mamba. Models such as Samba 3.8B and Mamba-2-Hybrid have shown promising results in efficiency and performance compared to traditional Transformers, demonstrating better accuracy, comparable training efficiency, and lower inference cost, and outperforming Transformers on a range of tasks and benchmarks.
The recent "Samba: Simple Hybrid State Space Models" by Microsoft look great! Basically a Mamba/transformer hybrid with MLP layers and sliding window attention. And it performs really well on a small <2B scale. Proud to see that that our LitGPT open-source library powered this! https://t.co/K7E1wygjeM
[CL] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling L Ren, Y Liu, Y Lu, Y Shen… [Microsoft] (2024) https://t.co/rF595l5f5K - SAMBA combines selective SSM layers (Mamba) with sliding window attention (SWA) to efficiently model sequences… https://t.co/zBJMfNSnnK
Is Samba a production-ready non-transformer 3.8B large language model? Much of the takeoff in AI the past few years has been built on transformer and diffusion models. But one question is: how much of this progress is due to the capabilities of these architectures, relative to… https://t.co/T1OjeYT17n
careful systematic study of Mamba vs Transformer capabilities -- hybrid (~10% attn) wins again, where the attn seems to have specific roles (e.g. adhering to multiple choice format, copying abilities) fun collaboration with Nvidia! https://t.co/NQ9RXrU51Y
An Empirical Study of Mamba-based Language Models ◼ 🚀 New research pits Mamba models against Transformers in a head-to-head! Mamba models, while excelling in efficiency and some language tasks, fall short in tasks needing strong in-context learning. The Mamba2-Hybrid not only… https://t.co/G1ZHwtVsBL
The 8B-parameter Mamba-2-Hybrid exceeds the 8B-parameter Transformer on all 12 standard tasks we evaluated (+2.65 points on average) and is predicted to be up to 8× faster when generating tokens at inference time. 🤯 📌 The hybrid model also demonstrates strong long-context… https://t.co/DzDPihcglx
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling ◼ 🚀 Introducing Samba: a groundbreaking hybrid model combining Mamba (a State Space Model) with Sliding Window Attention for efficient sequence processing. 🧠 Achieves superior performance… https://t.co/C98uOfFW7a
[LG] An Empirical Study of Mamba-based Language Models R Waleffe, W Byeon, D Riach, B Norick, V Korthikanti, T Dao, A Gu, A Hatamizadeh, S Singh… [NVIDIA] (2024) https://t.co/KaQR6E6aMb - This paper presents a large-scale comparison between Mamba, Mamba-2, Mamba-2-Hybrid, and… https://t.co/PwN9JiddvN
The most direct and detailed comparison of Transformer, Mamba, and hybrid models so far: same size (8B), datasets (1.1T & 3.5T), hparams. Mamba can match / beat the Transformer on zero-shot evals, but lags behind on MMLU / copying. The hybrid outperforms the Transformer. Some fun results: https://t.co/ipKWljCaJU
An 8B-3.5T hybrid SSM model gets better accuracy than an 8B-3.5T transformer trained on the same dataset: * 7% attention, the rest is Mamba2 * MMLU jumps from 50 to 53.6% * Training efficiency is the same * Inference cost is much less https://t.co/x62otbC5uN https://t.co/bBfFYEt0a0
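To make the "7% attention, the rest is Mamba2" layout concrete, here is a small sketch that spreads a given fraction of attention layers evenly through an otherwise Mamba-2 stack. The even-spacing rule and the helper name `layer_pattern` are assumptions for illustration, not NVIDIA's reported placement.

```python
# Hypothetical helper that spreads a small fraction of attention layers evenly
# through a Mamba-2 stack (the ~7% figure from the tweet above). The even-spacing
# rule is an illustrative assumption, not the paper's exact layer schedule.
def layer_pattern(n_layers: int, attn_fraction: float) -> list[str]:
    n_attn = max(1, round(n_layers * attn_fraction))
    stride = n_layers / n_attn
    attn_positions = {int(i * stride + stride / 2) for i in range(n_attn)}
    return ["attention" if i in attn_positions else "mamba2" for i in range(n_layers)]

pattern = layer_pattern(n_layers=56, attn_fraction=0.07)
print(pattern.count("attention"), "attention layers out of", len(pattern))
# -> 4 attention layers out of 56
```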
Introducing Samba 3.8B, a simple Mamba+Sliding Window Attention architecture that outperforms Phi3-mini on major benchmarks (e.g., MMLU, GSM8K and HumanEval) by a large margin.😮 And it has an infinite context length with linear complexity.🤯 Paper: https://t.co/6OnfGG71Aj… https://t.co/f4IZdT1wGB
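The linear-complexity claim rests on sliding window attention: each token attends only to a fixed-size window of recent tokens, so per-token attention cost stays constant as the context grows. Below is a minimal sketch of such a mask; the window size is chosen arbitrarily for illustration.

```python
# Minimal sketch of a sliding-window causal mask: each query attends only to the
# previous `window` tokens (inclusive of itself), so attention cost per token is
# O(window) instead of O(sequence length). The window size here is illustrative only.
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    return (j <= i) & (j > i - window)       # causal and within the window

mask = sliding_window_mask(seq_len=8, window=3)
print(mask.int())
```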
A 9-billion-parameter State Space Model (SSM) alternative to attention is out. Recurrent transformers are now on par with attention transformers like Gemma and Mistral, but by maintaining a state vector they are capable of faster inference. https://t.co/eWbyQk58Cy
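The faster-inference point comes from the fixed-size recurrent state: generation updates a constant-size state per token instead of appending to an ever-growing KV cache. The toy linear recurrence below illustrates the idea; it is a simplified stand-in under made-up dimensions, not the Mamba recurrence itself.

```python
# Toy linear recurrence illustrating why a fixed-size state makes generation cheap:
# each new token updates a constant-size state, so per-token cost and memory do not
# grow with context length (unlike an attention KV cache). A simplified stand-in
# for an SSM layer, not Mamba itself.
import torch

d_state, d_model = 16, 8
A = torch.rand(d_state) * 0.9            # per-channel decay of the state
B = torch.randn(d_state, d_model) * 0.1  # input projection
C = torch.randn(d_model, d_state) * 0.1  # output projection

state = torch.zeros(d_state)             # fixed-size state carried across tokens
for t in range(1000):                    # context can grow without extra memory
    x_t = torch.randn(d_model)           # embedding of the newly generated token
    state = A * state + B @ x_t          # constant-time state update
    y_t = C @ state                      # output for this step
print(state.shape, y_t.shape)            # torch.Size([16]) torch.Size([8])
```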