AI pioneer Sepp Hochreiter has introduced a new architecture called xLSTM, or Extended Long Short-Term Memory, which aims to address the limitations of traditional LSTMs and compete with state-of-the-art language models like Transformers. The xLSTM incorporates innovations such as exponential gating and modified memory structures, introducing sLSTM and mLSTM memory cells that improve performance and scalability. This development is part of a broader effort to advance European language model capabilities under the NXAI initiative. The new model, which includes a parallelizable LSTM, has generated significant interest in the AI community, with discussion of its potential to outperform existing models. However, no code or weights have been shared yet.
Currently xLSTM is 4x slower than FlashAttention and Mamba, but if this is figured out with better CUDA kernels, we would have a model linear in seq_len that is as strong and fast as Transformers!!! https://t.co/yldCqppRX9
"I'll be back" LSTM xLSTM: Extended Long Short-Term Memory https://t.co/WXRbkYbWSO
Thanks @srush_nlp for this compelling collection of recent RNN-based Language Models! I think now you have to update this list with the #xLSTM 😉 I agree, naming conventions are always hard... In our paper we try to stick to the original LSTM formulation from the 1990s: https://t.co/Xe6R32pNsO https://t.co/prFJA7kPvp
"The scaling behavior indicates that for larger models xLSTM will continue to perform favourable compared to Transformers and State-Space models." https://t.co/qhEBdetXVv
MASSIVE: xLSTM has just been published -- putting LSTM on turbo boost 🔥 Key techniques: A) Exponential gating and enhanced (cell state) memory capacities. B) Modifying the LSTM memory structure, obtaining: (i) sLSTM with a scalar memory, a scalar update, and new memory… https://t.co/URfRs8FY4l
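For readers who want to see what "exponential gating" and the sLSTM's scalar memory look like concretely, here is a minimal NumPy sketch of one sLSTM step, written from the equations in the paper. The function name, weight layout, and toy dimensions are my own assumptions; no official implementation has been released, so treat this as an illustrative reading rather than the authors' code.

```python
import numpy as np

def slstm_step(x, h_prev, c_prev, n_prev, m_prev, W, R, b):
    """One sLSTM step: scalar cell state c, normalizer n, stabilizer m.
    W, R, b hold input weights, recurrent weights, and biases for the
    z (cell input), i (input gate), f (forget gate), o (output gate) branches."""
    pre = {k: W[k] @ x + R[k] @ h_prev + b[k] for k in ("z", "i", "f", "o")}

    z = np.tanh(pre["z"])                    # cell input
    o = 1.0 / (1.0 + np.exp(-pre["o"]))      # sigmoid output gate
    log_i = pre["i"]                         # exponential input gate, kept in log space
    log_f = -np.log1p(np.exp(-pre["f"]))     # log of sigmoid forget gate

    m = np.maximum(log_f + m_prev, log_i)    # stabilizer keeps exp() from overflowing
    i_hat = np.exp(log_i - m)
    f_hat = np.exp(log_f + m_prev - m)

    c = f_hat * c_prev + i_hat * z           # cell state update
    n = f_hat * n_prev + i_hat               # normalizer update
    h = o * (c / n)                          # normalized hidden state
    return h, c, n, m

# toy usage with random weights (hypothetical sizes)
d = 4
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(d, d)) * 0.1 for k in ("z", "i", "f", "o")}
R = {k: rng.normal(size=(d, d)) * 0.1 for k in ("z", "i", "f", "o")}
b = {k: np.zeros(d) for k in ("z", "i", "f", "o")}
h = c = n = m = np.zeros(d)
for x in rng.normal(size=(3, d)):
    h, c, n, m = slstm_step(x, h, c, n, m, W, R, b)
```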
Xlstm paper is actually pretty cool. Hope it works!
Did a quick look over the xLSTM paper, unfortunately there might be some problematic experiments. 1. When doing scaling laws, experiments need to be FLOP-controlled (either theoretical FLOPs or effective FLOPs). Unless LLaMA/Mamba/RWKV/xLSTM are all the same exact FLOPs for the… https://t.co/lgUxidkVvh
no code/weights shared yet for the #xLSTM, so I tried implementing mLSTM myself! Colab notebook 👇👇🔗 in comments LMK if you want the sLSTM as well. https://t.co/kR5VHgTs3K https://t.co/viQ9Fwwt9a
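Along the same lines, the recurrent form of the mLSTM is short enough to sketch directly from the paper's equations: a matrix memory C stores key-value outer products, a normalizer vector n tracks accumulated key weight, and the input gate is exponential. The NumPy below is my own unofficial reading (stabilization omitted, toy dimensions); it is neither the linked notebook nor an official release.

```python
import numpy as np

def mlstm_step(q, k, v, i_gate, f_gate, o_gate, C_prev, n_prev):
    """One recurrent mLSTM step (log-space stabilization omitted for clarity).
    C: (d, d) matrix memory, n: (d,) normalizer,
    i_gate/f_gate: scalar exponential input / sigmoid forget gates,
    o_gate: (d,) sigmoid output gate."""
    C = f_gate * C_prev + i_gate * np.outer(v, k)    # store value under key
    n = f_gate * n_prev + i_gate * k                 # track accumulated key weight
    h_tilde = C @ q / max(abs(n @ q), 1.0)           # retrieve with query, normalize
    return o_gate * h_tilde, C, n

# toy usage with random projections (hypothetical sizes)
d = 4
rng = np.random.default_rng(0)
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) * 0.1 for _ in range(4))
wi, wf = rng.normal(size=d) * 0.1, rng.normal(size=d) * 0.1
C, n = np.zeros((d, d)), np.zeros(d)
for x in rng.normal(size=(5, d)):
    q, k, v = Wq @ x, (Wk @ x) / np.sqrt(d), Wv @ x
    i_gate = np.exp(wi @ x)                          # exponential input gate
    f_gate = 1.0 / (1.0 + np.exp(-(wf @ x)))         # sigmoid forget gate
    o_gate = 1.0 / (1.0 + np.exp(-(Wo @ x)))         # output gate
    h, C, n = mlstm_step(q, k, v, i_gate, f_gate, o_gate, C, n)
```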
A long-suspected theory: LSTMs are superior to Transformers but we didn't know how to scale them as well — so Transformers pulled ahead because they were easier to scale. Enter xLSTM... very interesting work. Key diagrams below https://t.co/oWVMkL7Jf9 https://t.co/ix8APG9e2j
xLSTM: Extended Long Short-Term Memory https://t.co/Vdgq4OZuEe
Exciting to see @HochreiterSepp following his vision bringing xLSTM to reality. Looking forward to see this further developed! https://t.co/o7l0eylrgi https://t.co/scZ0SLCU37
Wake up, LSTMs are parallelized now. https://t.co/viQ9Fwwt9a
LSTMs are back https://t.co/jI1u90tWlI
are LSTMs back, and here to kill Transformers? AI Legend Sepp Hochreiter just dropped a new paper on XLSTMs... https://t.co/viQ9Fwwt9a
🔔the guy who invented the LSTM just dropped a new LLM architecture! (Sepp Hochreiter) Major component is a new parallelizable LSTM. ⚠️one of the major weaknesses of prior LSTMs was the sequential nature (can't be done at once) Everything we know about the XLSTM: 👇👇🧵 https://t.co/C8hrjGPvHg
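The "parallelizable" part refers to the mLSTM: because it drops hidden-to-hidden memory mixing, its gated recurrence can be unrolled into an attention-like form over the whole sequence and computed at once. A rough NumPy sketch of that unrolled form is below; it follows my reading of the paper's recurrence, omits the log-space stabilization and the output gate, and all names and sizes are illustrative assumptions.

```python
import numpy as np

def mlstm_parallel(Q, K, V, i_pre, f_pre):
    """Attention-like parallel form of the mLSTM recurrence.
    Q, K, V: (T, d) queries/keys/values; i_pre, f_pre: (T,) gate preactivations.
    Returns h_tilde for every step (before the output gate)."""
    log_f = -np.log1p(np.exp(-f_pre))              # log of sigmoid forget gates
    cum_f = np.cumsum(log_f)                       # cumulative log forget decay
    # weight on step s seen from step t: (prod_{j=s+1..t} f_j) * i_s, for s <= t
    D = np.tril(np.exp(cum_f[:, None] - cum_f[None, :] + i_pre[None, :]))
    A = D * (Q @ K.T)                              # gated, causally masked scores
    denom = np.maximum(np.abs(A.sum(axis=1, keepdims=True)), 1.0)
    return (A @ V) / denom                         # matches the recurrent h_tilde

# toy usage with random inputs (hypothetical sizes)
T, d = 6, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, T, d))
H = mlstm_parallel(Q, K / np.sqrt(d), V, rng.normal(size=T), rng.normal(size=T))
print(H.shape)  # (T, d)
```

In principle the recurrent and unrolled forms compute the same states, which is what would let such a model train in parallel like a Transformer while still running step by step at inference time.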
xLSTM: Extended Long Short-Term Memory Exponential gating and modified memory structures boost xLSTM to perform favorably compared to SotA Transformers and State Space Models, in both performance and scaling. https://t.co/HW77SEohJA https://t.co/7DNRpHYfJT
I am so excited that xLSTM is out. LSTM is close to my heart - for more than 30 years now. With xLSTM we close the gap to existing state-of-the-art LLMs. With NXAI we have started to build our own European LLMs. I am very proud of my team. https://t.co/IH7giCe3gd
transformer killer just dropped xLSTM: Extended Long Short-Term Memory paper: https://t.co/ijHOMnU8RH https://t.co/TWBgQo0dbk
xLSTM: Extended Long Short-Term Memory abs: https://t.co/9sqkaXvNmi Leveraging the latest techniques from modern LLMs, mitigating known limitations of LSTMs (introducing sLSTM and mLSTM memory cells that form the xLSTM blocks), and scaling up results in a highly competitive… https://t.co/lFEHPJ6J4H
xLSTM is out -- putting LSTM networks on steroids to become a more than serious LLM competitor. How? Via exponential gating and enhanced (cell state) memory capacities. Does it work? Oh, yeah 🚀🚀 https://t.co/77RkLHAWVU