AI pioneer Sepp Hochreiter has introduced a new architecture called xLSTM, or Extended Long Short-Term Memory, which aims to address the limitations of traditional LSTMs and compete with state-of-the-art language models like Transformers. The xLSTM incorporates innovations such as exponential gating and modified memory structures, introducing sLSTM and mLSTM memory cells that improve performance and scalability. This development is part of a broader effort to advance European language model capabilities under the NXAI initiative. The new model, which includes a parallelizable LSTM, has generated significant interest in the AI community, with discussion of its potential to outperform existing models. However, no code or weights have been shared yet.
Currently xLSTM is 4x slower than FlashAttention and Mamba, but if this is figured out with better CUDA kernels, we would have a model linear in seq_len that is as strong and fast as Transformers!!! https://t.co/yldCqppRX9
"I'll be back" LSTM xLSTM: Extended Long Short-Term Memory https://t.co/WXRbkYbWSO
Thanks @srush_nlp for this compelling collection of recent RNN-based Language Models! I think now you have to update this list with the #xLSTM 😉 I agree, naming conventions are always hard... In our paper we try to stick to the original LSTM formulation from the 1990s: https://t.co/Xe6R32pNsO https://t.co/prFJA7kPvp
"The scaling behavior indicates that for larger models xLSTM will continue to perform favourable compared to Transformers and State-Space models." https://t.co/qhEBdetXVv
MASSIVE: xLSTM has just been published -- putting LSTM on turbo boost 🔥 Key techniques: A) Exponential gating and enhanced (cell state) memory capacities. B) Modifying the LSTM memory structure, obtaining: (i) sLSTM with a scalar memory, a scalar update, and new memory… https://t.co/URfRs8FY4l
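For readers who want to see what "exponential gating" and the sLSTM's scalar memory look like concretely, here is a minimal NumPy sketch of one sLSTM step, written from the equations in the paper. The function name, weight layout, and toy dimensions are my own assumptions; no official implementation has been released, so treat this as an illustrative reading rather than the authors' code.

```python
import numpy as np

def slstm_step(x, h_prev, c_prev, n_prev, m_prev, W, R, b):
    """One sLSTM step: scalar cell state c, normalizer n, stabilizer m.
    W, R, b hold input weights, recurrent weights, and biases for the
    z (cell input), i (input gate), f (forget gate), o (output gate) branches."""
    pre = {k: W[k] @ x + R[k] @ h_prev + b[k] for k in ("z", "i", "f", "o")}

    z = np.tanh(pre["z"])                    # cell input
    o = 1.0 / (1.0 + np.exp(-pre["o"]))      # sigmoid output gate
    log_i = pre["i"]                         # exponential input gate, kept in log space
    log_f = -np.log1p(np.exp(-pre["f"]))     # log of sigmoid forget gate

    m = np.maximum(log_f + m_prev, log_i)    # stabilizer keeps exp() from overflowing
    i_hat = np.exp(log_i - m)
    f_hat = np.exp(log_f + m_prev - m)

    c = f_hat * c_prev + i_hat * z           # cell state update
    n = f_hat * n_prev + i_hat               # normalizer update
    h = o * (c / n)                          # normalized hidden state
    return h, c, n, m

# toy usage with random weights (hypothetical sizes)
d = 4
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(d, d)) * 0.1 for k in ("z", "i", "f", "o")}
R = {k: rng.normal(size=(d, d)) * 0.1 for k in ("z", "i", "f", "o")}
b = {k: np.zeros(d) for k in ("z", "i", "f", "o")}
h = c = n = m = np.zeros(d)
for x in rng.normal(size=(3, d)):
    h, c, n, m = slstm_step(x, h, c, n, m, W, R, b)
```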
Xlstm paper is actually pretty cool. Hope it works!
Did a quick look over the xLSTM paper, unfortunately there might be some problematic experiments. 1. When doing scaling laws, experiments need to be FLOP-controlled (either theoretical FLOPs or effective FLOPs). Unless LLaMA/Mamba/RWKV/xLSTM are all the same exact FLOPs for the… https://t.co/lgUxidkVvh
no code/weights shared yet for the #xLSTM, so I tried implementing mLSTM myself! Colab notebook 👇👇🔗 in comments LMK if you want the sLSTM as well. https://t.co/kR5VHgTs3K https://t.co/viQ9Fwwt9a
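Along the same lines, the recurrent form of the mLSTM is short enough to sketch directly from the paper's equations: a matrix memory C stores key-value outer products, a normalizer vector n tracks accumulated key weight, and the input gate is exponential. The NumPy below is my own unofficial reading (stabilization omitted, toy dimensions); it is neither the linked notebook nor an official release.

```python
import numpy as np

def mlstm_step(q, k, v, i_gate, f_gate, o_gate, C_prev, n_prev):
    """One recurrent mLSTM step (log-space stabilization omitted for clarity).
    C: (d, d) matrix memory, n: (d,) normalizer,
    i_gate/f_gate: scalar exponential input / sigmoid forget gates,
    o_gate: (d,) sigmoid output gate."""
    C = f_gate * C_prev + i_gate * np.outer(v, k)    # store value under key
    n = f_gate * n_prev + i_gate * k                 # track accumulated key weight
    h_tilde = C @ q / max(abs(n @ q), 1.0)           # retrieve with query, normalize
    return o_gate * h_tilde, C, n

# toy usage with random projections (hypothetical sizes)
d = 4
rng = np.random.default_rng(0)
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) * 0.1 for _ in range(4))
wi, wf = rng.normal(size=d) * 0.1, rng.normal(size=d) * 0.1
C, n = np.zeros((d, d)), np.zeros(d)
for x in rng.normal(size=(5, d)):
    q, k, v = Wq @ x, (Wk @ x) / np.sqrt(d), Wv @ x
    i_gate = np.exp(wi @ x)                          # exponential input gate
    f_gate = 1.0 / (1.0 + np.exp(-(wf @ x)))         # sigmoid forget gate
    o_gate = 1.0 / (1.0 + np.exp(-(Wo @ x)))         # output gate
    h, C, n = mlstm_step(q, k, v, i_gate, f_gate, o_gate, C, n)
```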
A long-suspected theory: LSTMs are superior to Transformers but we didn't know how to scale them as well — so Transformers pulled ahead because they were easier to scale. Enter xLSTM... very interesting work. Key diagrams below https://t.co/oWVMkL7Jf9 https://t.co/ix8APG9e2j
xLSTM: Extended Long Short-Term Memory https://t.co/Vdgq4OZuEe
Exciting to see @HochreiterSepp following his vision bringing xLSTM to reality. Looking forward to see this further developed! https://t.co/o7l0eylrgi https://t.co/scZ0SLCU37
Wake up, LSTMs are parallelized now. https://t.co/viQ9Fwwt9a
LSTMs are back https://t.co/jI1u90tWlI
are LSTMs back, and here to kill Transformers? AI Legend Sepp Hochreiter just dropped a new paper on XLSTMs... https://t.co/viQ9Fwwt9a
🔔the guy who invented the LSTM just dropped a new LLM architecture! (Sepp Hochreiter) Major component is a new parallelizable LSTM. ⚠️one of the major weaknesses of prior LSTMs was the sequential nature (can't be done at once) Everything we know about the XLSTM: 👇👇🧵 https://t.co/C8hrjGPvHg
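The "parallelizable" part refers to the mLSTM: because it drops hidden-to-hidden memory mixing, its gated recurrence can be unrolled into an attention-like form over the whole sequence and computed at once. A rough NumPy sketch of that unrolled form is below; it follows my reading of the paper's recurrence, omits the log-space stabilization and the output gate, and all names and sizes are illustrative assumptions.

```python
import numpy as np

def mlstm_parallel(Q, K, V, i_pre, f_pre):
    """Attention-like parallel form of the mLSTM recurrence.
    Q, K, V: (T, d) queries/keys/values; i_pre, f_pre: (T,) gate preactivations.
    Returns h_tilde for every step (before the output gate)."""
    log_f = -np.log1p(np.exp(-f_pre))              # log of sigmoid forget gates
    cum_f = np.cumsum(log_f)                       # cumulative log forget decay
    # weight on step s seen from step t: (prod_{j=s+1..t} f_j) * i_s, for s <= t
    D = np.tril(np.exp(cum_f[:, None] - cum_f[None, :] + i_pre[None, :]))
    A = D * (Q @ K.T)                              # gated, causally masked scores
    denom = np.maximum(np.abs(A.sum(axis=1, keepdims=True)), 1.0)
    return (A @ V) / denom                         # matches the recurrent h_tilde

# toy usage with random inputs (hypothetical sizes)
T, d = 6, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, T, d))
H = mlstm_parallel(Q, K / np.sqrt(d), V, rng.normal(size=T), rng.normal(size=T))
print(H.shape)  # (T, d)
```

In principle the recurrent and unrolled forms compute the same states, which is what would let such a model train in parallel like a Transformer while still running step by step at inference time.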
xLSTM: Extended Long Short-Term Memory Exponential gating and modified memory structures boost xLSTM to perform favorably compared to SotA Transformers and State Space Models, in both performance and scaling. https://t.co/HW77SEohJA https://t.co/7DNRpHYfJT
I am so excited that xLSTM is out. LSTM is close to my heart - for more than 30 years now. With xLSTM we close the gap to existing state-of-the-art LLMs. With NXAI we have started to build our own European LLMs. I am very proud of my team. https://t.co/IH7giCe3gd
transformer killer just dropped xLSTM: Extended Long Short-Term Memory paper: https://t.co/ijHOMnU8RH https://t.co/TWBgQo0dbk
xLSTM: Extended Long Short-Term Memory abs: https://t.co/9sqkaXvNmi Leveraging the latest techniques from modern LLMs, mitigating known limitations of LSTMs (introducing sLSTM and mLSTM memory cells that form the xLSTM blocks), and scaling up results in a highly competitive… https://t.co/lFEHPJ6J4H
xLSTM is out -- putting LSTM networks on steroids to become a more than serious LLM competitor. How? Via exponential gating and enhanced (cell state) memory capacities. Does it work? Oh, yeah 🚀🚀 https://t.co/77RkLHAWVU