The introduction of xLSTM (Extended Long Short-Term Memory) marks a significant advance in AI. Developed by Sepp Hochreiter, the inventor of LSTM, xLSTM addresses limitations of traditional LSTMs by incorporating new memory cells and scaling techniques. With features like exponential gating and enhanced memory structures, xLSTM competes favorably with state-of-the-art Transformers and State Space Models, offering improved performance and scalability.
Hear me out xLSTMs combined with KAN and Jamba (Transformer (MoE) x Mamba) and RWKV with a sprinkle of diffusion and graphs https://t.co/XicnJkMmHt
The Inventor of LSTM Unveils New Architecture for LLMs to Replace Transformers https://t.co/bRsReJD2UY https://t.co/8w3h93bZdo
LSTMs all the way down... well done @HochreiterSepp (latest) and @SchmidhuberAI (originally). https://t.co/MbJ4NwjzpQ
While most of us are pretty good with Transformers, some are trying to bring LSTM back to life, and with encouraging results! https://t.co/YADrVDSmmV https://t.co/Tii4rI3jG8
Who is going to upgrade Deep Knowledge Tracing to using xLSTM? I can't wait. I bet @duolingo would be interested... it could improve bird brain. Exciting times. https://t.co/O0PbrI37W2
An xLSTM architecture is constructed by residually stacking building blocks. An xLSTM block should non-linearly summarize the past in a high-dimensional space. Separating histories is the prerequisite to correctly predict the next sequence element, such as the next token.
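As a rough illustration of that residual stacking, here is a minimal PyTorch sketch, assuming a pre-LayerNorm residual wrapper around a generic block; the class names and the `nn.Linear` placeholder are illustrative stand-ins, not the paper's actual sLSTM/mLSTM blocks.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Pre-norm residual wrapper: output = input + block(norm(input))."""
    def __init__(self, dim: int, block: nn.Module):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.block = block  # an sLSTM or mLSTM block would go here

    def forward(self, x):
        return x + self.block(self.norm(x))

class XLSTMStack(nn.Module):
    """Residually stacked blocks operating on (batch, seq_len, dim) inputs."""
    def __init__(self, dim: int, depth: int, make_block):
        super().__init__()
        self.layers = nn.ModuleList(
            [ResidualBlock(dim, make_block(dim)) for _ in range(depth)]
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

# Usage with a placeholder block (a real xLSTM block would replace the Linear):
stack = XLSTMStack(dim=64, depth=4, make_block=lambda d: nn.Linear(d, d))
y = stack(torch.randn(2, 16, 64))  # -> (2, 16, 64)
```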
The mLSTM variant of xLSTM is based on a matrix memory and abandons memory mixing, i.e., the hidden-hidden connections between hidden states from one time step to the next, which enforce sequential processing; dropping them removes the lack of parallelizability of earlier LSTMs.
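For intuition, here is a minimal sketch of one step of such a matrix-memory cell, assuming pre-computed query/key/value vectors and already-activated scalar gates; the input projections and the exponential-gating details from the paper are left out, and the function name is just illustrative.

```python
import torch

def mlstm_step(C, n, q, k, v, i_gate, f_gate):
    """One recurrent step of a matrix-memory (mLSTM-style) cell.

    C : (d, d) matrix memory        n : (d,) normalizer state
    q, k, v : (d,) query/key/value  i_gate, f_gate : activated scalar gates
    """
    C = f_gate * C + i_gate * torch.outer(v, k)   # store the key-value association
    n = f_gate * n + i_gate * k                   # accumulate keys for normalization
    denom = torch.clamp((n @ q).abs(), min=1.0)   # lower-bounded normalizer
    h = (C @ q) / denom                           # retrieve by query
    return C, n, h

# Toy rollout. Because h never feeds back into the next step's gates or
# projections (no memory mixing), the same computation can also be laid out in
# parallel across the sequence, which is the parallelizability point above.
d = 8
C, n = torch.zeros(d, d), torch.zeros(d)
for _ in range(5):
    q, k, v = torch.randn(3, d)
    C, n, h = mlstm_step(C, n, q, k, v, i_gate=1.0, f_gate=0.9)
```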
Transformers and Mamba are dead. LSTMs are back. "xLSTM: Extended Long Short-Term Memory" https://t.co/cFRVAuSB3B https://t.co/m7eW1AZRJh
xLSTM: Extended Long Short-Term Memory Attempts to scale LSTMs to billions of parameters using the latest techniques from modern LLMs and mitigating common limitations of LSTMs. To give LSTMs the ability to revise storage decisions, they introduce exponential gating and a new… https://t.co/Kby7nW9nnB
xLSTM: Extended Long Short-Term Memory Beck et al.: https://t.co/o1thn5IRvf #ArtificialIntelligence #DeepLearning #MachineLearning https://t.co/MTkyQ43Oma
Currently xLSTM is 4x slower than FlashAttention and Mamba, but if this is figured out with better cuda kernels, we would have a model linear in seq_len that is as strong and fast as transformers!!! https://t.co/yldCqppRX9
"I'll be back" LSTM xLSTM: Extended Long Short-Term Memory https://t.co/WXRbkYbWSO
"The scaling behavior indicates that for larger models xLSTM will continue to perform favourable compared to Transformers and State-Space models." https://t.co/qhEBdetXVv
MASSIVE: xLSTM has just been published -- putting LSTM on turbo boost 🔥 Key technique - A) Exponential gating and enhanced (cell state) memory capacities. B) Modifying the LSTM memory structure, obtaining: (i) sLSTM with a scalar memory, a scalar update, and new memory… https://t.co/URfRs8FY4l
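To make the "exponential gating plus scalar memory" combination concrete, here is a minimal sketch of a single sLSTM-style scalar cell with a log-domain stabilizer state, assuming the gate pre-activations and the candidate input are given; the weight matrices, memory mixing, and multi-head details are omitted, and of the two forget-gate options in the paper (sigmoid or exponential) the sigmoid form is used here.

```python
import torch
import torch.nn.functional as F

def slstm_step(c, n, m, z, i_pre, f_pre, o_pre):
    """One scalar-memory step with exponential input gating and stabilization.

    c, n, m : previous cell, normalizer, and stabilizer states (scalars)
    z       : candidate input (already activated)
    i_pre, f_pre, o_pre : raw gate pre-activations
    """
    log_i = i_pre                   # input gate i = exp(i_pre), kept in log space
    log_f = F.logsigmoid(f_pre)     # sigmoid forget gate, also in log space
    m_new = torch.maximum(log_f + m, log_i)  # stabilizer keeps the exponentials bounded
    i = torch.exp(log_i - m_new)
    f = torch.exp(log_f + m - m_new)
    c_new = f * c + i * z           # cell state update
    n_new = f * n + i               # normalizer update
    h = torch.sigmoid(o_pre) * (c_new / n_new)  # normalized, output-gated hidden state
    return c_new, n_new, m_new, h

# Toy rollout from zero states; starting m at -inf makes the first step pick log_i.
c = n = torch.tensor(0.0)
m = torch.tensor(float("-inf"))
for _ in range(3):
    z, i_pre, f_pre, o_pre = torch.randn(4)
    c, n, m, h = slstm_step(c, n, m, z, i_pre, f_pre, o_pre)
```

The stabilizer scales c and n by the same factor, so it changes the numerical range of the exponentials without changing the hidden state h.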
xLSTM paper is actually pretty cool. Hope it works!
no code/weights shared yet for the #xLSTM, so I tried implementing mLSTM myself! Colab notebook 👇👇🔗 in comments LMK if you want the sLSTM as well. https://t.co/kR5VHgTs3K https://t.co/viQ9Fwwt9a
A long-suspected theory: LSTMs are superior to Transformers but we didn't know how to scale them as well — so Transformers pulled ahead because they were easier to scale. Enter xLSTM... very interesting work. Key diagrams below https://t.co/oWVMkL7Jf9 https://t.co/ix8APG9e2j
xLSTM: Extended Long Short-Term Memory https://t.co/Vdgq4OZuEe
LSTMs are back https://t.co/jI1u90tWlI
are LSTMs back, and here to kill Transformers? AI Legend Sepp Hochreiter just dropped a new paper on xLSTMs... https://t.co/viQ9Fwwt9a
🔔the guy who invented the LSTM just dropped a new LLM architecture! (Sepp Hochreiter) Major component is a new parallelizable LSTM. ⚠️one of the major weaknesses of prior LSTMs was the sequential nature (can't be done at once) Everything we know about the xLSTM: 👇👇🧵 https://t.co/C8hrjGPvHg
xLSTM: Extended Long Short-Term Memory Exponential gating and modified memory structures boost xLSTM capabilities to perform favorably when compared to SotA Transformers and State Space Models, both in performance and scaling. https://t.co/HW77SEohJA https://t.co/7DNRpHYfJT
I am so excited that xLSTM is out. LSTM is close to my heart - for more than 30 years now. With xLSTM we close the gap to existing state-of-the-art LLMs. With NXAI we have started to build our own European LLMs. I am very proud of my team. https://t.co/IH7giCe3gd
transformer killer just dropped xLSTM: Extended Long Short-Term Memory paper: https://t.co/ijHOMnU8RH https://t.co/TWBgQo0dbk
xLSTM: Extended Long Short-Term Memory abs: https://t.co/9sqkaXvNmi Leveraging the latest techniques from modern LLMs, mitigating known limitations of LSTMs (introducing sLSTM and mLSTM memory cells that form the xLSTM blocks), and scaling up results in a highly competitive… https://t.co/lFEHPJ6J4H
xLSTM is out -- putting LSTM networks on steroids to become a more than serious LLM competitor. How? Via exponential gating and enhanced (cell state) memory capacities. Does it work? Oh, yeah 🚀🚀 https://t.co/77RkLHAWVU