The introduction of xLSTM (Extended Long Short-Term Memory) marks a significant advance in AI. Developed by Sepp Hochreiter, the inventor of LSTM, xLSTM addresses limitations of traditional LSTMs by incorporating new memory cells and scaling techniques. With features like exponential gating and enhanced memory structures, xLSTM competes favorably with state-of-the-art Transformers and State Space Models, offering improved performance and scalability.
Hear me out xLSTMs combined with KAN and Jamba (Transformer (MoE) x Mamba) and RWKV with a sprinkle of diffusion and graphs https://t.co/XicnJkMmHt
The Inventor of LSTM Unveils New Architecture for LLMs to Replace Transformers https://t.co/bRsReJD2UY https://t.co/8w3h93bZdo
LSTMs all the way down... well done @HochreiterSepp (latest) and @SchmidhuberAI (originally). https://t.co/MbJ4NwjzpQ
While most of us are pretty good with Transformers, some are trying to bring LSTM back to life, and with encouraging results! https://t.co/YADrVDSmmV https://t.co/Tii4rI3jG8
Who is going to upgrade Deep Knowledge Tracing to using xLSTM? I can't wait. I bet @duolingo would be interested... it could improve bird brain. Exciting times. https://t.co/O0PbrI37W2
An xLSTM architecture is constructed by residually stacking building blocks. An xLSTM block should non-linearly summarize the past in a high-dimensional space. Separating histories is the prerequisite to correctly predict the next sequence element, such as the next token.
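As a rough illustration of that residual stacking, here is a minimal PyTorch sketch, assuming a pre-LayerNorm residual wrapper around a generic block; the class names and the `nn.Linear` placeholder are illustrative stand-ins, not the paper's actual sLSTM/mLSTM blocks.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Pre-norm residual wrapper: output = input + block(norm(input))."""
    def __init__(self, dim: int, block: nn.Module):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.block = block  # an sLSTM or mLSTM block would go here

    def forward(self, x):
        return x + self.block(self.norm(x))

class XLSTMStack(nn.Module):
    """Residually stacked blocks operating on (batch, seq_len, dim) inputs."""
    def __init__(self, dim: int, depth: int, make_block):
        super().__init__()
        self.layers = nn.ModuleList(
            [ResidualBlock(dim, make_block(dim)) for _ in range(depth)]
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

# Usage with a placeholder block (a real xLSTM block would replace the Linear):
stack = XLSTMStack(dim=64, depth=4, make_block=lambda d: nn.Linear(d, d))
y = stack(torch.randn(2, 16, 64))  # -> (2, 16, 64)
```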
The mLSTM variant of xLSTM is based on a matrix memory and abandons memory mixing, i.e., the hidden-hidden connections between hidden states from one time step to the next, which enforce sequential processing; dropping them removes the lack of parallelizability of earlier LSTMs.
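For intuition, here is a minimal sketch of one step of such a matrix-memory cell, assuming pre-computed query/key/value vectors and already-activated scalar gates; the input projections and the exponential-gating details from the paper are left out, and the function name is just illustrative.

```python
import torch

def mlstm_step(C, n, q, k, v, i_gate, f_gate):
    """One recurrent step of a matrix-memory (mLSTM-style) cell.

    C : (d, d) matrix memory        n : (d,) normalizer state
    q, k, v : (d,) query/key/value  i_gate, f_gate : activated scalar gates
    """
    C = f_gate * C + i_gate * torch.outer(v, k)   # store the key-value association
    n = f_gate * n + i_gate * k                   # accumulate keys for normalization
    denom = torch.clamp((n @ q).abs(), min=1.0)   # lower-bounded normalizer
    h = (C @ q) / denom                           # retrieve by query
    return C, n, h

# Toy rollout. Because h never feeds back into the next step's gates or
# projections (no memory mixing), the same computation can also be laid out in
# parallel across the sequence, which is the parallelizability point above.
d = 8
C, n = torch.zeros(d, d), torch.zeros(d)
for _ in range(5):
    q, k, v = torch.randn(3, d)
    C, n, h = mlstm_step(C, n, q, k, v, i_gate=1.0, f_gate=0.9)
```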
Transformers and Mamba are dead. LSTMs are back. "xLSTM: Extended Long Short-Term Memory" https://t.co/cFRVAuSB3B https://t.co/m7eW1AZRJh
xLSTM: Extended Long Short-Term Memory Attempts to scale LSTMs to billions of parameters using the latest techniques from modern LLMs and mitigating common limitations of LSTMs. To give LSTMs the ability to revise storage decisions, they introduce exponential gating and a new… https://t.co/Kby7nW9nnB
xLSTM: Extended Long Short-Term Memory Beck et al.: https://t.co/o1thn5IRvf #ArtificialIntelligence #DeepLearning #MachineLearning https://t.co/MTkyQ43Oma
Currently xLSTM is 4x slower than FlashAttention and Mamba, but if this is figured out with better cuda kernels, we would have a model linear in seq_len that is as strong and fast as transformers!!! https://t.co/yldCqppRX9
"I'll be back" LSTM xLSTM: Extended Long Short-Term Memory https://t.co/WXRbkYbWSO
"The scaling behavior indicates that for larger models xLSTM will continue to perform favourable compared to Transformers and State-Space models." https://t.co/qhEBdetXVv
MASSIVE: xLSTM has just been published -- putting LSTM on turbo boost 🔥 Key technique - A) Exponential gating and enhanced (cell state) memory capacities. B) Modifying the LSTM memory structure, obtaining: (i) sLSTM with a scalar memory, a scalar update, and new memory… https://t.co/URfRs8FY4l
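To make the "exponential gating plus scalar memory" combination concrete, here is a minimal sketch of a single sLSTM-style scalar cell with a log-domain stabilizer state, assuming the gate pre-activations and the candidate input are given; the weight matrices, memory mixing, and multi-head details are omitted, and of the two forget-gate options in the paper (sigmoid or exponential) the sigmoid form is used here.

```python
import torch
import torch.nn.functional as F

def slstm_step(c, n, m, z, i_pre, f_pre, o_pre):
    """One scalar-memory step with exponential input gating and stabilization.

    c, n, m : previous cell, normalizer, and stabilizer states (scalars)
    z       : candidate input (already activated)
    i_pre, f_pre, o_pre : raw gate pre-activations
    """
    log_i = i_pre                   # input gate i = exp(i_pre), kept in log space
    log_f = F.logsigmoid(f_pre)     # sigmoid forget gate, also in log space
    m_new = torch.maximum(log_f + m, log_i)  # stabilizer keeps the exponentials bounded
    i = torch.exp(log_i - m_new)
    f = torch.exp(log_f + m - m_new)
    c_new = f * c + i * z           # cell state update
    n_new = f * n + i               # normalizer update
    h = torch.sigmoid(o_pre) * (c_new / n_new)  # normalized, output-gated hidden state
    return c_new, n_new, m_new, h

# Toy rollout from zero states; starting m at -inf makes the first step pick log_i.
c = n = torch.tensor(0.0)
m = torch.tensor(float("-inf"))
for _ in range(3):
    z, i_pre, f_pre, o_pre = torch.randn(4)
    c, n, m, h = slstm_step(c, n, m, z, i_pre, f_pre, o_pre)
```

The stabilizer scales c and n by the same factor, so it changes the numerical range of the exponentials without changing the hidden state h.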
xLSTM paper is actually pretty cool. Hope it works!
no code/weights shared yet for the #xLSTM, so I tried implementing mLSTM myself! Colab notebook 👇👇🔗 in comments LMK if you want the sLSTM as well. https://t.co/kR5VHgTs3K https://t.co/viQ9Fwwt9a
A long-suspected theory: LSTMs are superior to Transformers but we didn't know how to scale them as well — so Transformers pulled ahead because they were easier to scale. Enter xLSTM... very interesting work. Key diagrams below https://t.co/oWVMkL7Jf9 https://t.co/ix8APG9e2j
xLSTM: Extended Long Short-Term Memory https://t.co/Vdgq4OZuEe
LSTMs are back https://t.co/jI1u90tWlI
are LSTMs back, and here to kill Transformers? AI Legend Sepp Hochreiter just dropped a new paper on xLSTMs... https://t.co/viQ9Fwwt9a
🔔the guy who invented the LSTM just dropped a new LLM architecture! (Sepp Hochreiter) Major component is a new parallelizable LSTM. ⚠️one of the major weaknesses of prior LSTMs was the sequential nature (can't be done at once) Everything we know about the xLSTM: 👇👇🧵 https://t.co/C8hrjGPvHg
xLSTM: Extended Long Short-Term Memory Exponential gating and modified memory structures boost xLSTM capabilities to perform favorably when compared to SotA Transformers and State Space Models, both in performance and scaling. https://t.co/HW77SEohJA https://t.co/7DNRpHYfJT
I am so excited that xLSTM is out. LSTM is close to my heart - for more than 30 years now. With xLSTM we close the gap to existing state-of-the-art LLMs. With NXAI we have started to build our own European LLMs. I am very proud of my team. https://t.co/IH7giCe3gd
transformer killer just dropped xLSTM: Extended Long Short-Term Memory paper: https://t.co/ijHOMnU8RH https://t.co/TWBgQo0dbk
xLSTM: Extended Long Short-Term Memory abs: https://t.co/9sqkaXvNmi Leveraging the latest techniques from modern LLMs, mitigating known limitations of LSTMs (introducing sLSTM and mLSTM memory cells that form the xLSTM blocks), and scaling up results in a highly competitive… https://t.co/lFEHPJ6J4H
xLSTM is out -- putting LSTM networks on steroids to become a more than serious LLM competitor. How? Via exponential gating and enhanced (cell state) memory capacities. Does it work? Oh, yeah 🚀🚀 https://t.co/77RkLHAWVU