At the EMNLP 2023 conference, a new transformer variant called Pushdown Transformers was introduced, designed to expand the class of functions Transformers can model and move them up the Chomsky hierarchy, with better generalization, data efficiency, and parses reported in practice. The posts collected below also echo a broader sentiment: the technology of 'transformers' in large language models is intellectually exciting, even at a layman's level of understanding.
I am late to the party, but the technology of "transformers" in large language models is intellectually exciting (not just playing with GPT as a toy, but understanding the basic mechanism underlying it at the level of a layman).
Learn about an advanced version of the Transformer attention mechanism in Vyacheslav Efimov's latest explainer, which focuses on the DeBERTa model. https://t.co/zArNODNn23
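For context on what DeBERTa changes relative to vanilla attention: it scores each query-key pair with three terms over content and relative-position embeddings (content-to-content, content-to-position, position-to-content). Below is a minimal single-head NumPy sketch written from my reading of the DeBERTa paper's equations; the weight names, toy shapes, and absence of masking are my own illustration, not code from the linked explainer.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def disentangled_attention(H, P, Wqc, Wkc, Wv, Wqr, Wkr):
    """Single-head disentangled attention in the spirit of DeBERTa.

    H : (n, d)     content embeddings, one row per token
    P : (2n-1, d)  relative-position embeddings; row (j - i + n - 1)
                   encodes the offset of key j from query i
    """
    n, d = H.shape
    Qc, Kc, V = H @ Wqc, H @ Wkc, H @ Wv   # content projections
    Qr, Kr = P @ Wqr, P @ Wkr              # relative-position projections

    # idx[i, j] = shifted relative distance delta(i, j) = j - i + n - 1
    idx = np.arange(n)[None, :] - np.arange(n)[:, None] + n - 1

    c2c = Qc @ Kc.T                                     # content-to-content
    c2p = np.take_along_axis(Qc @ Kr.T, idx, axis=1)    # content-to-position
    p2c = np.take_along_axis(Kc @ Qr.T, idx, axis=1).T  # position-to-content

    scores = (c2c + c2p + p2c) / np.sqrt(3 * d)  # paper scales by sqrt(3d)
    return softmax(scores) @ V

# toy usage with random weights
rng = np.random.default_rng(0)
n, d = 5, 8
H = rng.normal(size=(n, d))
P = rng.normal(size=(2 * n - 1, d))
Ws = [rng.normal(size=(d, d)) for _ in range(5)]
print(disentangled_attention(H, P, *Ws).shape)  # (5, 8)
```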
Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars. (arXiv:2312.01429v1 [cs.LG]) https://t.co/xYYuihjJ1s
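For readers unfamiliar with the setting of this paper: a bounded Dyck language Dyck-(k, D) is the set of well-nested bracket strings over k bracket types whose nesting depth never exceeds D. As a concrete reference point (my own sketch, not code from the paper), a recognizer fits in a few lines of Python:

```python
def in_bounded_dyck(s, pairs=None, max_depth=3):
    """Recognize Dyck-(k, D): well-nested brackets over k types,
    with nesting depth bounded by max_depth."""
    if pairs is None:
        pairs = {'(': ')', '[': ']'}          # k = 2 bracket types
    closers = set(pairs.values())
    stack = []
    for ch in s:
        if ch in pairs:                       # opener: push expected closer
            stack.append(pairs[ch])
            if len(stack) > max_depth:        # depth bound violated
                return False
        elif ch in closers:                   # closer: must match stack top
            if not stack or stack.pop() != ch:
                return False
        else:
            return False                      # not a bracket symbol
    return not stack                          # accept only if fully closed

print(in_bounded_dyck("([()])"))               # True  (depth 3 <= 3)
print(in_bounded_dyck("([()])", max_depth=2))  # False (depth 3 > 2)
print(in_bounded_dyck("([)]"))                 # False (crossing brackets)
```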
Interesting. Trying to get a layman's understanding of transformers as used in language models. https://t.co/eFROepRU4F
At #EMNLP2023: Pushdown Transformers, a new transformer variant with stack memory! Designed to expand the kinds of functions Transformers can model and move them up the Chomsky hierarchy. In practice: better generalization & data efficiency, and great parses. 🧵 https://t.co/3HN5QGW1kf
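The tweet does not spell out the mechanism, so purely as intuition for what "stack memory" attached to a sequence model can look like, here is a toy continuous stack in the style of Joulin & Mikolov (2015). Every name and shape below is my own illustration; the paper's actual design may differ, so see the linked thread for details.

```python
import numpy as np

class SoftStack:
    """Toy differentiable stack: each step blends push / pop / no-op,
    so gradients can flow through the stack operations."""
    def __init__(self, depth, dim):
        self.mem = np.zeros((depth, dim))      # mem[0] is the top slot

    def step(self, actions, value):
        push, pop, noop = actions              # nonnegative, sum to 1
        pushed = np.roll(self.mem, 1, axis=0)  # shift everything down...
        pushed[0] = value                      # ...and write the new top
        popped = np.roll(self.mem, -1, axis=0) # shift everything up...
        popped[-1] = 0.0                       # ...bottom slot empties
        self.mem = push * pushed + pop * popped + noop * self.mem
        return self.mem[0]                     # soft read of the stack top

# toy usage: in a model, a per-token head would emit the action
# weights and the pushed value, and the read top would feed back in
stack = SoftStack(depth=4, dim=3)
print(stack.step((1.0, 0.0, 0.0), np.array([1.0, 2.0, 3.0])))  # hard push -> [1. 2. 3.]
print(stack.step((0.0, 1.0, 0.0), np.zeros(3)))                # hard pop  -> [0. 0. 0.]
```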