Recent work shows that Large Language Models (LLMs) can store roughly 2 bits of knowledge per parameter. The Physics of Language Models series investigates these knowledge capacity scaling laws, examining how model size, training time, architecture, and data quality affect how much knowledge an LLM can store.
If you are curious how an LLM can be represented in 1 bit (or 1.58 bits in this case), we dove deep into the math and the code during arXiv dives with @oxen_ai. It won’t be long until someone takes this work and optimizes the nn.Linear layer to speed it up further 🔥 https://t.co/gFuh1GF8dA
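For intuition on where 1.58 comes from: each weight takes one of three values {-1, 0, +1}, and log2(3) ≈ 1.58 bits. Below is a minimal PyTorch sketch of the absmean ternary quantization described in the BitNet b1.58 paper; the function name and demo values are our own, not a released implementation.

```python
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a weight matrix to {-1, 0, +1} using the absmean rule
    from BitNet b1.58: scale by the mean absolute weight, then round
    and clip to the nearest ternary value."""
    gamma = w.abs().mean()                        # per-tensor scale
    w_q = (w / (gamma + eps)).round().clamp_(-1, 1)
    return w_q, gamma                             # dequantize as w ≈ gamma * w_q

# Tiny demo on a random nn.Linear-sized weight
w = torch.randn(4, 8)
w_q, gamma = absmean_ternary_quantize(w)
print(w_q)                                        # entries are only -1.0, 0.0, or 1.0
print((gamma * w_q - w).abs().mean())             # mean quantization error
```

Because the quantized weights are only -1, 0, or +1, the matmul inside nn.Linear reduces to additions and subtractions, which is exactly the kernel-level speedup opportunity the tweet alludes to.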
New study finds Large Language Models store 2 bits of knowledge per parameter, showing how size, training, architecture & data quality affect their capacity: https://t.co/uv1OrquLEk https://t.co/ncblHYO89C
The Physics of Language Models investigates knowledge capacity scaling laws: unlike prior studies that evaluate a model’s capability via loss or benchmarks, it estimates the number of knowledge bits a model stores. Quote from the paper: "Language models can and only can store 2 bits of knowledge… https://t.co/koFMZJPq4t
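The counting itself is straightforward information theory: knowledge is represented as (name, attribute, value) tuples, and each field contributes log2(#possible values) bits. A toy sketch of that accounting follows; the pool sizes are our own illustrative choices, not the paper's exact synthetic-biography configuration.

```python
import math

# Toy bit-accounting in the spirit of the paper: knowledge is a set of
# (name, attribute, value) tuples; each field carries log2(#choices) bits.
# All pool sizes below are illustrative assumptions, not the paper's setup.
n_people = 100_000
pools = {
    "name":      400 * 400 * 1000,  # first x middle x last name choices
    "birthdate": 12 * 28 * 200,     # month x day x year
    "city":      200,
    "employer":  300,
}

bits_per_person = sum(math.log2(size) for size in pools.values())
total_bits = n_people * bits_per_person
print(f"~{bits_per_person:.1f} bits/person, ~{total_bits/1e6:.1f}M bits total")

# At the paper's 2 bits/param ceiling, storing this dataset needs at least:
print(f">= {total_bits / 2 / 1e6:.2f}M parameters")
```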
[CL] Language Model Evolution: An Iterated Learning Perspective Y Ren, S Guo, L Qiu, B Wang, D J. Sutherland [University of British Columbia & University of Edinburgh & MIT] (2024) https://t.co/YEyjmQrhUf - Iterated learning (IL) framework can explain behaviors of LLMs engaged… https://t.co/0N1J4uyy8I
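Iterated learning comes from language-evolution research: each generation of agents learns from data generated by the previous generation, so any inductive bias in the learner is amplified across generations. A toy Bayesian sketch of that loop (our own construction to illustrate the framework, not the paper's code):

```python
import random

def learn(data, prior_heads=2.0, prior_tails=1.0):
    """MAP estimate of a coin's bias under a Beta prior that slightly
    favors heads -- the learner's inductive bias."""
    heads = sum(data)
    return (heads + prior_heads - 1) / (len(data) + prior_heads + prior_tails - 2)

def transmit(p, n=20):
    """The agent generates the next generation's training data."""
    return [1 if random.random() < p else 0 for _ in range(n)]

p = 0.5                    # the "true" distribution the first generation sees
data = transmit(p)
for gen in range(30):
    p = learn(data)        # imitation phase: fit the previous generation's output
    data = transmit(p)     # transmission phase: produce data for the next learner
    print(f"gen {gen:2d}: p = {p:.3f}")  # p drifts toward the prior's mode over generations
```

With a prior that slightly favors heads, p drifts from 0.5 toward 1 across generations; this is the kind of bias amplification the IL framework predicts for LLMs trained on model-generated data.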
[CL] Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws Z Allen-Zhu, Y Li [Meta & Mohamed bin Zayed University of AI] (2024) https://t.co/JOVi40OZ4P - Language models can store 2 bits of knowledge per parameter when trained with around 1000 exposures to each… https://t.co/v88rbdFTTp
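A quick back-of-envelope on what that capacity number means, assuming the ~1000-exposure training condition is met (model sizes below are illustrative):

```python
# Back-of-envelope: capacity implied by 2 bits of knowledge per parameter.
for params in (1e9, 7e9, 70e9):
    bits = 2 * params
    print(f"{params / 1e9:>4.0f}B params -> {bits / 1e9:.0f}B bits "
          f"(~{bits / 8 / 1e9:.2f} GB of knowledge)")
```

By the paper's estimate, the 14B bits a 7B model can hold surpasses the knowledge in English Wikipedia and textbooks combined.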