Recent work shows that Large Language Models (LLMs) can store roughly 2 bits of knowledge per parameter. The Physics of Language Models series investigates these knowledge capacity scaling laws, examining how model size, training time, architecture, and data quality affect how much knowledge an LLM can store.
If you are curious how an LLM can be represented in 1 bit (or 1.58 bits in this case), we dove deep into the math and the code during arXiv dives with @oxen_ai. It won’t be long until someone takes this work and optimizes the nn.Linear layer to speed it up further 🔥 https://t.co/gFuh1GF8dA
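For intuition on where 1.58 comes from: each weight takes one of three values {-1, 0, +1}, and log2(3) ≈ 1.58 bits. Below is a minimal PyTorch sketch of the absmean ternary quantization described in the BitNet b1.58 paper; the function name and demo values are our own, not a released implementation.

```python
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a weight matrix to {-1, 0, +1} using the absmean rule
    from BitNet b1.58: scale by the mean absolute weight, then round
    and clip to the nearest ternary value."""
    gamma = w.abs().mean()                        # per-tensor scale
    w_q = (w / (gamma + eps)).round().clamp_(-1, 1)
    return w_q, gamma                             # dequantize as w ≈ gamma * w_q

# Tiny demo on a random nn.Linear-sized weight
w = torch.randn(4, 8)
w_q, gamma = absmean_ternary_quantize(w)
print(w_q)                                        # entries are only -1.0, 0.0, or 1.0
print((gamma * w_q - w).abs().mean())             # mean quantization error
```

Because the quantized weights are only -1, 0, or +1, the matmul inside nn.Linear reduces to additions and subtractions, which is exactly the kernel-level speedup opportunity the tweet alludes to.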
New study finds Large Language Models store 2 bits of knowledge per parameter, showing how size, training, architecture & data quality affect their capacity: https://t.co/uv1OrquLEk https://t.co/ncblHYO89C
The Physics of Language Models investigates knowledge capacity scaling laws: unlike prior studies that evaluate a model’s capability via loss or benchmarks, it estimates the number of knowledge bits a model stores. Quote from the paper: "Language models can and only can store 2 bits of knowledge… https://t.co/koFMZJPq4t
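The counting itself is straightforward information theory: knowledge is represented as (name, attribute, value) tuples, and each field contributes log2(#possible values) bits. A toy sketch of that accounting follows; the pool sizes are our own illustrative choices, not the paper's exact synthetic-biography configuration.

```python
import math

# Toy bit-accounting in the spirit of the paper: knowledge is a set of
# (name, attribute, value) tuples; each field carries log2(#choices) bits.
# All pool sizes below are illustrative assumptions, not the paper's setup.
n_people = 100_000
pools = {
    "name":      400 * 400 * 1000,  # first x middle x last name choices
    "birthdate": 12 * 28 * 200,     # month x day x year
    "city":      200,
    "employer":  300,
}

bits_per_person = sum(math.log2(size) for size in pools.values())
total_bits = n_people * bits_per_person
print(f"~{bits_per_person:.1f} bits/person, ~{total_bits/1e6:.1f}M bits total")

# At the paper's 2 bits/param ceiling, storing this dataset needs at least:
print(f">= {total_bits / 2 / 1e6:.2f}M parameters")
```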
[CL] Language Model Evolution: An Iterated Learning Perspective Y Ren, S Guo, L Qiu, B Wang, D J. Sutherland [University of British Columbia & University of Edinburgh & MIT] (2024) https://t.co/YEyjmQrhUf - Iterated learning (IL) framework can explain behaviors of LLMs engaged… https://t.co/0N1J4uyy8I
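Iterated learning comes from language-evolution research: each generation of agents learns from data generated by the previous generation, so any inductive bias in the learner is amplified across generations. A toy Bayesian sketch of that loop (our own construction to illustrate the framework, not the paper's code):

```python
import random

def learn(data, prior_heads=2.0, prior_tails=1.0):
    """MAP estimate of a coin's bias under a Beta prior that slightly
    favors heads -- the learner's inductive bias."""
    heads = sum(data)
    return (heads + prior_heads - 1) / (len(data) + prior_heads + prior_tails - 2)

def transmit(p, n=20):
    """The agent generates the next generation's training data."""
    return [1 if random.random() < p else 0 for _ in range(n)]

p = 0.5                    # the "true" distribution the first generation sees
data = transmit(p)
for gen in range(30):
    p = learn(data)        # imitation phase: fit the previous generation's output
    data = transmit(p)     # transmission phase: produce data for the next learner
    print(f"gen {gen:2d}: p = {p:.3f}")  # p drifts toward the prior's mode over generations
```

With a prior that slightly favors heads, p drifts from 0.5 toward 1 across generations; this is the kind of bias amplification the IL framework predicts for LLMs trained on model-generated data.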
[CL] Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws Z Allen-Zhu, Y Li [Meta & Mohamed bin Zayed University of AI] (2024) https://t.co/JOVi40OZ4P - Language models can store 2 bits of knowledge per parameter when trained with around 1000 exposures to each… https://t.co/v88rbdFTTp
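A quick back-of-envelope on what that capacity number means, assuming the ~1000-exposure training condition is met (model sizes below are illustrative):

```python
# Back-of-envelope: capacity implied by 2 bits of knowledge per parameter.
for params in (1e9, 7e9, 70e9):
    bits = 2 * params
    print(f"{params / 1e9:>4.0f}B params -> {bits / 1e9:.0f}B bits "
          f"(~{bits / 8 / 1e9:.2f} GB of knowledge)")
```

By the paper's estimate, the 14B bits a 7B model can hold surpasses the knowledge in English Wikipedia and textbooks combined.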