Researchers have recently introduced several advances in the field of Large Language Models (LLMs). Fudan University researchers have developed SpeechGPT-Gen, an 8 billion-parameter Speech Large Language Model (SLLM) that specializes in semantic and perceptual information modeling. H2O-Danube-1.8B, a 1.8 billion-parameter language model, was trained on 1 trillion tokens following core principles from Llama 2 and Mistral, and shows competitive metrics in the sub-2B model space. A new paper revisits the problem of extreme LLM compression, targeting very low bit counts per parameter and proposing an algorithm based on additive quantization. Microsoft's 'SliceGPT' proposes compressing large language models by deleting rows and columns of weight matrices, removing up to 25% of model parameters while maintaining 99%, 99%, and 90% of performance for LLAMA2-70B, OPT 66B, and Phi-2, respectively.
Great overview of compression algorithms for LLMs, covering pruning, quantization, knowledge distillation, low-rank approximation, parameter sharing, and efficient architecture design. This space is moving so fast. This is just a nice overview… https://t.co/CQxMgw0Wih
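To make one of those families concrete, here is a minimal sketch of magnitude pruning: weights whose absolute value falls below a percentile threshold are zeroed out. The `magnitude_prune` helper and the random weight matrix are illustrative stand-ins, not code from the linked overview.

```python
# Minimal sketch of magnitude pruning, one of the compression families
# the overview covers. Weights below a percentile threshold are zeroed.
# Plain NumPy; the array is random stand-in data, not a real model.
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity` fraction is zero."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

w = np.random.randn(512, 512)
w_pruned = magnitude_prune(w, sparsity=0.5)
print(f"sparsity achieved: {np.mean(w_pruned == 0.0):.2%}")
```

In practice pruning is usually followed by a short fine-tuning pass to recover accuracy, and structured variants remove whole rows, columns, or heads rather than individual weights.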
📌 This paper revisits the problem of "extreme" LLM compression, defined as targeting extremely low bit counts, such as 2 to 3 bits per parameter. 🔥 "Extreme Compression of Large Language Models via Additive Quantization" 📌 The resulting algorithm advances the… https://t.co/6IeXaYT0Dq
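As a rough intuition for the additive-quantization idea: each small group of weights is stored as a sum of codewords drawn from a few small codebooks, so the storage cost per weight drops to a handful of code-index bits. The toy below uses random codebooks and a greedy residual-style encoder, whereas the paper learns the codebooks and uses a more sophisticated search; it is a sketch of the concept, not the paper's algorithm.

```python
# Toy sketch of additive quantization: approximate each weight group as a
# SUM of codewords, one from each of M codebooks. Codebooks here are random
# stand-ins; encoding is greedy over the residual.
import numpy as np

rng = np.random.default_rng(0)
d, M, K = 8, 2, 256                          # group size, codebooks, codewords each
codebooks = rng.standard_normal((M, K, d))   # stand-in "learned" codebooks

def encode(group: np.ndarray) -> list[int]:
    """Greedily pick one codeword per codebook to approximate `group`."""
    residual, codes = group.copy(), []
    for m in range(M):
        idx = int(np.argmin(np.linalg.norm(codebooks[m] - residual, axis=1)))
        codes.append(idx)
        residual -= codebooks[m][idx]
    return codes

def decode(codes: list[int]) -> np.ndarray:
    """Reconstruct the group as the sum of the selected codewords."""
    return sum(codebooks[m][c] for m, c in enumerate(codes))

group = rng.standard_normal(d)
codes = encode(group)                        # two 8-bit indices cover 8 weights
print("bits per weight:", M * np.log2(K) / d)  # 2 * 8 bits / 8 weights = 2.0
print("reconstruction error:", np.linalg.norm(group - decode(codes)))
```

With 2 codebooks of 256 codewords over groups of 8 weights, the index cost works out to exactly 2 bits per parameter, which is the regime the paper calls "extreme".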
Very nice proposal in this paper from Microsoft - "SliceGPT: Compress Large Language Models by Deleting Rows and Columns" 🔥 SliceGPT can remove up to 25% of the model parameters (including embeddings) for LLAMA2-70B, OPT 66B and Phi-2 models while maintaining 99%, 99% and 90%… https://t.co/6D10HxnNpw
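A hedged sketch of the SliceGPT intuition: rotate a layer's weights into a PCA basis computed from calibration activations, then delete the rows and columns corresponding to the least-important directions. Everything below is random stand-in data, and the paper's actual method additionally rewires adjacent layers so the rotations cancel out (its "computational invariance"); this is only the core slicing step.

```python
# Sketch of row/column deletion after a PCA rotation, in the spirit of
# SliceGPT. Random matrices stand in for calibration activations and a
# weight matrix; real usage operates on transformer layers.
import numpy as np

rng = np.random.default_rng(0)
d, d_small = 64, 48                  # hidden size; sliced size (25% removed)
X = rng.standard_normal((1000, d))   # stand-in calibration activations
W = rng.standard_normal((d, d))      # stand-in square weight matrix

# PCA basis of the activations: directions ordered by explained variance.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
Q = Vt.T                             # orthogonal rotation, d x d

# Rotate the weights into that basis, then keep only the top directions.
W_sliced = (Q.T @ W @ Q)[:d_small, :d_small]
print(W.shape, "->", W_sliced.shape)  # (64, 64) -> (48, 48)
```

The appeal is that the sliced matrices stay dense, so the smaller model runs on standard hardware with no special sparse kernels.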
Happy to share our first efforts in foundation modeling: H2O-Danube-1.8B, a small 1.8B model based on the Llama/Mistral architecture, trained on only 1T natural-language tokens and showing competitive metrics across benchmarks in the <2B model space. We particularly hope for the model to…
H2O-Danube-1.8B paper page: https://t.co/pl8Zg3VfmE We present H2O-Danube-1.8B, a 1.8B language model trained on 1T tokens following the core principles of LLama 2 and Mistral. We leverage and refine various techniques for pre-training large language models. Although our model is… https://t.co/M25nLQ4Iqo
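For readers who want to try a sub-2B model like this one, a minimal usage sketch with the Hugging Face transformers API follows. The repo id below is an assumption on my part; check the paper page or H2O.ai's Hugging Face organization for the actual checkpoint name.

```python
# Hedged usage sketch: loading a small causal LM with transformers.
# The model id is an ASSUMED placeholder, not confirmed by the tweet.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "h2oai/h2o-danube-1.8b-base"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The Danube is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```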
Fudan University Researchers Introduce SpeechGPT-Gen: An 8B-Parameter Speech Large Language Model (SLLM) Efficient in Semantic and Perceptual Information Modeling Quick read: https://t.co/XAjkEKiUfE Paper: https://t.co/EmKuzqqz3h Github: https://t.co/RgIAseerS1 #artificial… https://t.co/gsxI0h5S2Z