Microsoft Research has introduced a new approach to training language models called 'Instruction Pre-Training.' The method fine-tunes a language model to generate instruction-response pairs from raw text and mixes these pairs into the pretraining corpus. It has been applied both to general models and to domain-specific models for medicine and finance. Notably, an 8B model continually pretrained this way performs comparably to, and in some cases outperforms, the 70B Llama 3 model. Microsoft has released model checkpoints at 500M, 1.5B, and 8B parameters. The pretraining data combines refined web corpora with the synthesized instruction-response pairs, and the instruction synthesizer itself is a fine-tuned Mistral 7B model. The models, datasets, and demos are openly available. The research was conducted by D Cheng, Y Gu, S Huang, J Bi, M Huang, and F Wei of Microsoft Research.
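To make the pipeline concrete, here is a minimal sketch of the synthesis step: a causal LM is prompted with a raw document and asked to continue with instruction-response pairs. The model ID, prompt format, and function name below are illustrative assumptions, not the paper's exact interface.

```python
# Minimal sketch of the instruction-synthesis step, assuming a causal-LM
# synthesizer is available on the Hugging Face Hub. The model ID and the
# bare-continuation prompt format are assumptions for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

SYNTHESIZER_ID = "instruction-pretrain/instruction-synthesizer"  # assumed ID

tokenizer = AutoTokenizer.from_pretrained(SYNTHESIZER_ID)
model = AutoModelForCausalLM.from_pretrained(SYNTHESIZER_ID)

def synthesize_pairs(raw_text: str, max_new_tokens: int = 256) -> str:
    """Prompt the synthesizer to emit instruction-response pairs for raw_text."""
    prompt = f"{raw_text}\n\n"  # hypothetical: model continues with Q/A pairs
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the newly generated tokens (the synthesized pairs).
    new_tokens = output[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

raw_doc = "Photosynthesis converts light energy into chemical energy stored in glucose."
print(synthesize_pairs(raw_doc))
```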
[CL] Instruction Pre-Training: Language Models are Supervised Multitask Learners D Cheng, Y Gu, S Huang, J Bi, M Huang, F Wei [Microsoft Research] (2024) https://t.co/eQWRVlHD5w - The paper proposes Instruction Pre-Training, which augments raw corpora with instruction-response… https://t.co/0EexedclML
Yo! Microsoft Research dropped InstructLM ⚡ > 500M & 1.5B pre-trained model checkpoints > Domain-specific 8B checkpoints > 8B checkpoints beat Llama 3 70B > Smol LLMs based on Mistral architecture > Pretraining data -> Randomly sample refined web and create instruction tuned… https://t.co/91cx8AHt3R
Microsoft releases an interesting new way of training language models called "Instruction Pre-Training." > Fine-tuned Mistral 7B as an "instruction synthesizer" to create a bunch of synthetic instruction-response pairs and mix them in with regular corpora. > They tried it out on general… https://t.co/dSQEnzYu7e
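Before it can be used, the synthesizer has to be fine-tuned to map raw text to pairs. A rough sketch of how such a fine-tuning example might be laid out follows; the prompt/completion field names and the <instruction>/<response> tag format are assumptions for illustration (the paper constructs its examples from existing instruction-tuning datasets).

```python
# Sketch of one fine-tuning example for the instruction synthesizer:
# raw text as the prompt, instruction-response pairs as the completion.
# Field names and the tag format are illustrative assumptions.
def make_synthesizer_example(raw_text: str, pairs: list[tuple[str, str]]) -> dict:
    completion = "\n".join(
        f"<instruction> {q} <response> {a}" for q, a in pairs  # assumed tags
    )
    return {"prompt": raw_text, "completion": completion}

example = make_synthesizer_example(
    "The Treaty of Westphalia (1648) ended the Thirty Years' War.",
    [("When did the Thirty Years' War end?", "It ended in 1648.")],
)
print(example)
```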
Microsoft just silently (again!) dropped Instruction Pre-Training 👀 Augment pretraining datasets by generating instructions 🦙 A Llama 3 8B with performance comparable to the 70B! 🔥 General + domain models (medicine/finance). Models, dataset, and demo are open https://t.co/dq3LYWiL4s
Instruction Pre-Training: Language Models are Supervised Multitask Learners abs: https://t.co/zWWXhfy2yP models: https://t.co/aWSp9gYB0G Finetune an LM to generate instruction-response pairs from raw text, apply to the pretraining dataset, pretrain on the synthetic… https://t.co/0N6HDfLOqR
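The final step the tweets describe, mixing synthesized pairs back into the pretraining corpus, could look roughly like the sketch below. The augment_fraction knob and the simple concatenation of each raw document with its pairs are assumptions for illustration, not values from the paper; the synthesize argument could be the synthesize_pairs function sketched earlier.

```python
# Sketch of corpus augmentation: some documents are extended with their
# synthesized instruction-response pairs, the rest are kept as plain text.
import random

def build_pretraining_corpus(raw_docs, synthesize, augment_fraction=0.5, seed=0):
    """Mix instruction-augmented documents with untouched raw text.

    augment_fraction is an illustrative mixing knob, not a paper value.
    """
    rng = random.Random(seed)
    corpus = []
    for doc in raw_docs:
        if rng.random() < augment_fraction:
            # Concatenate raw text with its synthesized pairs so the model
            # is pretrained on both together, as the method describes.
            corpus.append(doc + "\n\n" + synthesize(doc))
        else:
            corpus.append(doc)
    rng.shuffle(corpus)
    return corpus
```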