Microsoft Research has introduced a new approach to training language models called 'Instruction Pre-Training.' The method fine-tunes a language model to generate instruction-response pairs from raw text and mixes these pairs into the pretraining corpus. It has been applied both to general models and to domain-specific models for medicine and finance. Notably, an 8B model continually pretrained this way performs comparably to, and in some cases outperforms, the 70B Llama 3 model. Microsoft has released model checkpoints at 500M, 1.5B, and 8B parameters. The pretraining data combines refined web corpora with the synthesized instruction-response pairs, and the instruction synthesizer itself is a fine-tuned Mistral 7B model. The models, datasets, and demos are openly available. The research was conducted by D Cheng, Y Gu, S Huang, J Bi, M Huang, and F Wei of Microsoft Research.
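To make the pipeline concrete, here is a minimal sketch of the synthesis step: a causal LM is prompted with a raw document and asked to continue with instruction-response pairs. The model ID, prompt format, and function name below are illustrative assumptions, not the paper's exact interface.

```python
# Minimal sketch of the instruction-synthesis step, assuming a causal-LM
# synthesizer is available on the Hugging Face Hub. The model ID and the
# bare-continuation prompt format are assumptions for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

SYNTHESIZER_ID = "instruction-pretrain/instruction-synthesizer"  # assumed ID

tokenizer = AutoTokenizer.from_pretrained(SYNTHESIZER_ID)
model = AutoModelForCausalLM.from_pretrained(SYNTHESIZER_ID)

def synthesize_pairs(raw_text: str, max_new_tokens: int = 256) -> str:
    """Prompt the synthesizer to emit instruction-response pairs for raw_text."""
    prompt = f"{raw_text}\n\n"  # hypothetical: model continues with Q/A pairs
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the newly generated tokens (the synthesized pairs).
    new_tokens = output[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

raw_doc = "Photosynthesis converts light energy into chemical energy stored in glucose."
print(synthesize_pairs(raw_doc))
```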
[CL] Instruction Pre-Training: Language Models are Supervised Multitask Learners D Cheng, Y Gu, S Huang, J Bi, M Huang, F Wei [Microsoft Research] (2024) https://t.co/eQWRVlHD5w - The paper proposes Instruction Pre-Training, which augments raw corpora with instruction-response… https://t.co/0EexedclML
Yo! Microsoft Research dropped InstructLM ⚡ > 500M & 1.5B pre-trained model checkpoints > Domain-specific 8B checkpoints > 8B checkpoints beat Llama 3 70B > Smol LLMs based on Mistral architecture > Pretraining data -> Randomly sample refined web and create instruction tuned… https://t.co/91cx8AHt3R
Microsoft releases an interesting new way of training language models called "Instruction Pre-Training." > Fine-tuned Mistral 7B as an "instruction synthesizer" to create a bunch of synthetic instruction-response pairs and mix them in with regular corpora. > They tried it out on general… https://t.co/dSQEnzYu7e
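Before it can be used, the synthesizer has to be fine-tuned to map raw text to pairs. A rough sketch of how such a fine-tuning example might be laid out follows; the prompt/completion field names and the <instruction>/<response> tag format are assumptions for illustration (the paper constructs its examples from existing instruction-tuning datasets).

```python
# Sketch of one fine-tuning example for the instruction synthesizer:
# raw text as the prompt, instruction-response pairs as the completion.
# Field names and the tag format are illustrative assumptions.
def make_synthesizer_example(raw_text: str, pairs: list[tuple[str, str]]) -> dict:
    completion = "\n".join(
        f"<instruction> {q} <response> {a}" for q, a in pairs  # assumed tags
    )
    return {"prompt": raw_text, "completion": completion}

example = make_synthesizer_example(
    "The Treaty of Westphalia (1648) ended the Thirty Years' War.",
    [("When did the Thirty Years' War end?", "It ended in 1648.")],
)
print(example)
```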
Microsoft just silently (again!) dropped Instruction Pre-Training 👀 Augment pretraining datasets by generating instructions 🦙 A Llama 3 8B with performance comparable to the 70B! 🔥 General + domain models (medicine/finance). Models, dataset, and demo are open https://t.co/dq3LYWiL4s
Instruction Pre-Training: Language Models are Supervised Multitask Learners abs: https://t.co/zWWXhfy2yP models: https://t.co/aWSp9gYB0G Finetune an LM to generate instruction-response pairs from raw text, apply to the pretraining dataset, pretrain on the synthetic… https://t.co/0N6HDfLOqR
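The final step the tweets describe, mixing synthesized pairs back into the pretraining corpus, could look roughly like the sketch below. The augment_fraction knob and the simple concatenation of each raw document with its pairs are assumptions for illustration, not values from the paper; the synthesize argument could be the synthesize_pairs function sketched earlier.

```python
# Sketch of corpus augmentation: some documents are extended with their
# synthesized instruction-response pairs, the rest are kept as plain text.
import random

def build_pretraining_corpus(raw_docs, synthesize, augment_fraction=0.5, seed=0):
    """Mix instruction-augmented documents with untouched raw text.

    augment_fraction is an illustrative mixing knob, not a paper value.
    """
    rng = random.Random(seed)
    corpus = []
    for doc in raw_docs:
        if rng.random() < augment_fraction:
            # Concatenate raw text with its synthesized pairs so the model
            # is pretrained on both together, as the method describes.
            corpus.append(doc + "\n\n" + synthesize(doc))
        else:
            corpus.append(doc)
    rng.shuffle(corpus)
    return corpus
```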