> Use QwQ to generate completions
> Use GPT-4o mini to format the outputs
> Remove samples that get the incorrect answer
> Standard SFT on 17k samples
> 19 hours on 8xH100 ($450)

Big reason why OpenAI refuses to release the o1 chain of thought

https://t.co/4XgGChqdPf https://t.co/CibqrDiTOx
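
The "remove samples that get the incorrect answer" step is a rejection filter, and it could be sketched roughly as below. This is only an illustration, not the actual pipeline: the `Sample` fields, `normalize`, `keep_correct`, and `gpu_hour_cost` are hypothetical names, and exact string matching after normalization is an assumption, since the post does not say how correctness was checked.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # All field names here are hypothetical, chosen for illustration.
    prompt: str
    completion: str    # formatted reasoning trace plus final answer
    final_answer: str  # answer parsed out of the completion
    reference: str     # ground-truth answer for the prompt


def normalize(answer: str) -> str:
    # Assumption: answers are compared as case-insensitive,
    # whitespace-collapsed strings.
    return " ".join(answer.strip().lower().split())


def keep_correct(samples: list[Sample]) -> list[Sample]:
    # Keep only samples whose final answer matches the reference.
    return [s for s in samples if normalize(s.final_answer) == normalize(s.reference)]


def gpu_hour_cost(total_usd: float, hours: float, gpus: int) -> float:
    # Dollars per GPU-hour for a run of `hours` on `gpus` devices.
    return total_usd / (hours * gpus)


samples = [
    Sample("2+2?", "... so the answer is 4", " 4 ", "4"),
    Sample("3*3?", "... so the answer is 6", "6", "9"),
]
kept = keep_correct(samples)  # only the first sample survives
```

On the quoted numbers, 19 hours on 8 GPUs is 152 GPU-hours, so $450 works out to roughly $2.96 per H100-hour via `gpu_hour_cost(450, 19, 8)`.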
