4 posts • ChatGPT (GPT-4o mini)
Recent evaluations of AI models have highlighted Gemini 1.5 Pro 0827, which scored 67% on Aider's code editing benchmark. That puts it just above Llama 405b, which scored 66%. The Sonnet model led with 77%, while GPT 3.5 Turbo 0301 and Gemini 1.5 Flash 0827 trailed at 58% and 53%, respectively. Another fine-tuned version of Gemini has also been released, and it reportedly brings only minor improvements. In a separate assessment of structured output capabilities, Gemini 1.5 was rated 'OK', while OpenAI's GPT-4o was rated the best model because of its direct Pydantic integration. Claude 3.5 was rated second, since it needs a 'tool call' trick to perform at its best. Gemini 1.5 Flash was noted for outperforming GPT-4o-mini in most categories, with coding tasks the exception.
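For readers unfamiliar with what "structured output" means in that assessment, here is a minimal sketch of the idea: the model's reply is expected to parse into a fixed schema, and anything that does not fit is rejected. This is not the method used in the assessment and it does not call any vendor API. It uses only the Python standard library as a rough stand-in for the kind of validation Pydantic performs. The `BenchmarkResult` schema and the `parse_structured_reply` helper are hypothetical names invented for illustration, and the reply string is hand-written rather than real model output.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class BenchmarkResult:
    # Hypothetical schema, invented for this illustration.
    model: str
    score_percent: int


def parse_structured_reply(raw: str) -> BenchmarkResult:
    """Parse a JSON reply and check it against the BenchmarkResult schema."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    # Map each declared field name to its declared type.
    expected = {f.name: f.type for f in fields(BenchmarkResult)}

    # Reject replies with missing or extra fields.
    if set(data) != set(expected):
        mismatch = sorted(set(data) ^ set(expected))
        raise ValueError(f"unexpected or missing fields: {mismatch}")

    # Reject replies whose values have the wrong type.
    for name, typ in expected.items():
        if not isinstance(data[name], typ):
            raise ValueError(f"field {name!r} should be {typ.__name__}")

    return BenchmarkResult(**data)


# Hand-written stand-in for a model reply; the values come from the summary.
reply = '{"model": "Gemini 1.5 Pro 0827", "score_percent": 67}'
result = parse_structured_reply(reply)
print(result)
```

A library such as Pydantic covers the same ground with far less code and richer error reporting, which is presumably why direct integration with it was counted as an advantage.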