GPT-4 models have demonstrated performance at or above human levels on certain Theory of Mind (ToM) tasks, such as identifying indirect requests, false beliefs, and misdirection, but have struggled with detecting faux pas. Notably, GPT-4 surpassed adult human performance on 6th-order ToM inferences, suggesting that increased model size, instruction fine-tuning, multimodal capabilities, and word comprehension contribute to its ability to model mental states. Despite these results, benchmarking LLMs' intelligence remains inconsistent: some tests show a significant gap between GPT-4 and human performance, potentially due to differences in testing structure or prompting methods.
🧠 "Thinking at a Distance" in the Age of AI: LLMs, with their vast corpora and speed, redefine the essence of cognition. The extraordinary rise of large language models (LLMs) has exposed a curious split between human and artificial intelligence when it comes to processing… https://t.co/vT1AkHkfyf
Using #ChatGPT in the Development of Clinical Reasoning Cases: A Qualitative Study https://t.co/PrUhrGZJky
LLMs' "intelligence" is hard to benchmark, as we don't have good benchmarks for human performance at complex tasks. Take theory of mind: several tests found GPT-4 beats humans, but another finds a huge gap. Is it the testing structure? Prompting? Which is right? Hard to know. https://t.co/z9L3stRCDP
"GPT-4 exhibits higher-order ToM at the level of adult humans. The best-performing LLMs have a capacity for ToM. Given the role that ToM plays in a wide range of behaviours, significant implications" Winnie Street & Google team https://t.co/w2MbVJCZp8 #AIIntelligence
⁉️ Let's check how GPT-4o, Gemini, Llama3, Mixtral, and Claude perform on theory of mind, shall we? 🌟 We report new results on the FANToM benchmark 👻 - GPT-4o tops the chart by finally achieving a score of 2.0/100 (vs. 87.5 for humans) - Huge boost for Gemini-1.5-flash compared to… https://t.co/rtuNsfEyIF https://t.co/GZWidSdIYs
GPT-4 just surpassed adult human performance at THEORY OF MIND tasks (basically, mind reading) And this is just GPT-4! Imagine GPT-5... and, if we're still here to see it, GPT-6. Soon, AIs will think we're as slow as plants. PLANTS. And… um… most people don't know this, but… https://t.co/zE0GOrCtsD https://t.co/QDkQ3ETBGt
GPT-4 demonstrated superior performance over humans in 6th Order ToM inferences, suggesting that increased model size, instruction fine-tuning, multimodal capabilities, and word comprehension - or interplay among all of these - contribute to its ability to model mental states. https://t.co/k1B0veyiog https://t.co/SR8JOzFBDo
Testing theory of mind in large language models and humans GPT-4 models performed at, or even sometimes above, human levels at identifying indirect requests, false beliefs and misdirection, but struggled with detecting faux pas. https://t.co/djwvSQ2XoX
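To make concrete what these studies are measuring, here is a minimal sketch of a Sally-Anne-style false-belief item, one of the task types the feed above mentions. All names (`FALSE_BELIEF_ITEM`, `score_answer`, `evaluate`, `no_tom_model`) and the keyword-matching scorer are hypothetical illustrations, not the actual protocol of any cited benchmark; real benchmarks such as FANToM use far more elaborate items and scoring.

```python
# Hypothetical false-belief test item in the Sally-Anne style.
# A model that tracks Sally's (false) belief should answer "basket";
# a model that only tracks the world state will answer "box".
FALSE_BELIEF_ITEM = {
    "story": (
        "Sally puts her marble in the basket and leaves the room. "
        "While she is away, Anne moves the marble to the box. "
        "Sally comes back."
    ),
    "question": "Where will Sally look for her marble first?",
    "correct": "basket",
}


def score_answer(answer: str, correct: str) -> int:
    """Score 1 if the target location appears in the normalised answer."""
    return int(correct in answer.strip().lower())


def evaluate(model_answer, items) -> float:
    """Fraction of items answered correctly by a model callable."""
    return sum(score_answer(model_answer(i), i["correct"]) for i in items) / len(items)


# Stand-in "model" with no theory of mind: it reports the marble's
# true location, ignoring Sally's outdated belief.
no_tom_model = lambda item: "She will look in the box."

print(evaluate(no_tom_model, [FALSE_BELIEF_ITEM]))  # → 0.0
```

Swapping the stub for a real LLM call (and many varied items) is what turns this into a benchmark; as the tweets above note, seemingly small changes to item structure and prompting can swing the resulting scores dramatically.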