Recent research indicates that the large language models (LLMs) GPT-4 and Flan-PaLM have achieved adult-level and near-adult-level performance on higher-order Theory of Mind (ToM) tasks; notably, GPT-4 exceeds adult human performance on 6th-order inferences. The study, posted to arXiv on May 29th, involved 1,440 data points, though some results appeared noisy, possibly due to the small number of questions per condition. The findings highlight the potential of LLMs at complex cognitive tasks, although the reliability of benchmarks for such tasks remains a topic of debate.
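For readers unfamiliar with the "order" terminology: an nth-order ToM statement nests n mental-state attributions inside one another. A minimal sketch of that recursive structure, with hypothetical names and a hypothetical helper (not items from the paper's actual test set):

```python
def nth_order_statement(agents, fact):
    """Build an nth-order ToM statement by wrapping the fact in one
    'X believes that ...' clause per agent, innermost agent last."""
    statement = fact
    for agent in reversed(agents):
        statement = f"{agent} believes that {statement}"
    return statement

# A 2nd-order example: two nested mental states.
print(nth_order_statement(["Anna", "Ben"], "the meeting moved to Friday"))
# -> Anna believes that Ben believes that the meeting moved to Friday

# A 6th-order example, the depth at which GPT-4 reportedly exceeded adults.
print(nth_order_statement(["A", "B", "C", "D", "E", "F"],
                          "the key is under the mat"))
```

Answering a 6th-order question correctly requires tracking all six nested belief states at once, which is why depth is used as a difficulty axis in these benchmarks.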
LLMs’ “intelligence” is hard to benchmark, as we don’t have good benchmarks for human performance at complex tasks. Take theory of mind: several tests find GPT-4 beats humans, but another finds a huge gap. Is it the testing structure? The prompting? Which is right? Hard to know. https://t.co/z9L3stRCDP
⁉️ Let's check how GPT-4o, Gemini, Llama3, Mixtral, and Claude perform on theory of mind, shall we? 🌟 We report new results on the FANToM benchmark 👻 - GPT-4o tops the chart by finally achieving a score of 2.0/100 (vs. human 87.5) - Huge boost for Gemini-1.5-flash compared to… https://t.co/rtuNsfEyIF https://t.co/GZWidSdIYs
LLMs achieve adult human performance on higher-order theory of mind tasks: GPT-4 and Flan-PaLM reach adult-level and near-adult-level performance on ToM tasks overall, and GPT-4 exceeds adult performance on 6th-order inferences https://t.co/SsKbu4PbCo
GPT-4 now exceeds humans on theory of mind tasks: GPT-4 and Flan-PaLM reach adult-level and near-adult-level performance on ToM tasks overall, and GPT-4 exceeds adult performance on 6th-order inferences https://t.co/6aPVli2Cwp
May 29th arXiv 📄: "We find that GPT-4 and Flan-PaLM reach adult-level ... on Theory of Mind tasks overall, and that GPT-4 exceeds adult performance on 6th order inferences." 6th order? 🤯 BTW, N=1440, but the data looks noisy, possibly due to the small number of questions… https://t.co/do2oSFU8zj https://t.co/g7JG2vAt9M
GPT-4 now exceeds humans at theory of mind tasks 🤯 Paper "LLMs achieve adult human performance on higher-order theory of mind tasks": 📌 The key finding is that GPT-4 and Flan-PaLM reach adult or near-adult level performance on Theory of Mind (ToM) tasks up to the 6th order,… https://t.co/U7yxCKEbNa
GPT-4 just surpassed adult human performance at THEORY OF MIND tasks (basically, mind reading) And this is just GPT-4! Imagine GPT-5... and, if we're still here to see it, GPT-6. Soon, AIs will think we're as slow as plants. PLANTS. And… um… most people don't know this, but… https://t.co/zE0GOrCtsD https://t.co/QDkQ3ETBGt
GPT-4 demonstrated superior performance over humans in 6th Order ToM inferences, suggesting that increased model size, instruction fine-tuning, multimodal capabilities, and word comprehension - or interplay among all of these - contribute to its ability to model mental states. https://t.co/k1B0veyiog https://t.co/SR8JOzFBDo
Testing theory of mind in large language models and humans GPT-4 models performed at, or even sometimes above, human levels at identifying indirect requests, false beliefs and misdirection, but struggled with detecting faux pas. https://t.co/djwvSQ2XoX
LLMs achieve adult human performance on higher-order theory of mind tasks: this paper examines the extent to which large language models (LLMs) have developed higher-order theory of mind (ToM), the human ability to reason about multiple mental and emotional states in a https://t.co/a8vtpsEjHH