Recent research indicates that the large language models (LLMs) GPT-4 and Flan-PaLM have achieved adult-level and near-adult-level performance on higher-order Theory of Mind (ToM) tasks; notably, GPT-4 exceeds adult human performance on 6th-order inferences. The study, posted to arXiv on May 29th, involved 1,440 data points, though some results appeared noisy, possibly due to the small number of questions per condition. The findings highlight the potential of LLMs at complex cognitive tasks, although the reliability of benchmarks for such tasks remains a topic of debate.
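For readers unfamiliar with the "order" terminology: an nth-order ToM statement nests n mental-state attributions inside one another. A minimal sketch of that recursive structure, with hypothetical names and a hypothetical helper (not items from the paper's actual test set):

```python
def nth_order_statement(agents, fact):
    """Build an nth-order ToM statement by wrapping the fact in one
    'X believes that ...' clause per agent, innermost agent last."""
    statement = fact
    for agent in reversed(agents):
        statement = f"{agent} believes that {statement}"
    return statement

# A 2nd-order example: two nested mental states.
print(nth_order_statement(["Anna", "Ben"], "the meeting moved to Friday"))
# -> Anna believes that Ben believes that the meeting moved to Friday

# A 6th-order example, the depth at which GPT-4 reportedly exceeded adults.
print(nth_order_statement(["A", "B", "C", "D", "E", "F"],
                          "the key is under the mat"))
```

Answering a 6th-order question correctly requires tracking all six nested belief states at once, which is why depth is used as a difficulty axis in these benchmarks.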
LLMs’ “intelligence” is hard to benchmark, as we don’t have good benchmarks for human performance at complex tasks. Take theory of mind: several tests find GPT-4 beats humans, but another finds a huge gap. Is it the testing structure? The prompting? Which is right? Hard to know. https://t.co/z9L3stRCDP
⁉️ Let's check how GPT-4o, Gemini, Llama3, Mixtral, and Claude perform on theory of mind, shall we? 🌟 We report new results on the FANToM benchmark 👻 - GPT-4o tops the chart by finally achieving a score of 2.0/100 (vs. human 87.5) - Huge boost for Gemini-1.5-flash compared to… https://t.co/rtuNsfEyIF https://t.co/GZWidSdIYs
LLMs achieve adult human performance on higher-order theory of mind tasks: GPT-4 and Flan-PaLM reach adult-level and near-adult-level performance on ToM tasks overall, and GPT-4 exceeds adult performance on 6th-order inferences https://t.co/SsKbu4PbCo
GPT-4 now exceeds humans on theory of mind tasks: GPT-4 and Flan-PaLM reach adult-level and near-adult-level performance on ToM tasks overall, and GPT-4 exceeds adult performance on 6th-order inferences https://t.co/6aPVli2Cwp
May 29th arXiv 📄: "We find that GPT-4 and Flan-PaLM reach adult-level ... on Theory of Mind tasks overall, and that GPT-4 exceeds adult performance on 6th order inferences." 6th order? 🤯 BTW, N=1440, but the data looks noisy, possibly due to the small number of questions… https://t.co/do2oSFU8zj https://t.co/g7JG2vAt9M
GPT-4 now exceeds humans at theory of mind tasks 🤯 Paper "LLMs achieve adult human performance on higher-order theory of mind tasks": 📌 The key finding is that GPT-4 and Flan-PaLM reach adult or near-adult level performance on Theory of Mind (ToM) tasks up to the 6th order,… https://t.co/U7yxCKEbNa
GPT-4 just surpassed adult human performance at THEORY OF MIND tasks (basically, mind reading) And this is just GPT-4! Imagine GPT-5... and, if we're still here to see it, GPT-6. Soon, AIs will think we're as slow as plants. PLANTS. And… um… most people don't know this, but… https://t.co/zE0GOrCtsD https://t.co/QDkQ3ETBGt
GPT-4 demonstrated superior performance over humans in 6th Order ToM inferences, suggesting that increased model size, instruction fine-tuning, multimodal capabilities, and word comprehension - or interplay among all of these - contribute to its ability to model mental states. https://t.co/k1B0veyiog https://t.co/SR8JOzFBDo
Testing theory of mind in large language models and humans GPT-4 models performed at, or even sometimes above, human levels at identifying indirect requests, false beliefs and misdirection, but struggled with detecting faux pas. https://t.co/djwvSQ2XoX
LLMs achieve adult human performance on higher-order theory of mind tasks: this paper examines the extent to which large language models (LLMs) have developed higher-order theory of mind (ToM), the human ability to reason about multiple mental and emotional states in a https://t.co/a8vtpsEjHH