Google DeepMind, in collaboration with Stanford, introduces a new method called Search-Augmented Factuality Evaluator (SAFE) to evaluate long-form factuality in large language models (LLMs). The method combines LLM agents with Google Search to verify individual claims, achieving superhuman rating performance compared with human annotators. The research also finds that larger models are generally more factual and that SAFE is about 20 times cheaper than human fact-checkers.
"Long-Form Factuality in Large Language Models" introduces a new approach to evaluating and benchmarking the factuality of long-form responses generated by large language models (LLMs). Key contributions: https://t.co/61SPVtboDN
Researchers from Google DeepMind and Stanford Introduce Search-Augmented Factuality Evaluator (SAFE): Enhancing Factuality Evaluation in Large Language Models Quick read: https://t.co/anXisulDKY Researchers from Google DeepMind and Stanford University have introduced a novel…
People and companies lie about AI. https://t.co/CTFindvjC4
DeepMind Unveils SAFE: An AI-Powered Tool for Fact-Checking LLMs #accuracy #AI #artificialintelligence #ChatGPT #Collaboration #DeepMind #Factcheckers #factchecking #GoogleSearch #llm #machinelearning #Media #methodology #opensource #Reliability #Safe https://t.co/RQ7hednLqH https://t.co/k6RPJa9UKr
Google is working on a new “Fact Checking” AI. The Search-Augmented Factuality Evaluator (SAFE). SAFE uses a large language model to break down generated text into individual facts, and uses Google Search to determine the accuracy of each claim. Yep. https://t.co/jrn8JCcw2B
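The two-step pipeline this tweet describes (split generated text into individual facts, then check each fact against search results) can be sketched as follows. This is a hedged illustration, not the paper's implementation: `llm()` and `search()` are stub functions standing in for a real language model and the Google Search backend, and the prompt wording is invented for the example.

```python
# Sketch of a SAFE-style pipeline: break a response into atomic facts,
# then rate each fact against retrieved evidence. llm() and search()
# are stubs; a real system would call a model and a search API.

def llm(prompt: str) -> str:
    # Stub LLM: returns canned output so the example runs end to end.
    if prompt.startswith("Split"):
        return "Paris is the capital of France.\nParis has 20 arrondissements."
    return "SUPPORTED"

def search(query: str) -> str:
    # Stub search: a real system would return Google Search snippets.
    return "Paris, capital of France, is divided into 20 arrondissements."

def split_into_facts(response: str) -> list[str]:
    # Step 1: ask the LLM to decompose the response into atomic facts.
    out = llm(f"Split into self-contained atomic facts:\n{response}")
    return [line.strip() for line in out.splitlines() if line.strip()]

def rate_fact(fact: str) -> bool:
    # Step 2: retrieve evidence and ask the LLM for a supported/not verdict.
    # (In SAFE the agent can iteratively refine its search queries.)
    evidence = search(fact)
    verdict = llm(f"Given evidence:\n{evidence}\nIs this fact supported? {fact}")
    return verdict.strip().upper() == "SUPPORTED"

def safe_rate(response: str) -> dict:
    facts = split_into_facts(response)
    supported = sum(rate_fact(f) for f in facts)
    return {"supported": supported, "not_supported": len(facts) - supported}

print(safe_rate("Paris is the capital of France and has 20 arrondissements."))
# → {'supported': 2, 'not_supported': 0}
```

With the stubs replaced by real model and search calls, the per-fact loop is the whole method: the accuracy gains come from grounding each verdict in retrieved evidence rather than the model's parametric memory.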
New work on evaluating long-form factuality 🎉. Our method SAFE combines Google Search and LLM queries to extract and verify individual claims in responses. Most excitingly, we show SAFE is cheaper💰 and more reliable ✅ than human annotators. https://t.co/ulSad7fs0b
New factuality research! We use LMs as annotators & search engines for grounding to create a realistic benchmark for evaluating long-form factuality. Simulating your daily queries to LMs about knowledge & truth. 🔍📊 #NLProc #FactChecking Check this out! 👇 https://t.co/UydBu8ObvC
We focus on long-form factuality in open domain, and so we show an entire evaluation pipeline with dataset + autorater + metric. The dataset was generated with LLMs and the autorater is an LLM agent with Google Search, demonstrating LLMs can rate themselves better than humans! https://t.co/DKwXxmBdFg
Our new work on evaluating and benchmarking long-form factuality. We provide a new dataset, an evaluation method, an aggregation metric that accounts for both precision and recall, and an analysis of thirteen popular LLMs (including Gemini, GPT, and Claude). We’re also… https://t.co/EHXmBY8LAE
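The aggregation metric this tweet mentions, balancing precision and recall, can be sketched as an F1-at-K score: precision is the fraction of supported facts among all rated facts, and recall is measured against a target number of facts K. This is a hedged reconstruction of that idea for illustration; the function name and example numbers are assumptions, not taken from the paper.

```python
def f1_at_k(supported: int, not_supported: int, k: int) -> float:
    """F1-style long-form factuality score (hedged sketch).

    precision: supported facts / all rated facts
    recall:    supported facts / target fact count K, capped at 1
    """
    total = supported + not_supported
    if supported == 0 or total == 0:
        return 0.0
    precision = supported / total
    recall = min(supported / k, 1.0)
    return 2 * precision * recall / (precision + recall)

# A response with 40 supported and 10 unsupported facts, scored
# against a target of K = 64 facts:
print(round(f1_at_k(supported=40, not_supported=10, k=64), 3))
# → 0.702
```

The cap on recall encodes the design choice that responses longer than K facts earn no extra credit, so the metric rewards being both accurate and sufficiently detailed, without incentivizing padding.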
New @GoogleDeepMind+@Stanford paper! 📜 How can we benchmark long-form factuality in language models? We show that LLMs can generate a large dataset and are better annotators than humans, and we use this to rank Gemini, GPT, Claude, and PaLM-2 models. https://t.co/A3vgEjbqTV https://t.co/x1tlgYlCdg
AIs have a bad reputation for truth, so three important findings in this paper: 1) "LLM agents can achieve superhuman rating performance" on fact checking when given access to Google! 2) Bigger models are more factual 3) LLMs are 20x cheaper than humans https://t.co/lSSMAjoOnF https://t.co/oAWuaZFNPA
[CL] Long-form factuality in large language models J Wei, C Yang, X Song, Y Lu, N Hu, D Tran, D Peng, R Liu, D Huang, C Du, Q V. Le [Google DeepMind] (2024) https://t.co/VtkDsWHUgs - The paper introduces LongFact, a new prompt set for benchmarking long-form factuality of… https://t.co/Ezd8dZWHW7
Great paper from Google DeepMind: "LONG-FORM FACTUALITY IN LARGE LANGUAGE MODELS" 📌 Propose that LLM agents can be used as automated evaluators for long-form factuality through a method which we call Search-Augmented Factuality Evaluator (SAFE). 📌 Demonstrate that LLM… https://t.co/TOhm3tCxrG
Google announces Long-form factuality in large language models Large language models (LLMs) often generate content that contains factual errors when responding to fact-seeking prompts on open-ended topics. To benchmark a model's long-form factuality in open domains, we first https://t.co/SkcoK8qJaQ
Google presents Long-form factuality in large language models - Proposes that LLM agents can be used as automated evaluators for long-form factuality - Shows that LLM agents can achieve superhuman rating performance repo: https://t.co/rlAIFSqfTU abs: https://t.co/L3CpeLaFpQ https://t.co/OSNpnr1BmP
“We assume that there are these smart people making AI, and therefore it's right. But the reality is, there are going to be some really big errors. And unless we have ethics and verification helping us get through that layer of error, it’s going to create so much chaos in the…