Google DeepMind, in collaboration with Stanford University, has introduced a new approach to evaluating the factuality of long-form responses from large language models (LLMs). The method, named the Search-Augmented Factuality Evaluator (SAFE), uses LLM agents as automated evaluators for long-form factuality: it combines Google Search and LLM queries to extract individual claims from a response and verify each one. The researchers demonstrate that LLM agents with access to Google Search can achieve superhuman rating performance on fact-checking, that larger models tend to be more factual, and that LLM-based fact-checking is more than 20x cheaper than human annotation. The team also provides a complete evaluation pipeline, including a new dataset and an autorater, showing that LLMs can rate long-form responses more reliably than human annotators. The work is timely given growing concern over the factual accuracy of AI-generated content.
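For a concrete picture of how SAFE works, here is a minimal Python sketch of a SAFE-style evaluator: split a response into atomic claims, filter for relevance, then search and rate each claim. The `llm` and `google_search` callables are hypothetical stand-ins for an LLM API and a search backend; this is an illustration of the pipeline as described, not the authors' implementation (their code is linked in the posts below).

```python
from dataclasses import dataclass


@dataclass
class FactVerdict:
    claim: str
    supported: bool


def safe_evaluate(prompt: str, response: str, llm, google_search) -> list[FactVerdict]:
    """Sketch of a SAFE-style pipeline. `llm(text) -> str` and
    `google_search(query) -> str` are assumed, hypothetical callables."""
    # Step 1: use the LLM to split the response into atomic, self-contained claims.
    claims = llm(
        "Split the following response into a list of atomic, self-contained "
        f"factual claims, one per line:\n{response}"
    ).splitlines()

    verdicts = []
    for claim in claims:
        # Step 2: keep only claims relevant to the original prompt.
        relevant = llm(
            f"Question: {prompt}\nClaim: {claim}\n"
            "Is this claim relevant to answering the question? (yes/no)"
        )
        if relevant.strip().lower() != "yes":
            continue
        # Step 3: let the LLM write a search query, then rate the claim
        # against the returned evidence.
        evidence = google_search(llm(f"Write a Google Search query to verify: {claim}"))
        rating = llm(
            f"Claim: {claim}\nSearch results: {evidence}\n"
            "Is the claim supported by the results? (supported/not supported)"
        )
        verdicts.append(FactVerdict(claim, rating.strip().lower() == "supported"))
    return verdicts
```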
Researchers from Google DeepMind and Stanford Introduce Search-Augmented Factuality Evaluator (SAFE): Enhancing Factuality Evaluation in Large Language Models Quick read: https://t.co/anXisulDKY Researchers from Google DeepMind and Stanford University have introduced a novel…
People and companies lie about AI. https://t.co/CTFindvjC4
DeepMind Unveils SAFE: An AI-Powered Tool for Fact-Checking LLMs #accuracy #AI #artificialintelligence #ChatGPT #Collaboration #DeepMind #Factcheckers #factchecking #GoogleSearch #llm #machinelearning #Media #methodology #opensource #Reliability #Safe https://t.co/RQ7hednLqH https://t.co/k6RPJa9UKr
Our new effort tries to address an elephant in the room for LLMs: given that factuality/hallucination is so critical to the success of LLMs, is there a quantitative evaluation to benchmark all existing LLMs in general? We hope our benchmark will be adopted and benchmarked as part of… https://t.co/NfkqTGRAoh
New work on evaluating long-form factuality 🎉. Our method SAFE combines Google Search and LLM queries to extract and verify individual claims in responses. Most excitingly, we show SAFE is cheaper💰 and more reliable ✅ than human annotators. https://t.co/ulSad7fs0b
New factuality research! We use LMs as annotators & search engines for grounding to create a realistic benchmark for evaluating long-form factuality. Simulating your daily queries to LMs about knowledge & truth. 🔍📊 #NLProc #FactChecking Check this out! 👇 https://t.co/UydBu8ObvC
We focus on long-form factuality in open domains, so we present an entire evaluation pipeline with dataset + autorater + metric. The dataset was generated with LLMs, and the autorater is an LLM agent with Google Search, demonstrating that LLMs can rate themselves better than humans! https://t.co/DKwXxmBdFg
Our new work on evaluating and benchmarking long-form factuality. We provide a new dataset, an evaluation method, an aggregation metric that accounts for both precision and recall, and an analysis of thirteen popular LLMs (including Gemini, GPT, and Claude). We’re also… https://t.co/EHXmBY8LAE
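The aggregation metric referred to here is F1@K, which treats the fraction of supported facts as precision and measures recall against a target number K of supported facts that a user wants in a response. Below is a minimal sketch of such a score, based on our reading of the metric, not the reference implementation:

```python
def f1_at_k(num_supported: int, num_not_supported: int, k: int) -> float:
    """F1@K-style score: precision over all rated facts, recall against a
    target of K supported facts. Irrelevant facts are assumed to have been
    filtered out before counting."""
    if num_supported == 0:
        return 0.0  # a response with no supported facts scores zero
    precision = num_supported / (num_supported + num_not_supported)
    recall = min(num_supported / k, 1.0)
    return 2 * precision * recall / (precision + recall)


# Example: 50 supported and 10 unsupported facts with target K = 64
print(f1_at_k(50, 10, 64))  # ≈ 0.81
```

The choice of K encodes how long and fact-dense a user wants answers to be: small K rewards concise, precise responses, while large K rewards responses packed with many supported facts.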
New @GoogleDeepMind+@Stanford paper! 📜 How can we benchmark long-form factuality in language models? We show that LLMs can generate a large dataset and are better annotators than humans, and we use this to rank Gemini, GPT, Claude, and PaLM-2 models. https://t.co/A3vgEjbqTV https://t.co/x1tlgYlCdg
AIs have a bad reputation for truth, so three important findings in this paper: 1) "LLM agents can achieve superhuman rating performance" on fact checking when given access to Google! 2) Bigger models are more factual 3) LLMs are 20x cheaper than humans https://t.co/lSSMAjoOnF https://t.co/oAWuaZFNPA
[CL] Long-form factuality in large language models J Wei, C Yang, X Song, Y Lu, N Hu, D Tran, D Peng, R Liu, D Huang, C Du, Q V. Le [Google DeepMind] (2024) https://t.co/VtkDsWHUgs - The paper introduces LongFact, a new prompt set for benchmarking long-form factuality of… https://t.co/Ezd8dZWHW7
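For context, LongFact consists of open-ended, fact-seeking prompts spanning 38 topics, designed to elicit long, fact-dense answers. The examples below are illustrative stand-ins in that style, not prompts from the released set:

```python
# Illustrative LongFact-style prompts (hypothetical examples, not taken from
# the released dataset). LongFact covers both specific objects and broader
# concepts, phrased so that a good answer contains many checkable facts.
longfact_style_prompts = [
    "What can you tell me about the Hubble Space Telescope?",  # object-style
    "Explain the theory of plate tectonics in detail.",        # concept-style
]
```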
Great paper from Google DeepMind: "LONG-FORM FACTUALITY IN LARGE LANGUAGE MODELS" 📌 Propose that LLM agents can be used as automated evaluators for long-form factuality through a method which we call Search-Augmented Factuality Evaluator (SAFE). 📌 Demonstrate that LLM… https://t.co/TOhm3tCxrG
Google announces "Long-form factuality in large language models": Large language models (LLMs) often generate content that contains factual errors when responding to fact-seeking prompts on open-ended topics. To benchmark a model's long-form factuality in open domains, we first https://t.co/SkcoK8qJaQ
Google presents Long-form factuality in large language models - Proposes that LLM agents can be used as automated evaluators for long-form factuality - Shows that LLM agents can achieve superhuman rating performance repo: https://t.co/rlAIFSqfTU abs: https://t.co/L3CpeLaFpQ https://t.co/OSNpnr1BmP