OpenAI, Anthropic Test Each Other’s AI Models

Published: Aug 28, 2025, 12:20 AM IST
https://stocktwits.com/news-articles/markets/equity/openai-anthropic-test-each-other-s-ai-models/chsUBVJRdlf

Synopsis

The evaluation looked at how Anthropic’s Claude Opus 4 and Claude Sonnet 4 compared with OpenAI’s GPT-4o and GPT-4.1 and its reasoning models o3 and o4-mini.

OpenAI and Anthropic have taken a rare collaborative step in AI safety by testing each other’s language models in a joint evaluation aimed at probing the risks of their respective technologies. 

In a blog post on Wednesday, the firms said the evaluation looked at how Anthropic’s Claude Opus 4 and Claude Sonnet 4 compared with OpenAI’s GPT-4o and GPT-4.1 and its reasoning models o3 and o4-mini.

The joint effort aimed to spotlight model behavior under challenging safety scenarios, not to offer direct, head-to-head comparisons. OpenAI emphasized that the focus was on understanding general tendencies, rather than creating safety rankings. 

On Stocktwits, retail sentiment around OpenAI remained in ‘neutral’ territory amid ‘low’ message volume levels over the past day.

Anthropic’s Claude 4 models performed well in tests of how reliably they respect the instruction hierarchy and resist prompt extraction. They underperformed OpenAI’s o3 and o4-mini, however, in jailbreaking evaluations. Disabling reasoning in Claude models sometimes improved their performance in jailbreak tests.

When it came to hallucinations, where models generate inaccurate information, Claude models were highly cautious, often choosing not to respond at all. OpenAI’s models, including o3 and o4-mini, provided more responses but had higher hallucination rates, especially when restricted from using external tools like web browsing.

OpenAI’s own systems, particularly o3, showed strong performance in resisting manipulative prompts and avoiding scheming behaviors. OpenAI noted that these tests are intentionally difficult and don’t necessarily reflect real-world usage.

OpenAI stated it would keep evolving its testing methods. The company also recently launched GPT-5, which it claims shows improvements in reducing hallucinations, sycophancy, and misuse.

For updates and corrections, email newsroom[at]stocktwits[dot]com.
