On Thursday, Anthropic introduced its Claude 4 lineup: Claude Opus 4 and Claude Sonnet 4. According to Anthropic, Opus 4 is now the “world’s best coding model,” excelling at sustained, long-horizon agentic workflows, while Sonnet 4 delivers stronger coding and reasoning than its predecessor, Sonnet 3.7.
Starting with Claude Opus 4: on SWE-bench Verified, a benchmark of real-world software engineering tasks, it scores 72.5%, edging out OpenAI’s top coding model, codex-1, at 72.1%. When leveraging parallel test-time compute (similar in spirit to Gemini 2.5 Pro’s Deep Think mode), Opus 4’s score rises to 79.4%.
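Anthropic hasn’t published the exact mechanism behind its parallel test-time compute numbers, but a common form of the idea is best-of-n sampling: draw several candidate solutions in parallel, score each one, and keep the best. The sketch below illustrates that pattern only; `generate_candidate` and `score_candidate` are hypothetical stand-ins for sampling a model completion and verifying it (say, by running a repo’s test suite).

```python
import random

def generate_candidate(task, seed):
    # Hypothetical stand-in for sampling one model completion.
    # In practice this would be an LLM call with temperature > 0.
    random.seed(seed)
    return {"task": task, "patch": f"candidate-{seed}", "score": random.random()}

def score_candidate(candidate):
    # Hypothetical verifier, e.g. a test-suite pass rate or a
    # reward-model score attached to the candidate.
    return candidate["score"]

def best_of_n(task, n=8):
    # Sample n candidates (sequentially here for clarity; in a real
    # system these calls run in parallel) and keep the top scorer.
    candidates = [generate_candidate(task, seed) for seed in range(n)]
    return max(candidates, key=score_candidate)

best = best_of_n("fix the failing unit test", n=8)
```

The gains reported above (72.5% to 79.4% for Opus 4) are consistent with this kind of sample-and-select scaling: more samples raise the chance that at least one candidate passes verification.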
Interestingly, Claude Sonnet 4 tallies 72.7% on SWE-bench and reaches 80.2% with parallel compute, outperforming the larger Opus 4 on pure coding tasks. Anthropic positions Sonnet 4 as a balanced choice: it pairs high coding accuracy with efficiency and enhanced steerability for fine-tuned control. While Opus 4 shines in complex, extended workflows, Sonnet 4 offers an optimal blend of capability and practicality.
Both models employ a hybrid reasoning architecture, delivering either near-instant replies or deeper “extended thinking” when a task calls for it. Opus 4 also supports file-based memory: for example, while playing Pokémon it generated a navigation guide, storing key details in a memory file for later retrieval.
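Anthropic describes the memory feature as the model extracting key facts and saving them to local files it can read back later. A minimal sketch of that pattern, with all names hypothetical, might look like this:

```python
import json
from pathlib import Path

class MemoryFile:
    """Minimal file-backed note store; a hypothetical illustration of
    the memory-file pattern, not Anthropic's implementation."""

    def __init__(self, path):
        self.path = Path(path)
        # Reload any notes persisted by an earlier session.
        self.notes = json.loads(self.path.read_text()) if self.path.exists() else {}

    def remember(self, key, value):
        # Persist a fact so a later agent step can recall it.
        self.notes[key] = value
        self.path.write_text(json.dumps(self.notes, indent=2))

    def recall(self, key, default=None):
        return self.notes.get(key, default)

# Example: an agent playing Pokémon notes a navigation landmark.
mem = MemoryFile("navigation_guide.json")
mem.remember("viridian_forest_exit", "north, past the second trainer")
```

Because the notes live in a file rather than the context window, they survive across long agentic runs where earlier conversation turns would otherwise be truncated.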
On the safety front, Claude Opus 4 debuts with AI Safety Level 3 protections under Anthropic’s Responsible Scaling Policy, including Constitutional Classifiers and other countermeasures against jailbreaking.
These Claude 4 models are now rolling out to paid Pro, Max, Team, and Enterprise subscribers—and Claude Sonnet 4 is even available to free users, albeit without the extended thinking feature.