Claude Opus 4.5 Claims SWE-Bench Crown with 80.9% Score
Anthropic released Claude Opus 4.5 on November 24, establishing a new high-water mark for enterprise AI coding with an 80.9% score on SWE-bench Verified, surpassing both GPT-5.1-Codex-Max and Gemini 3 Pro. The model leads in 7 of 8 programming languages, with significant improvements in vision, reasoning, mathematics, and complex multi-step tasks.
On the Artificial Analysis Intelligence Index, Opus 4.5 scores 70 in reasoning mode, a 7-point jump from Claude Sonnet 4.5 (Thinking). That makes it the second most intelligent model globally: tied with GPT-5.1 at 70, ahead of Grok 4 (65), and trailing only Gemini 3 Pro (73).
The practical improvements are dramatic: 50% to 75% reductions in both tool-calling errors and build/lint errors. Anthropic also tested the model on a notoriously difficult performance-engineering take-home exam; within the 2-hour time limit, Opus 4.5 scored higher than any human candidate ever had. Pricing improved to $5 per million input tokens and $25 per million output tokens, down from $15/$75.
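To put the price cut in concrete terms, here's a quick back-of-the-envelope calculation; the token counts are illustrative, not from the announcement:

```python
# Back-of-the-envelope cost comparison for the Opus price cut.
# Token counts below are hypothetical, chosen only to illustrate the math.

def cost_usd(input_tokens: int, output_tokens: int,
             input_rate: float, output_rate: float) -> float:
    """Rates are USD per million tokens."""
    return (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * output_rate

# Hypothetical agentic coding session: 2M input tokens, 400K output tokens.
old = cost_usd(2_000_000, 400_000, input_rate=15.0, output_rate=75.0)  # prior Opus pricing
new = cost_usd(2_000_000, 400_000, input_rate=5.0, output_rate=25.0)   # Opus 4.5 pricing

print(f"Old: ${old:.2f}  New: ${new:.2f}  Savings: {100 * (1 - new / old):.0f}%")
# Old: $60.00  New: $20.00  Savings: 67%
```

At those rates, a heavy agentic workload costs roughly a third of what it did before the cut.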
This is the model you're running on right now. The SWE-bench dominance directly impacts agentic coding workflows: the 50-75% error reduction means fewer iterations to complete complex tasks. For your agent architecture work at Emergence, Opus 4.5's improved prompt injection resistance (the best of any frontier model, per Anthropic's testing) is critical for production deployments. Consider benchmarking your current agent pipelines against Opus 4.5 to quantify the improvement, as sketched below.
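A minimal sketch of that A/B benchmark using the `anthropic` Python SDK. The model IDs, task list, and single-shot harness are placeholder assumptions; swap in your real pipeline, tool loop, and scoring logic:

```python
# Minimal A/B benchmark sketch (pip install anthropic).
# Model IDs and tasks are placeholders; verify IDs against Anthropic's model docs.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

MODELS = ["claude-sonnet-4-5", "claude-opus-4-5"]  # assumed model IDs
TASKS = [
    "Write a Python function that parses ISO 8601 durations into seconds.",
    # ...replace with tasks drawn from your own agent workloads
]

def run_task(model: str, task: str) -> str:
    """Single-shot completion; a real agent harness would loop over tool calls."""
    message = client.messages.create(
        model=model,
        max_tokens=2048,
        messages=[{"role": "user", "content": task}],
    )
    return message.content[0].text

for model in MODELS:
    for task in TASKS:
        output = run_task(model, task)
        print(f"[{model}] {task[:50]}...\n{output[:200]}\n")
```

For a meaningful comparison, track iteration counts and tool-call failures per task rather than raw output quality alone, since the error-reduction claims are where Opus 4.5 should show up most clearly.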