On May 28, 2026, Anthropic dropped Claude Opus 4.8 — and the internet immediately lost its mind.
The AI Twitter crowd split into two groups: the "Claude is eating GPT-5.5 for breakfast" camp and the "wait, but GPT still wins one benchmark" camp who argued about that single data point for approximately 72 consecutive hours.
We cut through the noise. Here's what the actual numbers say — and more importantly, what they mean for your real work.
The Benchmarks: Side by Side
Let's start with the numbers — because that's why you're here. These are official scores from Anthropic's system card and independent evaluations published at launch.
SWE-Bench Pro (Agentic Coding — Harder Variant)
Claude Opus 4.8: 69.2% ✅ Winner
GPT-5.5: 58.6%
Gemini 3.1 Pro: 54.2%
Terminal-Bench 2.1 (Agentic CLI Coding)
GPT-5.5: 78.2% ✅ Winner
Claude Opus 4.8: 74.6%
OSWorld / Agentic Computer Use
Claude Opus 4.8: 83.4% ✅ Winner
GPT-5.5: 78.7%
Gemini 3.1 Pro: 76.2%
GDPval-AA (Professional Knowledge Work)
Claude Opus 4.8: 1,890 points ✅ Winner (+121 ahead of GPT-5.5)
GPT-5.5: 1,769 points
SWE-Bench Verified (Standard Coding)
Claude Opus 4.8: 88.6% ✅ Winner
GPT-5.5: 87.6%
Humanity's Last Exam
Claude Opus 4.8: 57.9%
The pattern is clear: Claude Opus 4.8 wins on almost every benchmark except terminal/CLI coding, where GPT-5.5 holds a narrow 3.6% lead. If your entire workflow is "live in the terminal and never leave," GPT-5.5 is still your model. If you do anything else — GPT-5.5 is now second place.
What's Actually New in Opus 4.8
Benchmarks don't always tell the full story. Here's what changed in ways you'll actually notice:
1. Dynamic Workflows in Claude Code
Opus 4.8 can now spawn hundreds of parallel sub-agents to tackle codebase-scale problems — scoping work as it goes instead of running a rigid plan from the start. Think of it as giving Claude a whole engineering team inside one prompt. This is the feature that developers are most excited about, and for good reason.
2. A Major Honesty Upgrade
This one matters more than any benchmark score. Opus 4.8 is 4× less likely to let code defects slip through silently. Ask it to build an API route without input validation and it will actually warn you about the security risk — instead of just shipping the bug and calling it done. GPT-5.5 frequently skips this warning. That difference is the gap between "AI assistant" and "AI you can actually trust in production."
3. Fast Mode: 3× Cheaper, 2.5× Faster
Anthropic slashed the fast mode pricing without touching the base model price. Same quality output, a third of the previous fast-mode cost, running at 2.5× the standard speed. The base pricing stays at $5 per million input tokens and $25 per million output tokens — unchanged from Opus 4.7.
4. First to Break the Legal Agent Benchmark
Claude Opus 4.8 is the first AI model ever to break 10% overall on Harvey's Legal Agent Benchmark at the all-pass standard. For anyone working in legal, compliance, or contract-heavy industries — this is a bigger deal than any coding score on the list.
5. Near-Mythos Alignment Scores
Anthropic describes Opus 4.8 as sitting between Opus 4.7 and their restricted Claude Mythos Preview model on the internal capability ladder. It's the most aligned publicly available model Anthropic has ever shipped.
"A model that tells you when it's stuck is more useful in production by a lot. This is the real Opus 4.8 upgrade — not the benchmark points, but the honesty." — DataCamp AI Analysis, June 2026
Where GPT-5.5 Still Holds Ground
We're not here to be a fan account. Let's be honest.
GPT-5.5 genuinely wins on Terminal-Bench 2.1 (78.2% vs 74.6%). If your daily workflow is deep CLI automation — shell scripts, terminal-native agents, infrastructure-as-code pipelines — OpenAI's model is still the benchmark leader for that specific slice.
It's also roughly tied with Claude on web browsing tasks and graduate-level science questions. The gap between the two models on those isn't wide enough to make a switching decision on.
Claude Opus 4.8 wins: Agentic coding, professional knowledge work, legal reasoning, computer use, long-context tasks, honesty, and dynamic multi-agent workflows.
GPT-5.5 wins: Terminal and CLI-native agentic coding. That's the one category where OpenAI holds the lead. Everything else belongs to Claude now.
Which One Should You Actually Use?
Skip the benchmarks and go straight to your use case:
Use Claude Opus 4.8 if you...
Build multi-file, multi-agent code projects
Write legal documents, contracts, or handle professional knowledge work
Need an AI that warns you when it's uncertain — not one that confidently ships broken code
Work inside Claude Code and want Dynamic Workflows
Want the model that wins on 12 out of 13 major benchmarks in 2026
Use GPT-5.5 if you...
Live entirely in the terminal and run automated CLI-native pipelines
That's the honest, specific use case where GPT-5.5 is still #1. For everything else, Opus 4.8 has pulled ahead.

The Inclaw Verdict
Claude Opus 4.8 is the best general-purpose AI model available in June 2026 — by a meaningful margin.
It costs the same as Opus 4.7, runs faster in fast mode, is more honest about its limitations, and beats GPT-5.5 on almost every real-world task benchmark. The terminal coding edge for GPT-5.5 is real but narrow.
For developers, writers, legal professionals, and anyone who builds things with AI — Claude Opus 4.8 is the clear answer right now.
Frequently Asked Questions
Q: Is Claude Opus 4.8 better than GPT-5.5? Yes — Claude Opus 4.8 beats GPT-5.5 on 12 out of 13 major benchmarks in 2026, including agentic coding (69.2% vs 58.6% on SWE-Bench Pro), computer use, legal reasoning, and professional knowledge work. GPT-5.5 only leads on terminal/CLI-native coding (78.2% vs 74.6%)
. --- Q: What is Claude Opus 4.8? Claude Opus 4.8 is Anthropic's flagship AI model released on May 28, 2026. It is the direct successor to Claude Opus 4.7 and features improved agentic coding, Dynamic Workflows in Claude Code, a major honesty upgrade, and a 3× cheaper fast mode — all at the same base price of $5 per million input tokens
. --- Q: What is the price of Claude Opus 4.8? Claude Opus 4.8 costs $5 per million input tokens and $25 per million output tokens — identical to Opus 4.7. The only pricing change is fast mode, which is now 3× cheaper and runs 2.5× faster than the previous version.
--- Q: What is Dynamic Workflows in Claude Opus 4.8? Dynamic Workflows is a new Claude Code feature in Opus 4.8 that allows the model to spawn hundreds of parallel sub-agents simultaneously. Instead of running a fixed plan, it scopes and adjusts work as it goes — enabling it to handle large, multi-file engineering problems autonomously
. --- Q: Does Claude Opus 4.8 beat GPT-5.5 at coding? For most coding tasks, yes. Claude Opus 4.8 scores 69.2% on SWE-Bench Pro vs GPT-5.5's 58.6%. However, for terminal and CLI-native workflows specifically, GPT-5.5 still leads at 78.2% vs Claude's 74.6%
. --- Q: How is Claude Opus 4.8 more honest than previous models? Anthropic built a major honesty upgrade into Opus 4.8. The model is 4× less likely than Opus 4.7 to let code defects pass without flagging them. It also shows fewer misleading task-success claims and proactively warns users about security risks, missing validation, and potential errors in generated code. ---

Try the Tools That Work With These Models
Whether you're using Claude or GPT to help with writing, code review, contracts, or productivity — you still need the right supporting tools.
At inclaw.me, we've built 50+ free tools that fit right into your AI-powered workflow: resume builders, contract generators, writing tools, developer utilities, finance calculators, and more. No login. No subscription. No catch. Just open and use.
👉 Explore all free tools at inclaw.me
Benchmark data sourced from Anthropic's official Claude Opus 4.8 system card (May 28, 2026), Vellum AI benchmark analysis, VentureBeat, DataCamp, and Artificial Analysis Intelligence Index. All scores reflect models at general availability. Last updated: June 4, 2026.
