Claude Opus 4.8: Features, Benchmarks, Claude Code

Anthropic shipped Claude Opus 4.8 on May 28, 2026, six weeks after Opus 4.7 and at the same per-token price. The headline is honesty as a feature: Anthropic says Opus 4.8 is four times less likely than Opus 4.7 to let a code flaw pass without flagging it, and the benchmark gains line up with that framing. SWE-Bench Pro climbs from 64.3% to 69.2%, OSWorld-Verified inches up to 83.4%, and a new “dynamic workflows” mode in Claude Code lets the model fan a hard problem out across hundreds of parallel subagents and verify their work before reporting back.

Original content from computingforgeeks.com - post 168206

This guide covers what changed in Claude Opus 4.8, the benchmark numbers that matter, how to switch to claude-opus-4-8 from inside Claude Code, the new default effort level, the cheaper Fast mode, and when Opus 4.8 is worth its price tag versus Sonnet or Haiku.

What is new in Claude Opus 4.8

Opus 4.8 is a sharpening release more than a feature dump. Anthropic’s framing is that the model exercises better judgment on long-horizon coding work, catches its own mistakes during the run instead of declaring victory and leaving the bug for review, and is more willing to tell you when it is uncertain.

SWE-Bench Pro: 69.2% (Opus 4.7: 64.3%). Real issue resolution on a harder, less contaminated set than SWE-Bench Verified.
Honesty: 4x improvement on Anthropic’s internal measure of leaving code flaws unremarked. The release blog reframes this as “telling you when it is unsure and catching its own bugs instead of declaring victory early.”
Default effort drops to high. Opus 4.7’s default was xhigh; 4.8 at high spends about the same tokens as 4.7’s default on coding but scores higher.
Same pricing as Opus 4.7: $5 per million input tokens, $25 per million output. No price bump for a frontier model is unusual and is the single biggest reason to upgrade today.
Fast mode is 3x cheaper than the previous generation while running roughly 2.5x faster than the standard Opus 4.8 endpoint. Toggle with /fast inside Claude Code.
Dynamic workflows (research preview) let Claude Code plan a task, dispatch hundreds of parallel subagents that approach the problem from independent angles, refute each other’s findings, and iterate until the answers converge before reporting back.
OSWorld-Verified: 83.4% on the agentic computer-use benchmark, leading the comparison set.
GDPval-AA: 1890 on the knowledge-work eval, a clean lead over GPT-5.5 (1769) and a wide gap over Gemini 3.1 Pro (1314).
Claude Code rate limits raised to cover the extra tokens that xhigh and dynamic workflows consume on hard problems.
API model ID is claude-opus-4-8. The opus alias now routes to it.

The release cycle matters here. Opus 4.7 shipped on April 16; 4.8 follows roughly six weeks later, which is faster than Anthropic’s usual cadence and tracks public feedback that Opus 4.7 ran more cautious than 4.6 on some workloads. Anthropic also confirmed that Mythos, the larger model class still held back over safeguard work, is expected in the coming weeks for general availability.

Opus 4.8 benchmarks vs Opus 4.7, GPT-5.5, and Gemini 3.1 Pro

Anthropic’s own comparison spans agentic coding, terminal coding, computer use, knowledge work, financial analysis, and multidisciplinary reasoning. Opus 4.8 wins six of seven cells against the same set of competitors; GPT-5.5 still leads on Terminal-Bench 2.1.

Source: Anthropic, Introducing Claude Opus 4.8

The same data as a table you can cite without the image:

Benchmark	Opus 4.8	Opus 4.7	GPT-5.5	Gemini 3.1 Pro
Agentic coding (SWE-Bench Pro)	69.2%	64.3%	58.6%	54.2%
Agentic terminal coding (Terminal-Bench 2.1)	74.6%	66.1%	78.2%	70.3%
Humanity’s Last Exam, no tools	49.8%	46.9%	41.4%	44.4%
Humanity’s Last Exam, with tools	57.9%	54.7%	52.2%	51.4%
Agentic computer use (OSWorld-Verified)	83.4%	82.8%	78.7%	76.2%
Knowledge work (GDPval-AA)	1890	1753	1769	1314
Agentic financial analysis (Finance Agent v2)	53.9%	51.5%	51.8%	43.0%

The +4.9 point lift on SWE-Bench Pro is the headline number for working engineers, but the honesty improvement is the one users notice during a single session. Bridgewater Associates, an early customer, told Anthropic that Opus 4.8 proactively flags issues with the inputs and outputs of an analysis, something other models routinely miss. The behavior change shows up as fewer “implemented and tested” messages that turn out to be wrong at PR review.

Pricing, model ID, and availability

Pricing carries over from Opus 4.7. Same input rate, same output rate, same 1M-token context window at standard rates. Fast mode is the only price-side change, and it is a drop, not a raise.

Item	Claude Opus 4.8
API model ID	`claude-opus-4-8`
Standard input pricing	$5 / million tokens
Standard output pricing	$25 / million tokens
Fast mode speed	~2.5x faster than the standard endpoint
Fast mode cost	3x cheaper than the previous Fast tier
Context window	1,000,000 tokens at standard rates
Effort levels	`low`, `medium`, `high` (default), `xhigh`, `max`
Platforms	Claude.ai, Anthropic API, AWS Bedrock, Google Vertex AI, Microsoft Foundry, GitHub Copilot

Fast mode access on the API is still gated. The Claude Code /fast toggle works for Pro, Max, Teams, and Enterprise plans. For direct API access at the Fast tier, request through your account manager or join the waitlist at claude.com/fast-mode.

Switch Claude Code to Opus 4.8 on macOS

Opus 4.8 is available in Claude Code on the 2.1.x line. If you ran claude update any time in the last hour the model picker should already list it; the slash command /model is the safest way to confirm because it shows whatever the server is currently serving you.

claude update
claude --version

The version output should be on the 2.1.x line. Anything older will miss the /fast toggle and the dynamic workflows entry points. To start a session on Opus 4.8 from the command line without entering the picker, pass both flags:

claude --model claude-opus-4-8 --effort xhigh

Inside an existing session, set the model and the effort with slash commands. As of Claude Code 2.1.153, /model remembers your choice as the new default for future sessions; press s in the picker if you want the switch to apply only to the current session:

/model opus
/effort xhigh

The screenshot below is the verification flow on macOS: version check, confirming the xhigh effort knob is exposed in --help, then a one-shot --print run on claude-opus-4-8 that returns its own model name. If your output matches, you are on Opus 4.8.

For API users, the opus alias now routes to claude-opus-4-8. Pin the full model ID in your SDK call if you need reproducibility across the next Opus release; pass thinking: { type: "extended", budget_tokens: <n> } or leave it on adaptive mode depending on your SDK.

The new default effort and when to use xhigh

Opus 4.7 raised the Claude Code default effort to xhigh on release day. Opus 4.8 walks that back to high. Anthropic’s own framing is that high on 4.8 burns roughly the same coding-task tokens that xhigh on 4.7 did, while scoring higher across the board. In practice, the lower default keeps everyday refactors cheap while leaving xhigh available for the cases where it actually helps.

Effort	Use for
`low`	Tight follow-ups, chat, one-line edits
`medium`	Routine refactors, single-file changes, test generation
`high` (default)	Multi-file edits, debugging, writing a non-trivial function
`xhigh`	Hard problems, async long-running sessions, anything dispatched as a background agent
`max`	The hardest problem of the week. Reserved because it is slow and expensive

The rule of thumb from Anthropic’s release post: xhigh for hard problems and long-running async work, high for everything else. Claude Code rate limits were raised to cover the extra tokens this drives, so the limit hit you remember from Opus 4.7’s first week is unlikely to repeat at the same threshold.

Fast mode: 2.5x speed, 3x cheaper

Fast mode is the same Opus 4.8 model running on dedicated speed-optimised serving. Anthropic measures it at roughly 2.5x the throughput of the standard endpoint, and it is now three times cheaper than the previous Fast tier. In Claude Code you toggle it with a slash command:

/fast on
/fast off

Fast mode pairs well with medium or high effort on agent loops that fire many short turns, code review sweeps, and any interactive session where the wait between turns matters more than maximum-depth reasoning. Leave Fast off and use xhigh when you genuinely want the model to think longer.

Dynamic workflows: hundreds of parallel subagents

Dynamic workflows is the other big Claude Code feature shipping alongside Opus 4.8, and it is the one that changes what kind of tasks you can hand off without supervision. Anthropic describes it as Claude planning the work, dispatching subagents that approach the problem from independent angles, having other agents try to refute what they found, then iterating until answers converge. The single coordinated answer comes back at the end.

Two ways to invoke it:

Ask explicitly. Tell Claude “create a workflow for this” and it will plan the fan-out itself before touching any files.
Enable ultracode through the effort menu. This sets effort to xhigh and lets Claude decide on its own when a task is big enough to justify the workflow machinery.

Anthropic’s pitch is codebase-scale migrations from kickoff to merge: framework upgrades, API deprecations, language ports across hundreds of thousands of lines. Other natural fits are large security audits, bug sweeps, and architecture reviews where you want adversarial verification rather than one agent’s first answer.

The catch is cost. Dynamic workflows can consume substantially more tokens than a normal Claude Code session, so the run asks you to confirm before kicking off. It is in research preview on the Claude Code CLI, desktop app, VS Code extension, and API. Read the dynamic workflows blog post for the full walkthrough.

When to use Opus 4.8 vs Sonnet or Haiku

Opus 4.8 is priced as the flagship and bills like one. The rule of thumb has not changed since Opus 4.7: pick by task complexity and how much you are willing to pay per minute of Claude time, not by reflex.

Use Opus 4.8 for multi-file refactors, design review, hard debugging, dynamic workflows, anything you are about to delegate as a background agent, and any task where wrong code costs more than a few extra dollars of tokens.
Use Sonnet for the bulk of routine coding, test generation, documentation, and short interactive turns. It is still the cost-per-task winner for most developers and was specifically called out as the better daily driver in our guide to reducing Claude Code token usage.
Use Haiku for tight follow-ups, chat-like exchanges, structured extraction, and the kind of high-volume short calls where Opus is overkill and Sonnet still feels expensive.

The Hacker News thread on the launch picked up on one practical wrinkle worth knowing: some teams ran Opus 4.7 noticeably more cautious than 4.6 and rolled back. Early reports on 4.8 from the same community suggest the honesty work has made the model less prone to arguing with users about whether a fix is needed, which is the most common 4.7 complaint. If you tuned your CLAUDE.md for 4.6’s looser behavior and walked it back for 4.7, you may want to re-test on 4.8 before keeping the workaround.

What about Mythos

Mythos is the larger model class above Opus 4.8 that Anthropic has been releasing only to a handful of partners while it builds out cybersecurity safeguards. The Opus 4.8 launch post and a parallel Reuters interview both pin Mythos for general availability “in the coming weeks.” Until then, Opus 4.8 is the most capable model you can put behind a Claude Code session today.

If you have been waiting for the next Anthropic launch to upgrade your Claude Code workflow, this is it. The honesty improvement is the kind of change that compounds across sessions, the SWE-Bench Pro jump is real, and same-price-as-4.7 means there is no migration math to do. Run claude update, type /model opus, and start your next task.