Claude Opus 4.7 — Same Sticker Price, Higher Real Cost for Teams
Anthropic's Opus 4.7 keeps the same per-token price but a new tokenizer and silently raised effort defaults mean your Claude Code bill jumped the day you upgraded.
Anthropic released Claude Opus 4.7 on April 16, 2026. SWE-bench Verified jumped from 80.8% to 87.6%. Pricing stayed at $5/$25 per million tokens. And Claude Code silently switched every user’s default effort level from high to xhigh. Combined with a new tokenizer that maps identical input to up to 1.35× more tokens, the real cost of running Opus in production went up without a single line item changing on your invoice.
TL;DR
- What: Opus 4.7 ships real benchmark gains, but a new tokenizer (up to 1.35× more tokens on the same input) and a silently raised default effort level inflate your bill
- Breaking: Three hard API changes will crash existing integrations: `thinking.budget_tokens` removed, `temperature`/`top_p`/`top_k` removed, thinking traces hidden by default
- Cost lever: Task budgets (public beta) are the mitigation; activate via the `task-budgets-2026-03-13` header
- Action: Check your token usage for the past week. If you didn't pin effort to `high`, you're already running hotter.
What Happened
The headline story is capability. Opus 4.7 posts 87.6% on SWE-bench Verified (up from 80.8% on Opus 4.6) and 64.3% on SWE-bench Pro (up from 53.4%). These aren’t self-reported numbers sitting in a vacuum — Hex confirmed a 13% resolution lift on their internal 93-task coding benchmark, and CodeRabbit measured over 10% recall improvement on complex pull requests while precision held steady. The model genuinely got better at hard software engineering tasks.
But capability gains don’t exist in isolation. Three changes shipped alongside the intelligence bump that directly affect what you pay.
First, the tokenizer. Opus 4.7 uses a new tokenizer that produces roughly 1× to 1.35× as many tokens for the same text input. Code and technical English sit near the low end of that range. Multilingual content, heavily structured documents, and mixed-format inputs push toward the ceiling. Anthropic published a migration guide and recommends re-benchmarking before moving production traffic — a recommendation that only makes sense if they expect measurable cost differences.
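Re-benchmarking doesn't need to be elaborate. Here is a minimal sketch of measuring your own inflation factor on a sample of real prompts; `count_tokens` is a hypothetical stand-in callable (model name, prompt → token count), not an Anthropic SDK function, so wire it to whatever token-counting endpoint or local tokenizer you actually use:

```python
# Measure the tokenizer inflation factor on your own traffic.
# count_tokens is a stand-in: (model, prompt) -> int. Replace it with a
# real token counter before trusting the number.

def inflation_factor(prompts, count_tokens) -> float:
    """Ratio of 4.7 token counts to 4.6 token counts over a prompt sample."""
    old = sum(count_tokens("claude-opus-4-6", p) for p in prompts)
    new = sum(count_tokens("claude-opus-4-7", p) for p in prompts)
    return new / old

# Toy stand-in that pretends the 4.7 tokenizer runs 20% heavier.
fake_counts = lambda model, p: len(p.split()) * (12 if model.endswith("4-7") else 10)
print(inflation_factor(["def add(a, b): return a + b"], fake_counts))  # 1.2
```

Run it over a representative slice of production prompts, not a toy string; the article's 1×–1.35× range means your number depends heavily on content mix.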
Second, the effort default. Claude Code now defaults to `xhigh` effort on every plan: Pro, Max, Teams, Enterprise. If you hadn't manually overridden effort in your config before April 16, you were silently upgraded. `xhigh` sits between the old `high` and `max` tiers and burns more thinking tokens per session. To revert: run `/effort high` in Claude Code or set `export CLAUDE_CODE_EFFORT_LEVEL=high` in your environment.
Third, three hard-breaking API changes. `thinking.budget_tokens`, the parameter that gave you per-request control over reasoning spend, now returns a 400 error. Gone. `temperature`, `top_p`, and `top_k` set to any non-default value also return 400 errors. And `thinking.display` defaults to omitted, meaning you no longer see the model's reasoning trace unless you explicitly request it. Any production harness relying on these parameters broke the moment it hit Opus 4.7. Full details are in Anthropic's What's New in Claude 4.7 docs.
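A pre-flight scrubber is one way to migrate without hunting down every call site at once. This is a sketch, not part of Anthropic's SDK: `migrate_request` is a hypothetical helper that strips the parameters listed above from an existing request-kwargs dict and reports what it dropped so you can audit callers.

```python
# Hypothetical migration helper: drop parameters Opus 4.7 rejects with a
# 400 error, and report what was removed so callers can be audited.

REMOVED_PARAMS = ("temperature", "top_p", "top_k")

def migrate_request(params: dict) -> tuple[dict, list[str]]:
    """Return (cleaned params, list of dropped keys) for an Opus 4.7 call."""
    cleaned = dict(params)
    dropped = [k for k in REMOVED_PARAMS if cleaned.pop(k, None) is not None]
    # thinking.budget_tokens is nested under the "thinking" block
    thinking = cleaned.get("thinking")
    if isinstance(thinking, dict):
        thinking = dict(thinking)  # avoid mutating the caller's dict
        if thinking.pop("budget_tokens", None) is not None:
            dropped.append("thinking.budget_tokens")
        cleaned["thinking"] = thinking
    return cleaned, dropped

cleaned, dropped = migrate_request({
    "model": "claude-opus-4-7",
    "max_tokens": 4096,
    "temperature": 0.2,
    "thinking": {"type": "enabled", "budget_tokens": 8000},
})
print(sorted(dropped))  # ['temperature', 'thinking.budget_tokens']
```

Logging the dropped keys in production for a week gives you a complete inventory of which integrations still depend on the removed controls.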
If you have production integrations using `thinking.budget_tokens`, `temperature`, `top_p`, or `top_k` with Claude, those requests will return 400 errors on Opus 4.7. Test before migrating. On April 23, Anthropic switches Enterprise pay-as-you-go and API defaults to Opus 4.7 automatically.
Why This Matters
The sticker price didn't change. That's the part Anthropic leads with, and technically it's true: $5 per million input tokens, $25 per million output tokens, same as Opus 4.6. But pricing is price-per-token, and if the same prompt now produces up to 35% more tokens through the tokenizer alone, your effective cost per request goes up by the same margin. Layer the `xhigh` effort default on top, with more thinking tokens per session silently enabled, and you're looking at a compound increase that nobody agreed to.
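The compounding is easy to see with a back-of-the-envelope model. The rates below are the published Opus prices; the 1.20× tokenizer factor and 1.15× effort factor are illustrative assumptions, not measurements, so substitute your own numbers:

```python
# Back-of-the-envelope cost model. Rates are the published Opus prices;
# the multipliers are illustrative assumptions, not measured values.
INPUT_RATE = 5.00 / 1_000_000    # $ per input token
OUTPUT_RATE = 25.00 / 1_000_000  # $ per output token

def monthly_cost(in_tokens, out_tokens, tokenizer_x=1.0, effort_x=1.0):
    # Tokenizer inflation applies to input; extra thinking tokens bill as output.
    return in_tokens * tokenizer_x * INPUT_RATE + out_tokens * effort_x * OUTPUT_RATE

before = monthly_cost(200_000_000, 40_000_000)  # Opus 4.6 baseline
after = monthly_cost(200_000_000, 40_000_000, tokenizer_x=1.20, effort_x=1.15)
print(f"before ${before:,.0f}, after ${after:,.0f}, up {after / before - 1:.1%}")
```

Even with modest assumed factors, a fleet spending $2,000 a month lands meaningfully higher, and the increase scales linearly with volume.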
I’ve seen this pattern before. Model gets smarter, model uses more tokens by design, heavier usage becomes the new default in the tool most developers interact with daily, and the whole thing gets framed as a quality improvement. It is a quality improvement. The SWE-bench numbers are real. The partner validations are real. But it is also a cost increase, and the fact that Anthropic doesn’t call it one doesn’t change your invoice.
The removal of `thinking.budget_tokens` makes this sharper. On Opus 4.6, you could set a hard ceiling on reasoning spend per request. That granular control is gone. The replacement, task budgets, operates at a different abstraction level. Task budgets give the model an advisory token ceiling across an entire agentic loop (thinking + tool calls + tool results + output). The model sees a running countdown and wraps gracefully when it approaches the limit. Activate via the `task-budgets-2026-03-13` beta header. It's a soft cap, not a hard one, but it's the right primitive for teams running open-ended agentic sessions at volume.
Task budgets are currently in public beta. If you're running Claude in agentic loops (autonomous coding sessions, multi-step tool chains, anything where token spend is unpredictable), activate them now. They replace the per-request precision of `thinking.budget_tokens` with session-level cost visibility. Not perfect, but it's what you have.
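Activation is just a header on the request. A sketch of building such a request by hand: the beta header value comes from the article, but the `task_budget` body field is an assumed shape, so check Anthropic's task-budgets documentation for the real payload before using it.

```python
# Sketch: headers and body for a task-budgeted call. The anthropic-beta
# value is from the article; the task_budget field name is an ASSUMPTION,
# not a documented API shape.
import json

def build_request(api_key: str, prompt: str, budget_tokens: int) -> tuple[dict, bytes]:
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "anthropic-beta": "task-budgets-2026-03-13",  # opt in to the public beta
        "content-type": "application/json",
    }
    body = {
        "model": "claude-opus-4-7",
        "max_tokens": 8192,
        # Advisory ceiling across thinking + tool calls + tool results + output.
        "task_budget": {"max_total_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, json.dumps(body).encode()

headers, body = build_request("sk-...", "Refactor the auth module", 150_000)
```

Because the cap is advisory, treat the budget as a cost-visibility tool, not an enforcement mechanism; pair it with billing alerts for hard limits.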
There’s also a regression worth flagging. BrowseComp — the benchmark measuring deep web research capability — dropped from 83.7% on Opus 4.6 to 79.3% on Opus 4.7. That trails GPT-5.4 Pro (89.3%) and Gemini 3.1 Pro (85.9%) by a meaningful margin. If your agents rely on web research as a core capability, Opus 4.7 is a step backward for that specific workload. Stay on Opus 4.6 or route research tasks to a different model.
On the feature side, Opus 4.7 ships with `/ultrareview`, a dedicated Claude Code command that runs a focused code review session flagging architecture, security, and logic issues. Pro and Max users get three free ultrareviews per billing cycle. Separately, Anthropic launched Routines in research preview on April 14, two days before the Opus 4.7 release and announced independently, as persistent automated workflows triggered by schedule, GitHub events, or API calls. Routines represent Anthropic's first meaningful step toward Claude as a background agent rather than a request-response tool, though they are a separate initiative from the 4.7 model upgrade itself.
The Take
Anthropic shipped a genuinely better model and made you pay more for it without changing the price tag. That’s not deception — the tokenizer change is documented, the effort default is configurable, the migration guide exists. But the defaults are set to maximize token consumption, not minimize your bill. The model is smarter at xhigh than at high. Anthropic knows most developers won’t change the default. The result is a fleet-wide cost increase that looks like a free upgrade.
If you’re running Claude Code at volume and didn’t notice your bill climbing the week of April 16, check your token usage now. You may already be running 20–35% hotter than you were on Opus 4.6 at high effort. Pin your effort level explicitly. Activate task budgets. Re-benchmark your tokenizer costs before migrating production traffic. And if BrowseComp matters to your workload, don’t upgrade that pipeline yet.
The benchmarks earned the upgrade. The defaults earned your scrutiny.