[release] 5 min · Mar 20, 2026

Composer 2 — Cursor's Quiet Bet on an Undisclosed Chinese Model

Cursor's Composer 2 cuts prices 86% and beats Opus 4.6 on Terminal-Bench — but its Kimi K2.5 base was undisclosed at launch and the model only runs inside Cursor...

Cursor Composer 2 ↗ Mar 19, 2026
#cursor #ai-coding #proprietary-models #kimi-k2

Cursor released Composer 2 on March 19, 2026 — the third generation of its proprietary coding model in five months. The headline numbers are strong: 86% cheaper than Composer 1.5, and a Terminal-Bench 2.0 score that beats Claude Opus 4.6. What Cursor did not put in the launch post was the foundation: Kimi K2.5, an open-weight model from Chinese AI lab Moonshot AI. That detail surfaced a day later when a user found the model ID in API request headers.

TL;DR

  • What: Cursor shipped Composer 2, an 86% cheaper coding model that beats Opus 4.6 on Terminal-Bench 2.0
  • Hidden base: Built on Moonshot AI’s Kimi K2.5 — not disclosed at launch, found via API headers the next day
  • Lock-in: No standalone API. Composer 2 runs only inside Cursor’s IDE, trained for its specific agent tool stack
  • Action: The cost-performance story is real. The transparency story needs watching.

Composer 2 — What Happened

A cadence of three model generations in five months tells you how Cursor thinks about its model layer. Composer 1 shipped in October 2025. Composer 1.5 followed in February 2026 at $3.50 per million input tokens and $17.50 per million output tokens. Composer 2 drops those prices to $0.50 and $2.50, respectively — the 86% reduction that anchors the launch narrative.

There is also a “Fast” variant at $1.50 per million input tokens and $7.50 per million output tokens, which is still 57% cheaper than Composer 1.5. Both variants are available inside Cursor’s IDE. Neither is available anywhere else.
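
Those percentages are easy to recompute from the published per-million-token prices. A quick sanity check, using only the numbers above:

```python
# Recompute the advertised price cuts from per-million-token prices (USD).
prices = {
    "Composer 1.5":    {"input": 3.50, "output": 17.50},
    "Composer 2":      {"input": 0.50, "output": 2.50},
    "Composer 2 Fast": {"input": 1.50, "output": 7.50},
}

def cut(old: float, new: float) -> float:
    """Percent reduction going from old to new."""
    return (1 - new / old) * 100

for model in ("Composer 2", "Composer 2 Fast"):
    for kind in ("input", "output"):
        pct = cut(prices["Composer 1.5"][kind], prices[model][kind])
        print(f"{model} {kind}: {pct:.1f}% cheaper")

# Prints 85.7% for both Composer 2 token types and 57.1% for Fast,
# consistent with the rounded 86% and 57% figures in the launch post.
```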

The benchmark picture is more nuanced than the headline suggests. On Terminal-Bench 2.0, Composer 2 scores 61.7. That beats Opus 4.6’s 58.0 but trails GPT-5.4’s 75.1 — a gap of 13.4 points that the launch post does not emphasize. On SWE-bench Multilingual, Composer 2 posts 73.7, which is competitive but behind leading frontier models. And then there is CursorBench, where it scores 61.3 — a benchmark that Cursor designed, runs, and reports on without independent verification.

Cursor used different evaluation harnesses for competing models: Claude Code harness for Anthropic models, Simple Codex harness for OpenAI models, and the official Harbor evaluation framework for Composer 2. For Terminal-Bench 2.0, Cursor took “the max score between the official leaderboard score and the score recorded running in our infrastructure” for non-Composer models. These methodology choices make direct comparisons less straightforward than the benchmark table implies. See Cursor’s technical report for full evaluation details.

The architecture story is more interesting than the benchmarks. Co-founder Aman Sanger stated that roughly 75% of total compute came from Cursor’s own training runs — continued pretraining followed by a 4x scale-up in reinforcement learning — not from the Kimi K2.5 base model itself. Fireworks AI handled both the RL training infrastructure and inference. Moonshot AI confirmed an authorized commercial partnership.

So Cursor took an open-weight model, poured substantial compute into fine-tuning it for its own tool stack (semantic search, file edits, terminal operations), and shipped it as a proprietary product. That is a legitimate engineering effort. The question is why they did not say so upfront.

Why This Matters

The non-disclosure is the story here, not the benchmarks. Both Lee Robinson and Aman Sanger acknowledged the omission was “a miss” and committed to future transparency. That is the right response, but it raises a question enterprise teams should take seriously: if you are building workflows around a model, do you know what model you are actually running?
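
Answering that question empirically is not exotic. The model ID that outed Kimi K2.5 was spotted in request headers, and a local intercepting proxy such as mitmproxy can surface the same kind of detail for any app whose traffic you route through it. The sketch below is generic: the "model" header filter is an assumption for illustration, not Cursor's actual API surface.

```python
# inspect_model_headers.py: a mitmproxy addon that logs any request header
# whose name hints at a model identifier. Header names here are generic
# guesses, not a documented API. Run with:
#   mitmdump -s inspect_model_headers.py
from mitmproxy import http

def request(flow: http.HTTPFlow) -> None:
    # Scan every outgoing request's headers for model-related fields.
    for name, value in flow.request.headers.items():
        if "model" in name.lower():
            print(f"{flow.request.pretty_url}  {name}: {value}")
```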

This matters in three concrete ways.

License compliance. Large organizations audit their AI supply chains. Knowing that your coding model is built on an open-weight Chinese base model is not inherently a problem — Kimi K2.5’s license permits commercial use — but not knowing it at all means your compliance team cannot do their job. Cursor’s commitment to future transparency needs to be verified with Composer 3.

Benchmark trust. CursorBench is proprietary and self-reported. The different evaluation harnesses used for competing models mean the comparison table is not an apples-to-apples measurement. This does not make Composer 2 bad — Terminal-Bench 2.0 is an external benchmark where it genuinely performs well — but it means you should weight the CursorBench number at approximately zero when making decisions.

Vendor dependency. This is the structural point that outlasts the disclosure controversy. Composer 2 has no standalone API. It is trained specifically for Cursor’s agent tool stack. You cannot call this model outside Cursor. If you build team workflows around Composer 2’s pricing and performance, you are pricing in IDE lock-in. That is a fundamentally different proposition from using Claude or GPT-5.4 through their respective APIs, where you retain the ability to switch providers.

Compare this to how GitHub Copilot surfaces GPT-5.4: transparently, with the model name visible, and with the underlying model accessible through other channels. Cursor is doing the opposite — building a model that only works inside its walls and initially not telling you what it is built on.

If your team is evaluating Composer 2 for production use, run your own evals on your actual codebase. The benchmark numbers tell you how Composer 2 performs on benchmark tasks. They do not tell you how it performs on your React monorepo or your Go microservices.
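
A minimal structure for that kind of eval might look like the sketch below. Everything in it is hypothetical: Task, run_agent, and the pass/fail check all stand in for your own harness, and because Composer 2 has no standalone API, the Cursor side of any comparison has to be driven from inside the IDE rather than fully scripted.

```python
# Sketch of a project-specific eval harness. All names are placeholders;
# wire run_agent() to whatever agent or API you are actually comparing.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    prompt: str                    # e.g. "add retries to the payment client"
    passes: Callable[[str], bool]  # does the produced diff pass your tests?

def run_agent(model: str, prompt: str) -> str:
    """Placeholder: run the model on the prompt, return the resulting diff."""
    raise NotImplementedError

def pass_rate(model: str, tasks: list[Task]) -> float:
    """Fraction of tasks whose output passes the project's own checks."""
    passed = sum(1 for t in tasks if t.passes(run_agent(model, t.prompt)))
    return passed / len(tasks)

# Usage idea: build ~20 tasks from real tickets in your repo, then compare
# pass_rate("composer-2", tasks) against pass_rate("gpt-5.4", tasks).
```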

The financial context explains Cursor’s strategy. With a $2 billion annualized revenue run rate as of February 2026, over 1 million daily active users, 50,000 business customers, and a $29.3 billion valuation, Cursor is not a scrappy startup experimenting with models. It is a company building a moat. Three Composer generations in five months signals that the model layer is the moat — not the editor chrome, not the extension ecosystem, but the proprietary model that only runs in their IDE.

This is a deliberate strategy to reduce dependency on Anthropic and OpenAI. By fine-tuning an open-weight base model through a third-party training provider, Cursor controls its inference costs, its training roadmap, and its margin structure. The 86% price drop is not generosity — it is what happens when you stop paying frontier model API prices and start running your own fine-tune.

The Take

I think the cost-performance argument for Composer 2 is genuine. An 86% price reduction across one model generation, with competitive benchmark performance, is a material improvement for teams already inside Cursor. The Terminal-Bench 2.0 score beating Opus 4.6 is real — even accounting for harness differences, that is a meaningful data point.

But the story here is not really about benchmarks. It is about strategic escape velocity. Cursor is building a vertically integrated AI coding company where the model, the IDE, and the workflow are inseparable. Every Composer generation tightens that integration. Every pricing cut makes switching more expensive in relative terms — not because Cursor charges more, but because the gap between “Cursor’s price” and “frontier API price” widens.

The non-disclosure of the Kimi K2.5 base was exactly what the founders called it: a miss. But the deeper issue is structural. You are now evaluating a proprietary fine-tune you cannot run anywhere else, partially benchmarked on a closed eval suite the vendor controls. That is not a reason to avoid Cursor — the product is fast, the pricing is aggressive, and the editor’s AI features remain tightly integrated with low completion latency. But it is a reason to understand exactly what you are buying into. Trust but verify. And right now, the verify part requires more effort than it should.