Which LLM coding subscription is actually fast?

Real-world throughput for every model on every coding-agent subscription we could get our hands on. Same prompt, same client, every model the provider exposes. Updated weekly.

13 providers · 95 models on subscriptions · 246 successful runs · 315,322 output tokens measured
7,329 · Peak tok/s (DeepSeek deepseek-chat)
122.9 · Best tok/s per $/mo (CrofAI Hobby)
$5 · Cheapest available coding sub
7 / 13 · Providers with availability caveats

If you're picking a sub today

Two picks by budget

The fastest available subscription tier in each price bracket — based on the peak tok/s we measured on each provider's plan. Buckets with no available subscription are skipped.

Budget tier · ≤ $20/mo

CrofAI Hobby

CrofAI · kimi-k2.5-lightning

Aggregator that resells open-source models behind one cheap key. Cheapest available plan at $5/mo (Hobby); peak 614.4 tok/s on kimi-k2.5-lightning.

Peak: 614.4 tok/s · Value: 122.9 tok/s per $/mo · Best value
Mid tier · $20–$50/mo

Synthetic Subscription

Synthetic · hf:moonshotai/Kimi-K2.6

Coverage paused — Synthetic subscription not renewed for 2026-W22; historical data retained. Cheapest available plan at $30/mo (Subscription); peak 224.2 tok/s on hf:moonshotai/Kimi-K2.6.

Peak: 224.2 tok/s · Value: 7.5 tok/s per $/mo
Editor's pick · 2026-W22

Millaguie's pick of the week

CrofAI drops out of the pick: the $5 flat-rate volume play is ending. CrofAI announced (2026-05-31) that it is reworking subscriptions, either into credit-based plans worth 1.15× their PAYG value (a $5 Hobby sub becomes $5.75 of credit) or by scrapping subs entirely to focus on rock-bottom pay-as-you-go. Either way, the flat, unlimited-volume Hobby plan that made CrofAI our daily driver is going away, and metered billing means running the full weekly bench against it stops being free. We're pulling it from the pick now and expect to drop it from the weekly tests once the new pricing lands (see notes).

The pair that replaces it: Xiaomi for deep reasoning, MiniMax for heavy loads. Use mimo-v2.5 via Xiaomi direct (Token Plan Europe) for problems that need long reasoning: 0.919 quality, 75 tok/s, 1M context, and a price tag that just got slashed (see notes). Use MiniMax-M2.5 via the Lightning route for heavy workloads, where 100+ tok/s and the Anthropic-compatible wire format make Claude Code integration trivial. Both are in the quality top 10 and both bill flat-rate, so there is no runaway cost on long agentic sessions.

If budget is not the constraint, claude-opus-4-8 is beyond debate this week: #1 overall (0.973), #2 architect, tied #1 developer. But at $15/$75 per million tokens it's a one-shot tool for critical work, not something to iterate against.
Previous picks
2026-W21 · opencode-go as the stopgap while CrofAI was down (curated catalogue, $5 first-month promo, quality holding up against the direct upstreams we'd been leaning on). DeepSeek as the second lane: PAYG at $0.14/$0.28, solid architecture scores, realistic ~60-75 tok/s once we measured TTFT honestly.
2026-W20 · CrofAI as the no-asterisks daily driver (flat plan, twenty models, best per-model latency on shared weights), Claude for the heavy one-shots, and OpenRouter's :free tier or MiniMax under load; baidu/cobuddy:free was the surprise at #1 in architect (0.989).
2026-W19 · CrofAI as the daily driver, MiniMax as the heavy-problem pair, DeepSeek pay-as-you-go for genuinely occasional use. The eighteen-model CrofAI catalogue plus Kimi K2 / GLM 5.x made it the most flexible single-vendor option on the board.
2026-W18 · DeepSeek V4-pro + MiniMax. DeepSeek ran a 75% launch discount and MiniMax's pricing was already competitive: the best cost-per-quality pairing on the board until the discount expired.
2026-W17 · DeepSeek + MiniMax for the win. Even on pay-as-you-go, DeepSeek's pricing is low enough to use it as a reasoning escape hatch when MiniMax gets stuck in a rabbit hole, without committing to a premium subscription.
Weekly notes · 2026-W22

Opus 4.8 takes the crown; Laguna-xs.2 free and open; goodbye Alibaba, hello Xiaomi

A loaded week of movements on the board.

claude-opus-4-8 ships and dominates. Anthropic released claude-opus-4-8 this week and it sits at or near the top of all three of our rankings: #1 overall with an aggregate score of 0.973 (vs 0.948 for opus-4-7), tied #1 developer at 0.969, and #2 architect at 0.976. At $15/$75 per million tokens it remains a model for occasional use rather than iteration, but the quality jump over 4-7 makes it worth reaching for in the borderline cases where the choice used to be unclear.
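The $15/$75 pricing argument is easier to see with numbers. Here is a minimal Python sketch. The per-turn token counts (200k in, 20k out) and the twenty-turns-a-day pace are hypothetical, and prompt-caching discounts are ignored.

```python
# Quoted claude-opus-4-8 list prices, USD per million tokens.
PRICE_IN_PER_M = 15.0
PRICE_OUT_PER_M = 75.0


def turn_cost(input_tokens: int, output_tokens: int,
              price_in: float = PRICE_IN_PER_M,
              price_out: float = PRICE_OUT_PER_M) -> float:
    """Pay-per-token cost in USD of one agent turn (no caching discounts)."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical agent turn: 200k tokens of context in, 20k tokens out.
one_shot = turn_cost(200_000, 20_000)
# Hypothetical iteration pace: twenty such turns in a working day.
iterating = 20 * one_shot
print(f"one-shot: ${one_shot:.2f}  a day of iteration: ${iterating:.2f}")
```

Under these assumptions a single turn costs $4.50 and a day of iteration costs $90.00. That gap is the one-shot-versus-iterate distinction in dollars.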

poolside/laguna-xs.2:free is the surprise of the year, not just the week. Poolside's model (the lab was founded by Eiso Kant and Jason Warner) ties opus-4-7 in the overall ranking (0.948) and ties opus-4-8 itself in developer with 0.969. What makes it genuinely disruptive: it is open-weight under Apache 2.0 on Hugging Face (huggingface.co/poolside/Laguna-XS.2), and Poolside claims it runs on a single GPU: 33B total / 3B active (MoE), 128K context, SWA with per-head gating, fp8. That makes it a frontier-class coding model you can self-host on reasonable hardware. The :free route on OpenRouter is a launch promo, not permanent, so use it while it lasts. This is the first time MSA has seen an open-weight model compete with the proprietary frontier models on their own terms.

Xiaomi in, Synthetic out. Xiaomi MiMo debuts via its Token Plan Europe (4 SKUs benched: omni, pro, v2.5, v2.5-pro; the TTS / voice-clone variants are filtered out at the provider level). The results are solid: mimo-v2-pro ties #1 in developer (0.969), and mimo-v2.5 sits #4 in architect (0.939). On top of that, Xiaomi announced (2026-05-28, effective 2026-05-27) a permanent price cut across the V2.5 family. V2.5-Pro input drops from $1.00 to $0.435 per million tokens, output from $3.00 to $0.87, and cached input from ~$0.36 to $0.0036. The ~99% headline figure is the cached-input cut; uncached input falls about 56% and output about 71%. Context-length pricing tiers are gone, and Token Plan balances were reset with a credit equivalence 5-8× higher. At this price Xiaomi MiMo lands in DeepSeek territory and becomes a natural pick for long-reasoning workloads. Source: platform.xiaomimimo.com.

Synthetic, meanwhile, leaves the board. We did not renew the subscription this cycle, and all 33 catalogue SKUs returned HTTP 402 "balance $0.00". The provider stays on the site with its history and a pause caveat. If anyone wants to sponsor an API key so we can keep covering it, we'll wire it back in (see the provider's card).
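Because the cut is uneven across token types, the saving on a real session depends on its mix. The small Python sketch below uses the V2.5-Pro prices quoted above and a hypothetical session of 1M input tokens (80% of them cache hits) plus 200k output tokens. The pre-cut cached price was given only as "~$0.36", so the "before" figure is approximate.

```python
# V2.5-Pro prices in USD per million tokens, before and after the cut (quoted above).
# The pre-cut cached-input price was given as "~$0.36", so treat it as approximate.
OLD = {"input": 1.00, "cached": 0.36, "output": 3.00}
NEW = {"input": 0.435, "cached": 0.0036, "output": 0.87}


def session_cost(prices: dict[str, float], uncached: int, cached: int, output: int) -> float:
    """USD cost of a session split into uncached input, cached input and output tokens."""
    return (uncached * prices["input"]
            + cached * prices["cached"]
            + output * prices["output"]) / 1_000_000


def pct_cut(old: float, new: float) -> float:
    """Percentage reduction going from old to new."""
    return (old - new) / old * 100


# Hypothetical long-reasoning session: 1M input tokens, 80% of them cache hits, 200k out.
before = session_cost(OLD, uncached=200_000, cached=800_000, output=200_000)
after = session_cost(NEW, uncached=200_000, cached=800_000, output=200_000)
print(f"${before:.3f} -> ${after:.3f} ({pct_cut(before, after):.0f}% cheaper)")
```

With these assumptions the session drops from about $1.09 to about $0.26, roughly a 76% saving. That is well short of 99%, because output tokens now dominate the bill.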

CrofAI is reworking subscriptions and is likely off the bench soon. CrofAI posted (2026-05-31) that flat-rate subs are no longer sustainable. Two options are on the table: convert subs to credit-based plans worth 1.15× their PAYG value (a $5 Hobby tier becomes $5.75 of credit per month), or remove subscriptions entirely and push PAYG prices as low as possible. The change starts 2026-05-31 and should be complete sometime between 2026-06-01 and 2026-06-03. Currently purchased subs keep working until they end or renew; nothing switches before then. Either way, the flat, unlimited $5 Hobby plan that made CrofAI our volume daily driver disappears, and under metered PAYG, running the full weekly suite against CrofAI stops being free. We've already removed CrofAI from this week's pick, and we expect to drop it from the weekly bench once the new pay-per-use pricing goes live (likely 2026-W23). The provider card and historical data stay on the site; if PAYG turns out cheap enough to keep covering it, we'll revisit.

Alibaba: goodbye starting next week. The Model Studio Token Plan FAQ states verbatim: "This plan is for interactive use within compatible AI coding and agent tools only. It cannot be used for automated scripts or application backends." The stated penalty is "subscription suspension or API Key blocking." The legacy Coding Plan carries an equivalent clause ("programming tools only"). Our MSA bench is exactly what the clause forbids: an automated script, not a human-supervised interactive agent. The clause is not new (it has been in place for months), but after rereading it with fresh eyes this week, we decided to honour it. W22 is Alibaba's last appearance on the board. The historical data stays for context, and the provider card gets a "Retired (ToS)" badge. Thanks to forkline.dev for supplying the Alibaba API key that let us include them in the weekly bench up to this point. Bye bye Ali, hi Xiaomi.

Catalogue: what's in, what's out. Cerebras retired qwen-3-235b-a22b-instruct-2507 (HTTP 404 across the run). opencode-go rotated: +qwen3.7-max, -qwen3.5-plus. OpenRouter lost baidu/cobuddy:free — the W20 surprise pick — and the entire baidu/cobuddy family is gone from the catalogue, not just the free SKU (Baidu still ships Ernie 4.5 there). It also lost arcee-ai/trinity-large-thinking:free; the paid arcee-ai/trinity-large-thinking variant is still listed, so if you want that model the route exists, just not at $0. New: moonshotai/kimi-k2.6:free. OpenRouter stays filtered to :free SKUs only on our side — the paid routes duplicate providers we already bench directly.

Drift by (provider, model): the inference provider matters as much as the model. Same model, different provider, very different score. The findings: glm-5 on CrofAI scored 0.07 vs 0.941 on Alibaba and 0.908 on opencode-go — CrofAI served the broken variant (HTTP 200 + empty body on the precision endpoints). deepseek-v4-flash on opencode-go scored 0.547 vs 0.945 on CrofAI and 0.913 on DeepSeek direct — opencode-go consistently penalises DeepSeek. MiniMax-M2.5 on Alibaba: 0.919 vs 0.548 on MiniMax direct — this week Alibaba served M2.5 better than MiniMax themselves, inverting the usual reading. mimo-v2.5 on Xiaomi direct 0.919 vs opencode-go 0.725 — Xiaomi direct wins by 19pp. The takeaway: for popular open-source weights, who runs the inference matters as much as which model you pick. Stable across providers (≤8pp spread): glm-4.7, glm-5.1, mimo-v2.5-pro, qwen3.6-plus.
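The spread check behind this paragraph is simple enough to sketch. The Python below takes the four (provider, model) score sets quoted above and applies the ≤8pp stability threshold. It is only an illustration, not our actual drift detector; per the W20 notes, that detector also reports z-scores.

```python
# Scores quoted in the W22 notes: model -> {inference provider: score}.
SCORES = {
    "glm-5": {"CrofAI": 0.07, "Alibaba": 0.941, "opencode-go": 0.908},
    "deepseek-v4-flash": {"opencode-go": 0.547, "CrofAI": 0.945, "DeepSeek": 0.913},
    "MiniMax-M2.5": {"Alibaba": 0.919, "MiniMax": 0.548},
    "mimo-v2.5": {"Xiaomi": 0.919, "opencode-go": 0.725},
}
STABLE_PP = 8.0  # a spread of at most 8 percentage points counts as stable


def drift_report(scores: dict[str, dict[str, float]]) -> list[tuple[str, float, str, str, bool]]:
    """For each model: spread in pp, best and worst provider, and whether it is stable."""
    rows = []
    for model, by_provider in scores.items():
        best = max(by_provider, key=by_provider.get)
        worst = min(by_provider, key=by_provider.get)
        spread = (by_provider[best] - by_provider[worst]) * 100
        rows.append((model, round(spread, 1), best, worst, spread <= STABLE_PP))
    return rows


report = drift_report(SCORES)
for model, spread, best, worst, stable in report:
    print(f"{model:<18} {spread:5.1f}pp  best={best:<12} worst={worst:<12} stable={stable}")
```

All four models come out unstable. The mimo-v2.5 spread of 19.4pp is the "19pp" figure above. The stable models (glm-4.7, glm-5.1 and so on) are left out because their per-provider scores are not listed here.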

Raw speed. Cerebras holds the crown with zai-glm-4.7 at 521 tok/s end-to-end and 1,490 tok/s generation rate (TTFT 2.6s) — unbeatable on latency. The new nvidia/nemotron-3-nano family on OpenRouter :free pulls 200+ tok/s with sub-1s TTFT — a cheap fallback lane for non-critical work. CrofAI Lightning (kimi-k2.5-lightning) generates at 503 tok/s once it gets going, but the 10s TTFT kills the end-to-end number.
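A rough model shows why TTFT dominates short responses. Suppose tokens stream at a constant generation rate once the first one arrives. Then end-to-end throughput is output_tokens / (TTFT + output_tokens / generation_rate). The Python sketch below plugs in the two routes quoted above. The response lengths are hypothetical, and the constant-rate assumption is a simplification.

```python
def end_to_end_tps(output_tokens: int, ttft_s: float, gen_tps: float) -> float:
    """End-to-end tok/s if output_tokens stream at a constant gen_tps after ttft_s seconds."""
    return output_tokens / (ttft_s + output_tokens / gen_tps)


# (TTFT in seconds, generation rate in tok/s) as quoted in the W22 notes.
ROUTES = {
    "Cerebras zai-glm-4.7": (2.6, 1490.0),
    "CrofAI kimi-k2.5-lightning": (10.0, 503.0),
}

for tokens in (500, 2_000, 8_000):  # hypothetical response lengths
    for name, (ttft, gen) in ROUTES.items():
        rate = end_to_end_tps(tokens, ttft, gen)
        print(f"{tokens:>5} tokens  {name:<28} {rate:7.1f} tok/s end-to-end")
```

For a 2,000-token response this gives about 143 tok/s for the Lightning route and about 507 tok/s for Cerebras. End-to-end throughput only approaches the generation rate as responses get long.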

Discovered this week: MiniMax's official CLI mmx (github.com/MiniMax-AI/cli, MIT). Install: npm install -g mmx-cli. It is multimodal — text, image, video, speech, music, vision, web search — with non-interactive flags and JSON-schema export so agents (Cursor, Claude Code, opencode) can register mmx as a tool. Requires a MiniMax Token Plan (no PAYG path per the README). If you have the subscription it is worth trying — the opencode integration is direct.

Want a provider covered? Sponsor it. Synthetic is on hold this week for exactly that reason: an API key with modest quota (single-digit millions of tokens per month) is enough to bring it back. millaguie [at] gmail [dot] com.

Previous notes
2026-W21 · TTFT lands and solves the DeepSeek mystery; CrofAI down, OpenAI retired

A methodology change and a few board moves this week.

We now measure time-to-first-token. Every speed benchmark streams and separates TTFT from total time; we report a generation rate (tok/s excluding the prefill/queue wait) alongside the end-to-end figure.
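Here is a minimal sketch of how the two figures can be derived from one streamed request. It assumes three wall-clock timestamps (request sent, first token received, last token received) plus an output-token count. The class and field names are hypothetical, and this is not our harness's actual code.

```python
from dataclasses import dataclass


@dataclass
class StreamTiming:
    """Wall-clock timestamps (seconds) around one streamed completion (hypothetical names)."""
    request_sent: float
    first_token: float
    last_token: float
    output_tokens: int

    @property
    def ttft(self) -> float:
        # Queueing + prefill + network until the first streamed token arrives.
        return self.first_token - self.request_sent

    @property
    def end_to_end_tps(self) -> float:
        # Output tokens over the whole wall-clock interval, waits included.
        return self.output_tokens / (self.last_token - self.request_sent)

    @property
    def generation_tps(self) -> float:
        # Output tokens over the streaming interval only, excluding TTFT.
        return self.output_tokens / (self.last_token - self.first_token)


# Hypothetical run: 1,200 tokens, first token after 3.0 s, stream ends at 5.0 s.
run = StreamTiming(request_sent=0.0, first_token=3.0, last_token=5.0, output_tokens=1_200)
print(f"TTFT {run.ttft:.1f}s, end-to-end {run.end_to_end_tps:.0f} tok/s, "
      f"generation {run.generation_tps:.0f} tok/s")
```

In this example the same run reads as 240 tok/s end-to-end but 600 tok/s generation, because three of its five seconds are spent waiting for the first token. One design choice to watch: some harnesses divide by output_tokens - 1 for the generation rate, since the first token lands exactly at the TTFT mark. For responses of hundreds of tokens the difference is negligible.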

The DeepSeek 5,000+ tok/s mystery is solved — it was a measurement artifact. The old single-elapsed timing folded queue and prefill into the rate. With TTFT separated, DeepSeek lands at ~60-75 tok/s end-to-end. That closes the “more next week” note from the index.

What TTFT exposes. End-to-end tok/s was hiding large prefill waits: CrofAI's kimi-k2.5-lightning generates at ~6,500 tok/s once it starts but carries an 11.7s TTFT; the Synthetic and opencode-go GLMs lose 7-10s to prefill. Cerebras is the genuine speed leader — llama3.1-8b at 1,479 tok/s and gpt-oss-120b at 1,372, both with 0.2-0.3s TTFT.

CrofAI is off the board — a serving outage. For the last few hours CrofAI has been returning HTTP 200 with empty bodies across much of its catalogue (the -precision/-pro/-flash variants and the newer, bigger models: deepseek-v4-pro-precision 71% empty, gemma-4-31b-it 55%, glm-5.1-precision 45%). It also hit the model we use as our cost-optimized judge, so this week we re-judged the architecture suite against our reference judge, claude-opus-4-7. CrofAI comes back when serving stabilises — and yes, that moves it out of the pick too (it was last week's pick; when something breaks, we say so).

OpenAI is retired from the board — our call. We brought ChatGPT in for a few weeks to evaluate it and aren't convinced it earns a permanent column, so we're dropping it (the larger gpt-5.x and -codex models also need a Pro subscription we no longer keep). Want a provider covered? Sponsor it. Lend us an API key with modest quota — single-digit millions of tokens a month is enough — and we'll bench it under the same disclosure rules as everyone else. API-credit donations welcome: millaguie [at] gmail [dot] com.

Also out: Synthetic finished its retirement pass (DeepSeek V3.x, Llama 3/4, Qwen2.5-Coder now return 404); alibaba/qwen3.7-max didn't respond (errors across the run); Copilot stays free-tier only.

Quality (architect, judged by opus-4-7). Top of the board: claude-sonnet-4-5 (0.976) and claude-opus-4-7 (0.974), then kimi-k2.5 on Alibaba (0.967), Kimi-K2.6 on Synthetic (0.966) and claude-sonnet-4-6 (0.966). Note: switching the judge from kimi-k2.6-precision to opus-4-7 recalibrates absolute scores, so this week's architect numbers are not directly comparable to the W18-W20 trend line.

2026-W20 · opencode-go debuts behind, OpenAI off the board, baidu/cobuddy surprises

Five things landed this week.

opencode-go debuts, and falls 40% behind on its flagship. The opencode team shipped their own pay-as-you-go gateway (opencode.ai/zen/go/v1), a curated open-source catalogue of twelve models (DeepSeek V4, GLM 5.x, Kimi K2.5/K2.6, MiMo V2.5, MiniMax M2.5/M2.7, Qwen 3.5/3.6). It is convenient (one key for what opencode itself uses), but on quality benches it trails the direct providers it routes to. Our drift detector flagged deepseek-v4-pro on opencode-go at -40% vs DeepSeek direct (critical, z=-2.20), kimi-k2.6 at -20% vs CrofAI, and minimax-m2.7 at -9% vs MiniMax direct. The lighter models fare better (minimax-m2.5 placed #7 in the combined ranking), but the routing penalty hits the heavy reasoners disproportionately. Useful as a convenience layer; not a replacement for the upstreams.

OpenAI is off the bench this week. Our ChatGPT Pro subscription — the one we measured the gpt-5.x family against — lapsed before W20 started, and several of the larger models (gpt-5.5, gpt-5.5-fast, all -codex variants) require Pro to access. Rather than publish a half-measured OpenAI column, we pulled it entirely for the week. It comes back when we either renew or switch to a different auth flow that survives the Pro lapse. Historical W18-W19 OpenAI data on the dashboard stays.

GitHub Copilot, also off — but for editorial reasons. Microsoft's recent platform direction has made it less interesting to bench as a coding-agent option (model availability per plan, rate-limit changes, and the new opt-outs all moving in directions that erode what made Copilot worth the column). The auth is still wired and we keep the historical numbers, but we're not running Copilot in the weekly bench until that picture stabilises.

Catalogue churn. CrofAI added four models this week (deepseek-v4-flash, deepseek-v4-pro-precision, mimo-v2.5-pro, mimo-v2.5-pro-precision) and retired qwen3.5-9b-chat. The Xiaomi MiMo line lands well: mimo-v2.5-pro-precision sits #11 in the architect rankings at 0.972, holding its own against Qwen and GLM at similar tok/s. Synthetic, meanwhile, ran a big retirement pass — fourteen models now return HTTP 404 "no longer supported", including the Llama 3.x/4 family, DeepSeek V3 through V3.2, Qwen2.5-Coder, MiniMax-M2/M2.1, the Kimi-K2-Instruct/Thinking pair and GLM-4.6. Their catalogue is consolidating around current SKUs only.

baidu/cobuddy:free is the genuine surprise. Topped the architect rankings outright (0.989), placed mid-table in developer (0.884), and clocked a 499 tok/s median throughput — all with no per-token cost on OpenRouter's free tier. We've eyeballed a handful of its responses and they hold up; this isn't the usual :free tradeoff of "works until it doesn't." If you need a second lane behind your paid daily driver, this is currently the strongest free option on the board.

How opencode-go and Alibaba Cloud got onto the bench. Both providers are on the leaderboard because somebody at — or with — the provider sent us an API key with enough quota to run the standard weekly suite. We will do the same for any provider we're not currently covering. The deal stays the same as for everyone else already on the board: we wire the auth, we publish the numbers (the unflattering ones included), and we annotate the methodology so readers can reproduce. If you can spot us a key — single-digit millions of tokens per month is enough — write to millaguie [at] gmail [dot] com or open an issue on the repo.

Heads-up on Alibaba Cloud. Starting next month, Alibaba's updated Terms of Service will rule out the kind of access this benchmark relies on. We expect to lose the Alibaba column from the next monthly cycle onward — keys revoked, numbers frozen. The historical W18-W20 Alibaba data on the dashboard stays so the trend lines remain intact, but the weekly leaderboard will no longer include them once the new ToS takes effect. If Alibaba shipped a compliant path for third-party benches before then, we'd switch to it; right now there isn't one.

2026-W19 · Catalog grew, OpenRouter joined the party

Three things landed this week.

OpenRouter as a new front door. 187 models in their catalogue, 25 of them :free text models. With $10+ in credit you cross into the higher daily quota tier (~1000 free-model requests/day), making them viable for both speed and quality benches. Two surprises in the speed pass: nvidia/nemotron-3-nano-30b-a3b:free clocked 4177 tok/s and poolside/laguna-xs.2:free hit 3746 tok/s — both in Cerebras territory, unusual for OpenRouter-routed inference.

Catalogue refresh. Most providers shipped new SKUs since W18: Claude added Sonnet 4.5; Copilot opened up the gpt-5.x family plus grok-code-fast-1; Alibaba's Qwen3.6 lineup is in; Cerebras now serves zai-glm-4.7; CrofAI gained five Kimi and qwen3.5 variants; OpenAI's gpt-5.5 family and several gpt-5.4 variants are tested for the first time.

Alibaba opened a second door: the Token Plan (Team Edition). After months of the legacy Coding Plan being effectively unobtainable, Alibaba shipped a parallel Token Plan that you can actually buy. Three tiers per seat: Standard $30/mo (25k Credits), Pro $100/mo (100k Credits, their recommended tier), Max $200/mo (250k Credits). The entry seat is cheaper than the legacy $50 Pro, but the closest equivalent is now the $100 Pro seat: a 2× hike at the volume most users actually want.
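The three seats also work out to different credits per dollar. Here is a quick Python check using the seat prices and credit allowances quoted above. Credits can't be converted to tokens, since Alibaba doesn't publish the ratio.

```python
# Token Plan seats as quoted above: seat -> (USD per month, credits per month).
TOKEN_PLAN = {
    "Standard": (30, 25_000),
    "Pro": (100, 100_000),
    "Max": (200, 250_000),
}

standard_credits = TOKEN_PLAN["Standard"][1]
for seat, (price, credits) in TOKEN_PLAN.items():
    per_dollar = credits / price           # credits bought per USD
    multiple = credits / standard_credits  # allowance relative to the Standard seat
    print(f"{seat:<8} ${price:>3}/mo  {per_dollar:7.1f} credits/$  {multiple:g}x Standard")
```

Max gets 1,250 credits per dollar versus about 833 on Standard, and the multiples (4× and 10×) match the subscription matrix.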

Credits vs requests: what really changed. The legacy Coding Plan billed by request count (one HTTP call to the endpoint). A single user query in an agent like opencode or Claude Code typically expands into 5-30 requests internally (planning, tool calls, re-prompts), so 90k requests/mo at $50 actually meant a few thousand user-visible queries. The Token Plan bills by Credits derived from input + cached + output tokens, modulated by model, thinking mode and tool calls. Alibaba doesn't publish the credit-to-token ratio, so you can't budget precisely without measuring your own workload first. Practical effect: request-based billing subsidised thinking-mode and long-output models, because a long reasoning reply cost the same as a one-line answer; credit-based billing makes you pay for those tokens proportionally. Agents that fire many short, tool-heavy requests are better off on Credits. Long-context, deep-reasoning workloads with few requests and terse final output were better off on the old request quota, which you can't buy anymore.
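To put the request quota in user-visible terms, divide it by the fan-out. The Python sketch below uses the 90k requests/mo at $50 and the rough 5-30 requests-per-query range quoted above. Your own agent's fan-out is the number worth measuring.

```python
LEGACY_PRO_PRICE = 50         # USD/month, legacy Coding Plan Pro
LEGACY_PRO_REQUESTS = 90_000  # requests/month on that plan


def user_queries(monthly_requests: int, requests_per_query: int) -> int:
    """User-visible queries a request quota covers at a given fan-out."""
    return monthly_requests // requests_per_query


for fan_out in (5, 30):  # the rough 5-30 requests-per-query range quoted above
    queries = user_queries(LEGACY_PRO_REQUESTS, fan_out)
    print(f"{fan_out:>2} req/query -> {queries:,} queries/mo, "
          f"${LEGACY_PRO_PRICE / queries:.4f} per query")
```

At 30 requests per query the quota covers 3,000 queries, about $0.017 each. At 5 it covers 18,000, about $0.003 each.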

Catch: the Token Plan ToS restrict it to interactive use with compatible AI coding/agent tools only. What the FAQ explicitly forbids: automated scripts, application backends, benchmarking and research scripts — violations may trigger API key revocation. What stays allowed: long agentic sessions inside a compatible tool (Claude Code, opencode, Qwen Code, etc.) with a human in the loop, even if the agent itself runs autonomously for hours. The line is "is there a person driving a compatible tool?", not session duration. Our weekly bench against Alibaba runs against a legacy Coding Plan subscription where this clause didn't exist; new subscribers on the Token Plan should keep usage inside one of those compatible agents.

A note on :free quotas. OpenRouter's free tier resets daily at midnight UTC, but the per-model bottleneck is usually the upstream provider, not OpenRouter itself. Several Meta-llama and Mistral-derived variants returned HTTP 429s all week and never recovered across 11 retry attempts — those are upstream pool limits, not your account's. Cross-provider drift comparisons (openai/gpt-oss-120b OpenRouter vs Synthetic; z-ai/glm-4.5-air OpenRouter vs zai) are now possible and arguably the real new value here.
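If you script against the free tier, two numbers from this page matter: the daily request quota and the midnight-UTC reset. Below is a hedged Python sketch with hypothetical helpers. The ~50 and ~1000 requests/day thresholds are the approximate figures quoted on this page, not values read from any OpenRouter API. As the paragraph above notes, waiting for the reset will not help when the 429s come from an upstream pool limit.

```python
from datetime import datetime, timedelta, timezone


def free_daily_quota(credit_usd: float) -> int:
    """Approximate :free requests/day: ~50 by default, ~1000 with $10+ of credit."""
    return 1000 if credit_usd >= 10 else 50


def seconds_until_reset(now: datetime) -> float:
    """Seconds until the next midnight UTC. `now` must be timezone-aware."""
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return (next_midnight - now_utc).total_seconds()


now = datetime(2026, 6, 1, 18, 30, tzinfo=timezone.utc)  # example timestamp
print(free_daily_quota(12.0), "requests/day;",
      seconds_until_reset(now) / 3600, "hours until reset")
```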

Speed per dollar

Best value subscriptions

Peak tok/s ÷ cheapest available plan price. Pay-as-you-go-only providers are excluded — speed-per-subscription-dollar is undefined for them.

#1 · CrofAI · kimi-k2.5-lightning · peak 614.4 tok/s · $5/mo Hobby · 122.9 tok/s per $/mo
#2 · Minimax · MiniMax-M2.1 · peak 167.0 tok/s · $10/mo Starter · 16.7 tok/s per $/mo
#3 · Synthetic · hf:moonshotai/Kimi-K2.6 · peak 224.2 tok/s · $30/mo Subscription · 7.5 tok/s per $/mo
#4 · OpenAI · gpt-5.4-mini · peak 142.2 tok/s · $20/mo Plus · 7.1 tok/s per $/mo
#5 · Claude · claude-haiku-4-5-20251001 · peak 119.8 tok/s · $20/mo Pro · 6.0 tok/s per $/mo
#6 · Copilot · gpt-5-mini · peak 112.5 tok/s · $19/mo Business · 5.9 tok/s per $/mo
#7 · z.ai · glm-4.5-air · peak 100.9 tok/s · $18/mo Lite · 5.6 tok/s per $/mo
#8 · Alibaba · qwen3-coder-next · peak 146.6 tok/s · $30/mo Token Plan (Standard Seat) · 4.9 tok/s per $/mo
Read this carefully. "Value" here measures the cheapest plan's peak throughput, not its quota. A $10 plan with 100 prompts/5h beats a $50 plan only if that quota fits your workload. Always cross-check the per-provider quota column below.
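The ranking above comes down to a single division per provider. The Python sketch below reproduces the table's order from the peak and plan-price columns. Any provider without a paid plan price is skipped rather than divided by zero, which mirrors the PAYG exclusion.

```python
# (provider, best model on plan, peak tok/s, cheapest available plan in USD/month)
PLANS = [
    ("CrofAI", "kimi-k2.5-lightning", 614.4, 5),
    ("Minimax", "MiniMax-M2.1", 167.0, 10),
    ("Synthetic", "hf:moonshotai/Kimi-K2.6", 224.2, 30),
    ("OpenAI", "gpt-5.4-mini", 142.2, 20),
    ("Claude", "claude-haiku-4-5-20251001", 119.8, 20),
    ("Copilot", "gpt-5-mini", 112.5, 19),
    ("z.ai", "glm-4.5-air", 100.9, 18),
    ("Alibaba", "qwen3-coder-next", 146.6, 30),
]


def value_ranking(plans):
    """Sort by peak tok/s per $/mo; entries with no paid price (0 or None) are skipped."""
    priced = [(p, m, peak, price, peak / price) for p, m, peak, price in plans if price]
    return sorted(priced, key=lambda row: row[4], reverse=True)


for rank, (provider, model, peak, price, value) in enumerate(value_ranking(PLANS), 1):
    print(f"#{rank} {provider} · {model} · {peak} tok/s · ${price}/mo · {value:.1f}")
```

The metric ignores quota entirely, so a high score says nothing about whether the plan's limits fit your workload.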
Free at this throughput

Free-tier models

Models that cost $0 per use — either because the provider's cheapest available plan is free (Copilot Free, OpenRouter free tier) or because the SKU itself is marked free in the catalogue. Listed separately because speed-per-$ is undefined at $0 and would skew the value ranking above. Quotas and rate-limits are tight — useful as a fallback lane, not as a primary subscription.

OpenRouter nvidia/nemotron-3-nano-30b-a3b:free 4177.4 tok/s Free plan
OpenRouter poolside/laguna-xs.2:free 3746.4 tok/s Free plan
OpenRouter liquid/lfm-2.5-1.2b-thinking:free 2416.9 tok/s Free plan
OpenRouter openai/gpt-oss-20b:free 1737.2 tok/s Free plan
OpenRouter liquid/lfm-2.5-1.2b-instruct:free 1115.9 tok/s Free plan
OpenRouter nvidia/nemotron-nano-9b-v2:free 1018.1 tok/s Free plan
OpenRouter nvidia/nemotron-nano-12b-v2-vl:free 946.8 tok/s Free plan
OpenRouter google/gemma-4-31b-it:free 945.2 tok/s Free plan

… and 53 more free-tier models measured.

Pure throughput

Models by tok/s

Output tokens per wall-clock second.

#1 · DeepSeek deepseek-chat · 7329.9 tok/s · 3 runs · 100% OK
#2 · 7097.5 tok/s · 3 runs · 100% OK
#3 · 6263.3 tok/s · 3 runs · 100% OK
#4 · DeepSeek deepseek-reasoner · 5931.5 tok/s · 3 runs · 100% OK
#5 · OpenRouter nvidia/nemotron-3-nano-30b-a3b:free · 4177.4 tok/s · 1 run · 100% OK
#6 · OpenRouter poolside/laguna-xs.2:free · 3746.4 tok/s · 1 run · 100% OK
#7 · OpenRouter liquid/lfm-2.5-1.2b-thinking:free · 2416.9 tok/s · 1 run · 100% OK
#8 · OpenRouter openai/gpt-oss-20b:free · 1737.2 tok/s · 1 run · 100% OK
#9 · Cerebras llama3.1-8b · 1500.7 tok/s · 3 runs · 100% OK
#10 · OpenRouter liquid/lfm-2.5-1.2b-instruct:free · 1115.9 tok/s · 1 run · 100% OK
#11 · OpenRouter nvidia/nemotron-nano-9b-v2:free · 1018.1 tok/s · 1 run · 100% OK
#12 · OpenRouter nvidia/nemotron-nano-12b-v2-vl:free · 946.8 tok/s · 1 run · 100% OK
#13 · OpenRouter google/gemma-4-31b-it:free · 945.2 tok/s · 2 runs · 50% OK
#14 · OpenRouter minimax/minimax-m2.5:free · 857.1 tok/s · 2 runs · 50% OK
#15 · Cerebras qwen-3-235b-a22b-instruct-2507 · 794.2 tok/s · 3 runs · 100% OK
#16 · CrofAI kimi-k2.5-lightning · 614.4 tok/s · 3 runs · 100% OK
#17 · OpenRouter openai/gpt-oss-120b:free · 332.9 tok/s · 1 run · 100% OK
#18 · OpenRouter google/gemma-4-26b-a4b-it:free · 284.5 tok/s · 3 runs · 33% OK
#19 · Synthetic hf:moonshotai/Kimi-K2.6 · 224.2 tok/s · 2 runs · 100% OK
#20 · 190.7 tok/s · 3 runs · 100% OK
#21 · OpenRouter z-ai/glm-4.5-air:free · 187.8 tok/s · 1 run · 100% OK
#22 · CrofAI qwen3.5-9b · 174.7 tok/s · 3 runs · 100% OK
#23 · 173.7 tok/s · 3 runs · 100% OK
#24 · 168.4 tok/s · 3 runs · 100% OK
#25 · Minimax MiniMax-M2.1 · 167.0 tok/s · 3 runs · 100% OK
#26 · 159.0 tok/s · 3 runs · 100% OK
#27 · CrofAI qwen3.5-9b-chat · 158.0 tok/s · 3 runs · 100% OK
#28 · CrofAI qwen3.6-27b · 150.7 tok/s · 3 runs · 100% OK
#29 · Alibaba qwen3-coder-next · 146.6 tok/s · 3 runs · 100% OK
#30 · OpenAI gpt-5.4-mini · 142.2 tok/s · 3 runs · 100% OK
#31 · Synthetic hf:nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 · 142.0 tok/s · 3 runs · 100% OK
#32 · OpenAI gpt-5.4-mini-fast · 141.2 tok/s · 3 runs · 100% OK
#33 · Synthetic hf:deepseek-ai/DeepSeek-R1-0528 · 135.2 tok/s · 3 runs · 100% OK
#34 · Synthetic hf:deepseek-ai/DeepSeek-R1 · 131.2 tok/s · 3 runs · 100% OK
#35 · 130.0 tok/s · 3 runs · 100% OK
#36 · 129.5 tok/s · 3 runs · 100% OK
#37 · Synthetic hf:openai/gpt-oss-120b · 127.3 tok/s · 3 runs · 100% OK
#38 · 126.2 tok/s · 3 runs · 33% OK
#39 · Claude claude-haiku-4-5-20251001 · 119.8 tok/s · 3 runs · 100% OK
#40 · 114.0 tok/s · 3 runs · 100% OK
#41 · 113.2 tok/s · 3 runs · 33% OK
#42 · Copilot gpt-5-mini · 112.5 tok/s · 3 runs · 100% OK
#43 · 109.6 tok/s · 3 runs · 100% OK
#44 · 108.0 tok/s · 3 runs · 100% OK
#45 · Copilot gemini-3-flash-preview · 107.9 tok/s · 3 runs · 100% OK
#46 · 105.8 tok/s · 3 runs · 100% OK
#47 · 105.1 tok/s · 3 runs · 33% OK
#48 · CrofAI greg · 103.8 tok/s · 3 runs · 33% OK
#49 · 102.8 tok/s · 3 runs · 100% OK
#50 · CrofAI gemma-4-31b-it · 101.2 tok/s · 3 runs · 100% OK
#51 · z.ai glm-4.5-air · 100.9 tok/s · 2 runs · 100% OK
#52 · Copilot claude-haiku-4.5 · 100.5 tok/s · 3 runs · 100% OK
#53 · Synthetic hf:meta-llama/Llama-3.3-70B-Instruct · 99.7 tok/s · 3 runs · 100% OK
#54 · Synthetic hf:nvidia/Kimi-K2.5-NVFP4 · 98.1 tok/s · 3 runs · 33% OK
#55 · 93.0 tok/s · 3 runs · 100% OK
#56 · Copilot gpt-4.1 · 92.5 tok/s · 3 runs · 100% OK
#57 · Copilot gpt-4o · 90.8 tok/s · 3 runs · 100% OK
#58 · Synthetic hf:Qwen/Qwen3-Coder-480B-A35B-Instruct · 90.8 tok/s · 3 runs · 100% OK
#59 · Synthetic hf:deepseek-ai/DeepSeek-V3 · 88.0 tok/s · 3 runs · 100% OK
#60 · 80.7 tok/s · 3 runs · 100% OK
#61 · 80.6 tok/s · 3 runs · 100% OK
#62 · 77.9 tok/s · 2 runs · 100% OK
#63 · Copilot gemini-2.5-pro · 75.3 tok/s · 3 runs · 100% OK
#64 · 74.1 tok/s · 3 runs · 100% OK
#65 · Copilot grok-code-fast-1 · 73.5 tok/s · 3 runs · 100% OK
#66 · 72.2 tok/s · 3 runs · 100% OK
#67 · Minimax MiniMax-M2 · 71.3 tok/s · 3 runs · 100% OK
#68 · 68.9 tok/s · 3 runs · 100% OK
#69 · Alibaba qwen3.5-plus · 64.1 tok/s · 3 runs · 100% OK
#70 · Copilot gemini-3.1-pro-preview · 62.4 tok/s · 3 runs · 100% OK
#71 · 62.2 tok/s · 3 runs · 100% OK
#72 · OpenAI gpt-5.5-fast · 61.0 tok/s · 3 runs · 100% OK
#73 · 59.6 tok/s · 3 runs · 100% OK
#74 · Alibaba qwen3-coder-plus · 59.5 tok/s · 3 runs · 100% OK
#75 · Copilot claude-sonnet-4.5 · 59.2 tok/s · 3 runs · 100% OK
#76 · 57.8 tok/s · 3 runs · 100% OK
#77 · Claude claude-sonnet-4-6 · 56.7 tok/s · 2 runs · 100% OK
#78 · 53.7 tok/s · 2 runs · 100% OK
#79 · Alibaba qwen3.6-plus · 53.4 tok/s · 3 runs · 100% OK
#80 · Claude claude-opus-4-7 · 53.2 tok/s · 2 runs · 100% OK
#81 · 53.2 tok/s · 3 runs · 100% OK
#82 · 52.7 tok/s · 3 runs · 33% OK
#83 · OpenAI gpt-5.3-codex · 52.7 tok/s · 3 runs · 100% OK
#84 · OpenAI gpt-5.4-fast · 52.4 tok/s · 3 runs · 100% OK
#85 · Copilot claude-sonnet-4.6 · 52.2 tok/s · 3 runs · 100% OK
#86 · OpenAI gpt-5.4 · 51.9 tok/s · 3 runs · 100% OK
#87 · Copilot claude-sonnet-4 · 51.2 tok/s · 1 run · 100% OK
#88 · OpenAI gpt-5.5 · 50.5 tok/s · 3 runs · 100% OK
#89 · Claude claude-opus-4-5 · 49.9 tok/s · 2 runs · 100% OK
#90 · 49.9 tok/s · 2 runs · 100% OK
#91 · Claude claude-sonnet-4-5 · 49.5 tok/s · 2 runs · 100% OK
#92 · 48.8 tok/s · 3 runs · 100% OK
#93 · 48.1 tok/s · 3 runs · 100% OK
#94 · Minimax MiniMax-M2.7-highspeed · 45.0 tok/s · 3 runs · 100% OK
#95 · 40.6 tok/s · 3 runs · 100% OK
#96 · Alibaba qwen3-max-2026-01-23 · 32.1 tok/s · 3 runs · 100% OK
#97 · OpenRouter nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free · 29.1 tok/s · 1 run · 100% OK
#98 · Minimax MiniMax-M2.7 · 26.4 tok/s · 3 runs · 100% OK
#99 · OpenRouter poolside/laguna-m.1:free · 3.5 tok/s · 1 run · 100% OK

A note on the DeepSeek numbers: the 5,000+ tok/s DeepSeek figures come from our older single-elapsed timing. As the 2026-W21 notes explain, they turned out to be a measurement artifact; with time-to-first-token separated, DeepSeek lands at ~60-75 tok/s end-to-end. Bars use a log scale so the smaller subscription numbers stay legible alongside Cerebras' genuinely sustained 1,000–2,600 tok/s.

Speed over time

tok/s progression by model

Best output tok/s recorded per model and provider, per measurement date. Only models measured on 2+ dates are shown.
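The selection rule for this chart can be sketched in a few lines of Python. The sketch assumes one record per run carrying provider, model, date and tok/s. The record layout and the sample data are hypothetical.

```python
from collections import defaultdict


def progression(records):
    """Best tok/s per (provider, model) per date, keeping only series with 2+ dates.

    records: iterable of (provider, model, date, tok_s) tuples, one per run,
    with dates as ISO strings so they sort chronologically.
    """
    best = defaultdict(dict)  # (provider, model) -> {date: best tok/s that day}
    for provider, model, date, tok_s in records:
        per_day = best[(provider, model)]
        per_day[date] = max(tok_s, per_day.get(date, 0.0))
    return {key: dict(sorted(days.items()))
            for key, days in best.items() if len(days) >= 2}


# Hypothetical runs with placeholder names, not measured values.
runs = [
    ("provider-a", "model-x", "2026-05-18", 140.0),
    ("provider-a", "model-x", "2026-05-18", 150.0),
    ("provider-a", "model-x", "2026-05-25", 145.0),
    ("provider-b", "model-y", "2026-05-25", 60.0),  # one date only: dropped
]
print(progression(runs))
```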

All plans, side by side

Subscription matrix

Every coding subscription tier we know about, with measured peak speed and current availability. The cheapest available plan on each provider is highlighted.

Cerebras

⚠ Waitlist only
  • Code Pro · $50/mo · ~24M tokens/day
  • Code Max · $200/mo · ~120M tokens/day
Peak: 1500.7 tok/s
Wire: OpenAI-compatible
Value: n/a (PAYG only)
Heads up: waitlist only. Code Pro and Code Max have not been reopened to new customers; Cerebras has kept them on an indefinite waitlist since the initial rollout sold out. The PAYG free tier (1M tokens/day) and pay-per-token usage are unaffected.

CrofAI

⚠ Subs reworked — likely off bench W23
  • Free / PAYG · $0/mo · Pay-per-token, no recurring charge
  • Hobby (BEST VALUE) · $5/mo · 500 daily requests · access to all models
  • Pro · $10/mo · 1,000 daily requests · priority support
  • Intermediate · $20/mo · 2,500 daily requests
  • Scale · $50/mo · 7,500 daily requests
  • Max · $100/mo · 15,000 daily requests
Peak: 614.4 tok/s
Wire: OpenAI-compatible
Value: 122.9 tok/s per $/mo
Heads up: subscriptions are being reworked (announced 2026-05-31), and CrofAI is likely off the weekly bench from 2026-W23. CrofAI says flat-rate subs are no longer sustainable. Either subs convert to credit-based plans worth 1.15× their PAYG value (a $5 Hobby sub becomes $5.75 of credit/month) or they are removed entirely in favour of rock-bottom PAYG. Existing subs keep working until they end or renew; the change rolls out 2026-05-31 through ~06-03. The flat, unlimited $5 Hobby plan that made CrofAI our volume daily driver is going away, so under metered PAYG we expect to stop running the full weekly suite against it. Historical data stays on this page for context.

Minimax

⚠ Highspeed gating
  • Starter (CHEAPEST) · $10/mo · 100 prompts / 5h, M2.5
  • Plus · $20/mo · 300 prompts / 5h, M2.5
  • Max · $50/mo · 1,000 prompts / 5h, M2.5
  • Plus-Highspeed · $40/mo · 300 prompts / 5h, Lightning
  • Max-Highspeed · $80/mo · 1,000 prompts / 5h, Lightning
  • Ultra-Highspeed · $150/mo · 2,000 prompts / 5h, Lightning
Peak: 167.0 tok/s
Wire: Anthropic-compatible
Value: 16.7 tok/s per $/mo
Heads up: the Lightning (high-speed, ~100 tok/s) variant of M2.5 is gated to the -Highspeed tiers only (Plus-Highspeed $40/mo, Max-Highspeed $80/mo, Ultra-Highspeed $150/mo). The cheaper Starter / Plus / Max plans give you regular M2.5 at ~50 tok/s, so make sure you subscribe to a -Highspeed tier if you specifically need Lightning.

z.ai

⚠ Price hikes
  • Lite (CHEAPEST) · $18/mo · 400 prompts / 5h, 2,000 / week
  • Pro · $36/mo · 2,000 prompts / 5h, unlimited weekly
  • Max · $96/mo · No practical cap, peak-hour SLA
Peak: 100.9 tok/s
Wire: OpenAI-compatible · Anthropic-compatible
Value: 5.6 tok/s per $/mo
Heads up: aggressive 2026 price hikes. The Lite plan launched in February 2026 at around $3/mo and has been raised several times since; it is currently $18/mo (≈$30/quarter). That puts it within a few dollars of Claude Pro ($20/mo). Marketing still claims '3× Claude Pro usage', but that figure is vendor-supplied and based on z.ai's own quota model, not an apples-to-apples measurement. Verify the latest pricing on z.ai/subscribe before subscribing.

OpenAI

  • Plus (CHEAPEST) · $20/mo · Standard ChatGPT + Codex access
  • Pro · $200/mo · Higher quotas + research tier
Peak: 142.2 tok/s
Wire: Codex OAuth (SSE)
Value: 7.1 tok/s per $/mo

Claude

  • Pro (CHEAPEST) · $20/mo · Claude Code with shared Pro limits
  • Max · $100/mo · 5× Pro quota
  • Max+ · $200/mo · 20× Pro quota + Guest Passes
Peak: 119.8 tok/s
Wire: Claude Code OAuth (Anthropic /v1/messages + oauth-2025-04-20)
Value: 6.0 tok/s per $/mo

Copilot

⚠ Pro signups paused
  • Free · $0/mo · 2,000 completions / 50 premium requests per month
  • Pro · $10/mo · Higher limits, Sonnet/GPT-5 (no Opus)
  • Pro+ · $39/mo · Includes Claude Opus 4.7 + premium models
  • Business (CHEAPEST) · $19/mo · Per seat, admin controls, audit logs
  • Enterprise · $39/mo · Per seat (+ $21 GH Enterprise Cloud)
Peak: 112.5 tok/s
Wire: Copilot Bearer (OpenAI-compat /chat/completions)
Value: 5.9 tok/s per $/mo
Heads up: GitHub paused new sign-ups for the Pro, Pro+ and Student tiers on 2026-04-20, citing that agentic workloads consume far more compute than the original pricing assumed. Existing Pro/Pro+ subscribers keep their plan; new individual users can only pick Free, Business or Enterprise. Opus 4.x has also been removed from Pro; only Pro+ keeps it.

Alibaba

⚠ Retired (ToS) after 2026-W22
  • Token Plan, Standard Seat (CHEAPEST) · $30/mo · 25,000 credits/mo · text, vision, image gen · interactive use only
  • Token Plan, Pro Seat · $100/mo · 100,000 credits/mo (4× Standard), Alibaba's recommended tier
  • Token Plan, Max Seat · $200/mo · 250,000 credits/mo (10× Standard)
  • Coding Plan, Lite (legacy) · $10/mo · 18,000 requests/mo, closed to new subs since 2026-03-20
  • Coding Plan, Pro (legacy) · $50/mo · 90,000 requests/mo, effectively impossible to buy
Peak: 146.6 tok/s
Wire: OpenAI-compatible
Value: 4.9 tok/s per $/mo
Heads up: retired from the weekly bench after 2026-W22 (ToS incompatibility). Both the Token Plan and the legacy Coding Plan FAQs restrict use to interactive AI coding/agent tools driven by a human, explicitly forbidding automated scripts, application backends and benchmarking. MSA's bench is exactly the script-driven workload the clause prohibits, so we will stop running it against Alibaba from 2026-W23 onwards. Historical data through 2026-W22 stays on this page for context. Thanks to forkline.dev for the API key that let us cover Alibaba up to this point.

OpenRouter

  • Free tier · $0/mo · ~50 :free requests/day (rises to ~1000/day with $10+ credit)
  • Pay-as-you-go · $0/mo · Per-token pricing on non-free models, deposit any amount
Peak: 4177.4 tok/s
Wire: OpenAI-compatible
Value: n/a (PAYG only)

Synthetic

⚠ Paused 2026-W22
  • Subscription (CHEAPEST) · $30/mo · 500 messages / 5h · all models included · 1 concurrent req/model
  • Usage-based · $0/mo · Pay-per-token · all models
Peak: 224.2 tok/s
Wire: OpenAI-compatible
Value: 7.5 tok/s per $/mo
Heads up: not benched in 2026-W22. Our Synthetic subscription lapsed and we chose not to renew. The provider stays listed because the page preserves historical context, but no fresh numbers will appear until the bench resumes. Want Synthetic covered? Sponsor a key (see About / sources) or email millaguie [at] gmail [dot] com.

PAYG-only providers

No subscription tier — pay per token. Fast for sporadic use, harder to budget for daily agent loops.

DeepSeek

Chinese AI lab known for open-weight MoE models that punch above their price.

Peak: 7329.9 tok/s
How to read this

What we measured (and what we didn't)

  • Throughput is end-to-end: output tokens divided by wall-clock time, including network and queueing. The weekly notes also report a generation rate that excludes time-to-first-token.
  • Quality is not measured here: that needs evals, not stopwatches.
  • Tail latency is not measured: averages dominate, and p99 would need many more samples.
  • Same prompt, every model the provider exposes, weekly refresh.
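Here is why p99 is out of reach at this sample size. Most models above have one to three runs, and a nearest-rank p99 over fewer than 100 samples is simply the worst sample. The small Python sketch below uses hypothetical latencies.

```python
import math


def nearest_rank_percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


three_runs = [4.1, 4.4, 9.8]  # hypothetical per-run latencies in seconds
print(nearest_rank_percentile(three_runs, 50))  # the middle run
print(nearest_rank_percentile(three_runs, 99))  # with n=3 this is just the worst run
```

For n ≤ 99 the p99 rank is always n, so the estimate equals the maximum. It only separates from the maximum at n = 100 samples.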