Which LLM coding subscription is actually fast?

Real-world throughput for every model on every coding-agent subscription we could get our hands on. Same prompt, same client, every model the provider exposes. Updated weekly.

11 providers · 95 models on subscriptions · 320 successful runs · 409,550 output tokens measured

  • Peak tok/s: 7,394 (DeepSeek deepseek-v4-flash)
  • Best tok/s per $/mo: 122.9 (CrofAI Hobby)
  • Cheapest available coding sub: $5
  • Providers with availability caveats: 5 / 11
If you're picking a sub today

Two picks by budget

The fastest available subscription tier in each price bracket — based on the peak tok/s we measured on each provider's plan. Buckets with no available subscription are skipped.
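The selection logic above is simple enough to sketch. A minimal, hedged version: plan names and figures are taken from this page's tables, while the exact bucket boundaries (`BUCKETS`) are an assumption about how the brackets are applied.

```python
# Fastest available plan per price bucket; empty buckets are skipped.
PLANS = [  # (provider, plan, price $/mo, peak tok/s) — from the tables on this page
    ("CrofAI", "Hobby", 5, 614.4),
    ("Minimax", "Starter", 10, 185.3),
    ("Synthetic", "Subscription", 30, 224.2),
]
BUCKETS = {"budget": (0, 20), "mid": (20, 50)}  # (low, high] — an assumption

def pick_per_bucket(plans, buckets):
    picks = {}
    for name, (lo, hi) in buckets.items():
        in_bucket = [p for p in plans if lo < p[2] <= hi]
        if in_bucket:  # buckets with no available subscription are skipped
            picks[name] = max(in_bucket, key=lambda p: p[3])
    return picks

picks = pick_per_bucket(PLANS, BUCKETS)
# budget → CrofAI Hobby, mid → Synthetic Subscription, matching the picks above
```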

Budget tier · ≤ $20 / mo

CrofAI Hobby

CrofAI · kimi-k2.5-lightning

Aggregator that resells open-source models behind one cheap key. Cheapest available plan at $5/mo (Hobby); peak 614.4 tok/s on kimi-k2.5-lightning.

Peak 614.4 tok/s · Value 122.9 / $ · BEST VALUE
Mid tier · $20 – $50 / mo

Synthetic Subscription

Synthetic · hf:moonshotai/Kimi-K2.6

Run open-source AI models privately — flat $30/mo or pay-per-token. Cheapest available plan at $30/mo (Subscription); peak 224.2 tok/s on hf:moonshotai/Kimi-K2.6.

Peak 224.2 tok/s · Value 7.5 / $
Editor's pick · 2026-W19

Millaguie's pick of the week

CrofAI takes the top spot. Flat pricing, no promotional windows to time, and an eighteen-model catalogue that grew again this week (all the Kimi K2 variants plus the GLM 5.x family are in). Two ways to play it. CrofAI + MiniMax if you want the strongest pairing for the price: lean on MiniMax for the heavy, hairy problems where raw capability matters, fall back to CrofAI for everything else, and you get latency that beats most official providers. CrofAI alone on a larger plan if you'd rather skip the second invoice — the lightweight models (kimi-k2.5-lightning, glm-4.7-flash) absorb the bulk-throughput slot at speeds the bigger official endpoints don't match, and the precision variants are there when you need them. And if your use is genuinely occasional — a query here, a script there — DeepSeek pay-as-you-go wins outright: $0.14/$0.28 per Mtok across the chat/reasoner lineup is so cheap that committing to any plan would be over-engineering.
Previous picks
2026-W18 · DeepSeek V4-pro + MiniMax. DeepSeek ran a 75% launch discount and MiniMax's pricing was already competitive — best cost-per-quality pairing on the board until the discount expired.
2026-W17 · DeepSeek + MiniMax for the win. Even on pay-as-you-go, DeepSeek's pricing is low enough to use it as a reasoning escape hatch when MiniMax gets stuck in a rabbit hole — without committing to a premium subscription.
Weekly notes · 2026-W19

Catalog grew, OpenRouter joined the party

Three things landed this week.

OpenRouter as a new front door. 187 models in their catalogue, 25 of them :free text models. With $10+ in credit you cross into the higher daily quota tier (~1000 free-model requests/day), making them viable for both speed and quality benches. Two surprises in the speed pass: nvidia/nemotron-3-nano-30b-a3b:free clocked 4177 tok/s and poolside/laguna-xs.2:free hit 3746 tok/s — both in Cerebras territory, unusual for OpenRouter-routed inference.

Catalogue refresh. Most providers shipped new SKUs since W18: Claude added Sonnet 4.5; Copilot opened up the gpt-5.x family plus grok-code-fast-1; Alibaba's Qwen3.6 lineup is in; Cerebras now serves zai-glm-4.7; CrofAI gained five Kimi and qwen3.5 variants; OpenAI's gpt-5.5 family and several gpt-5.4 variants are tested for the first time.

Alibaba opened a second door — the Token Plan (Team Edition). After months of the legacy Coding Plan being effectively unobtainable, Alibaba shipped a parallel Token Plan that you can actually buy. Three tiers per seat: Standard $30/mo (25k Credits), Pro $100/mo (100k Credits, their recommended tier), Max $200/mo (250k Credits). The floor is cheaper than the legacy $50 Pro, but the closest equivalent is now $100 — a 2× hike at the volume most users actually want.

Credits vs requests — what really changed. The legacy Coding Plan billed by request count (one HTTP call to the endpoint). A single user query in an agent like opencode or Claude Code typically expands into 5-30 requests internally — planning, tool calls, re-prompts — so 90k requests/mo at $50 actually meant a few thousand user-visible queries. The Token Plan bills by Credits derived from input + cached + output tokens, modulated by model, thinking mode and tool calls. Alibaba doesn't publish the credit-to-token ratio, so budgeting precisely is impossible without measuring your own workload first. Practical effect: request-based billing subsidised thinking-mode and long-output models (they cost the same as a one-shot reply); credit-based billing makes you pay for those tokens proportionally. Chatty, tool-heavy agent sessions (many small requests) are better off on Credits; long-context, deep-reasoning, terse-output workloads were better off on the old request quota — which you can no longer buy.
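A back-of-the-envelope sketch of the two billing schemes. The request math follows directly from the figures above; `CREDITS_PER_MTOK` is a made-up placeholder, since Alibaba publishes no credit-to-token ratio — calibrate it against your own metered usage before trusting any budget.

```python
REQUESTS_PER_QUERY = 15   # mid-range of the 5-30 internal expansion cited above
LEGACY_QUOTA = 90_000     # requests/mo on the legacy $50 Coding Plan Pro
CREDITS_PER_MTOK = 250    # ASSUMPTION — not a published figure

def legacy_queries_per_month() -> int:
    """User-visible queries the old request quota really bought."""
    return LEGACY_QUOTA // REQUESTS_PER_QUERY

def token_plan_credits(input_mtok: float, output_mtok: float) -> float:
    """Credits burned under the Token Plan with a flat placeholder ratio.
    The real scheme also weights model, thinking mode and tool calls."""
    return (input_mtok + output_mtok) * CREDITS_PER_MTOK

print(legacy_queries_per_month())  # 6000 — "a few thousand", as stated above
```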

Catch: the Token Plan ToS restrict it to interactive use with compatible AI coding/agent tools only. What the FAQ explicitly forbids: automated scripts, application backends, benchmarking and research scripts — violations may trigger API key revocation. What stays allowed: long agentic sessions inside a compatible tool (Claude Code, opencode, Qwen Code, etc.) with a human in the loop, even if the agent itself runs autonomously for hours. The line is "is there a person driving a compatible tool?", not session duration. Our weekly bench against Alibaba runs against a legacy Coding Plan subscription where this clause didn't exist; new subscribers on the Token Plan should keep usage inside one of those compatible agents.

A note on :free quotas. OpenRouter's free tier resets daily at midnight UTC, but the per-model bottleneck is usually the upstream provider, not OpenRouter itself. Several Meta-llama and Mistral-derived variants returned HTTP 429s all week and never recovered across 11 retry attempts — those are upstream pool limits, not your account's. Cross-provider drift comparisons (openai/gpt-oss-120b OpenRouter vs Synthetic; z-ai/glm-4.5-air OpenRouter vs zai) are now possible and arguably the real new value here.
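The retry behaviour described above can be sketched as exponential backoff on HTTP 429 — a minimal illustration in the spirit of our harness, not the actual bench code. A 429 that survives every attempt is almost certainly the upstream pool's limit, not your account's.

```python
import time

def call_with_backoff(fetch, max_attempts=11, base_delay=1.0, sleep=time.sleep):
    """Retry a request on HTTP 429, doubling the delay each attempt.
    `fetch` returns a status code; `sleep` is injectable for testing.
    Returns the first non-429 status, or 429 if the limit never clears."""
    for attempt in range(max_attempts):
        status = fetch()
        if status != 429:
            return status
        sleep(base_delay * 2 ** attempt)
    return 429

# A transient limit that clears after two rejections:
responses = iter([429, 429, 200])
assert call_with_backoff(lambda: next(responses), sleep=lambda _: None) == 200
```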

Speed per dollar

Best value subscriptions

Peak tok/s ÷ cheapest available plan price. Pay-as-you-go-only providers are excluded — speed-per-subscription-dollar is undefined for them.

| # | Provider · best model on plan | Peak tok/s | Plan | tok/s per $/mo |
|---|---|---|---|---|
| 1 | CrofAI · kimi-k2.5-lightning | 614.4 | $5 Hobby | 122.9 |
| 2 | Minimax · MiniMax-M2.5 | 185.3 | $10 Starter | 18.5 |
| 3 | Synthetic · hf:moonshotai/Kimi-K2.6 | 224.2 | $30 Subscription | 7.5 |
| 4 | OpenAI · gpt-5.4-mini | 146.8 | $20 Plus | 7.3 |
| 5 | Claude · claude-haiku-4-5-20251001 | 119.8 | $20 Pro | 6.0 |
| 6 | Copilot · gpt-5-mini | 113.3 | $19 Business | 6.0 |
| 7 | z.ai · glm-4.5-air | 100.9 | $18 Lite | 5.6 |
| 8 | Alibaba · qwen3-coder-next | 146.6 | $30 Token Plan · Standard Seat | 4.9 |
Read this carefully. "Value" here measures the cheapest plan's peak throughput, not its quota. A $10 plan with 100 prompts/5h beats a $50 plan only if that quota fits your workload. Always cross-check the per-provider quota column below.
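The value column reduces to a single division — peak throughput over the cheapest plan's monthly price. A minimal sketch, with figures from the table above:

```python
def value_score(peak_tok_s: float, plan_price_usd: float) -> float:
    """Peak output tok/s per subscription dollar per month."""
    if plan_price_usd <= 0:
        raise ValueError("undefined for free and PAYG-only plans")
    return round(peak_tok_s / plan_price_usd, 1)

# Figures from the table above.
assert value_score(614.4, 5) == 122.9   # CrofAI Hobby
assert value_score(185.3, 10) == 18.5   # Minimax Starter
assert value_score(224.2, 30) == 7.5    # Synthetic Subscription
```

This is exactly why PAYG-only providers are excluded: with no monthly price, the ratio is undefined.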
Pure throughput

Models by tok/s

Output tokens per wall-clock second, for every model we measured this week.

| # | Provider · model | Best tok/s | Runs | OK |
|---|---|---|---|---|
| 1 | DeepSeek deepseek-v4-flash | 7394.8 | 4 | 100% |
| 2 | DeepSeek deepseek-chat | 7329.9 | 4 | 100% |
| 3 | DeepSeek deepseek-v4-pro | 7056.0 | 4 | 100% |
| 4 | DeepSeek deepseek-reasoner | 6136.8 | 4 | 100% |
| 5 | OpenRouter nvidia/nemotron-3-nano-30b-a3b:free | 4177.4 | 1 | 100% |
| 6 | OpenRouter poolside/laguna-xs.2:free | 3746.4 | 1 | 100% |
| 7 | OpenRouter liquid/lfm-2.5-1.2b-thinking:free | 2416.9 | 1 | 100% |
| 8 | OpenRouter openai/gpt-oss-20b:free | 1737.2 | 1 | 100% |
| 9 | Cerebras llama3.1-8b | 1500.7 | 4 | 100% |
| 10 | OpenRouter liquid/lfm-2.5-1.2b-instruct:free | 1115.9 | 1 | 100% |
| 11 | OpenRouter nvidia/nemotron-nano-9b-v2:free | 1018.1 | 1 | 100% |
| 12 | OpenRouter nvidia/nemotron-nano-12b-v2-vl:free | 946.8 | 1 | 100% |
| 13 | OpenRouter google/gemma-4-31b-it:free | 945.2 | 2 | 50% |
| 14 | OpenRouter minimax/minimax-m2.5:free | 857.1 | 2 | 50% |
| 15 | Cerebras qwen-3-235b-a22b-instruct-2507 | 794.2 | 4 | 75% |
| 16 | CrofAI kimi-k2.5-lightning | 614.4 | 4 | 100% |
| 17 | OpenRouter openai/gpt-oss-120b:free | 332.9 | 1 | 100% |
| 18 | OpenRouter google/gemma-4-26b-a4b-it:free | 284.5 | 3 | 33% |
| 19 | Synthetic hf:moonshotai/Kimi-K2.6 | 224.2 | 2 | 100% |
| 20 | Synthetic hf:Qwen/Qwen3.5-397B-A17B | 190.7 | 4 | 100% |
| 21 | OpenRouter z-ai/glm-4.5-air:free | 187.8 | 1 | 100% |
| 22 | Minimax MiniMax-M2.5 | 185.3 | 4 | 100% |
| 23 | CrofAI qwen3.5-9b | 174.7 | 4 | 100% |
| 24 | Synthetic hf:zai-org/GLM-4.7 | 173.7 | 4 | 100% |
| 25 | CrofAI qwen3.6-27b | 167.3 | 4 | 100% |
| 26 | Minimax MiniMax-M2.1 | 167.0 | 4 | 100% |
| 27 | Synthetic hf:zai-org/GLM-5.1 | 164.1 | 4 | 100% |
| 28 | CrofAI qwen3.5-9b-chat | 158.0 | 4 | 100% |
| 29 | Synthetic hf:zai-org/GLM-4.7-Flash | 150.3 | 4 | 100% |
| 30 | OpenAI gpt-5.4-mini | 146.8 | 4 | 100% |
| 31 | Alibaba qwen3-coder-next | 146.6 | 4 | 75% |
| 32 | Synthetic hf:nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 | 142.2 | 4 | 100% |
| 33 | OpenAI gpt-5.4-mini-fast | 141.2 | 4 | 100% |
| 34 | Synthetic hf:deepseek-ai/DeepSeek-R1-0528 | 135.2 | 4 | 100% |
| 35 | Synthetic hf:openai/gpt-oss-120b | 132.6 | 4 | 100% |
| 36 | Synthetic hf:deepseek-ai/DeepSeek-R1 | 131.2 | 4 | 100% |
| 37 | Synthetic hf:MiniMaxAI/MiniMax-M2.5 | 129.5 | 4 | 100% |
| 38 | Claude claude-haiku-4-5-20251001 | 119.8 | 4 | 100% |
| 39 | CrofAI kimi-k2.5 | 115.1 | 4 | 100% |
| 40 | Copilot gpt-5-mini | 113.3 | 4 | 100% |
| 41 | CrofAI glm-4.7-flash | 113.2 | 4 | 50% |
| 42 | Synthetic hf:zai-org/GLM-5 | 110.8 | 4 | 100% |
| 43 | CrofAI glm-5.1 | 108.0 | 4 | 100% |
| 44 | Copilot gemini-3-flash-preview | 107.9 | 4 | 100% |
| 45 | CrofAI kimi-k2.6 | 105.8 | 4 | 100% |
| 46 | CrofAI glm-5 | 105.1 | 4 | 50% |
| 47 | CrofAI gemma-4-31b-it | 104.9 | 4 | 100% |
| 48 | Copilot claude-haiku-4.5 | 104.8 | 4 | 100% |
| 49 | CrofAI greg | 103.8 | 4 | 50% |
| 50 | CrofAI minimax-m2.5 | 102.8 | 4 | 100% |
| 51 | z.ai glm-4.5-air | 100.9 | 3 | 100% |
| 52 | CrofAI glm-4.7 | 99.5 | 4 | 100% |
| 53 | Synthetic hf:meta-llama/Llama-3.3-70B-Instruct | 93.9 | 4 | 100% |
| 54 | Copilot gpt-4o | 93.1 | 4 | 100% |
| 55 | CrofAI deepseek-v3.2 | 93.0 | 4 | 100% |
| 56 | Copilot gpt-4.1 | 92.5 | 4 | 100% |
| 57 | Synthetic hf:Qwen/Qwen3-Coder-480B-A35B-Instruct | 90.8 | 4 | 100% |
| 58 | Synthetic hf:deepseek-ai/DeepSeek-V3 | 88.0 | 4 | 100% |
| 59 | CrofAI kimi-k2.6-precision | 80.7 | 4 | 100% |
| 60 | Synthetic hf:deepseek-ai/DeepSeek-V3.2 | 80.6 | 4 | 100% |
| 61 | z.ai glm-5-turbo | 77.9 | 3 | 100% |
| 62 | CrofAI glm-5.1-precision | 77.7 | 4 | 100% |
| 63 | Copilot gemini-2.5-pro | 75.3 | 4 | 100% |
| 64 | z.ai glm-4.7 | 74.2 | 3 | 100% |
| 65 | Alibaba glm-4.7 | 74.1 | 4 | 75% |
| 66 | Copilot grok-code-fast-1 | 73.5 | 4 | 100% |
| 67 | Minimax MiniMax-M2 | 72.8 | 4 | 100% |
| 68 | CrofAI deepseek-v4-pro | 70.0 | 4 | 100% |
| 69 | Copilot gpt-5.2 | 68.9 | 4 | 100% |
| 70 | Alibaba qwen3.5-plus | 64.1 | 4 | 75% |
| 71 | Copilot gemini-3.1-pro-preview | 62.4 | 4 | 100% |
| 72 | Alibaba MiniMax-M2.5 | 62.2 | 4 | 75% |
| 73 | OpenAI gpt-5.5-fast | 61.0 | 4 | 100% |
| 74 | Alibaba qwen3-coder-plus | 59.5 | 4 | 75% |
| 75 | Copilot claude-sonnet-4.5 | 59.2 | 4 | 100% |
| 76 | Claude claude-opus-4-7 | 58.3 | 3 | 100% |
| 77 | Claude claude-sonnet-4-6 | 56.7 | 3 | 100% |
| 78 | Alibaba qwen3.6-plus | 53.4 | 4 | 75% |
| 79 | OpenAI gpt-5.2 | 53.2 | 4 | 100% |
| 80 | CrofAI qwen3.5-397b-a17b | 52.7 | 4 | 50% |
| 81 | OpenAI gpt-5.3-codex | 52.7 | 4 | 100% |
| 82 | OpenAI gpt-5.4-fast | 52.4 | 4 | 100% |
| 83 | Copilot claude-sonnet-4.6 | 52.2 | 4 | 100% |
| 84 | OpenAI gpt-5.4 | 51.9 | 4 | 100% |
| 85 | Copilot claude-sonnet-4 | 51.2 | 2 | 100% |
| 86 | Claude claude-sonnet-4-5 | 51.1 | 3 | 100% |
| 87 | Claude claude-opus-4-5 | 50.9 | 3 | 100% |
| 88 | OpenAI gpt-5.5 | 50.5 | 4 | 100% |
| 89 | z.ai glm-5.1 | 49.9 | 3 | 100% |
| 90 | Minimax MiniMax-M2.7-highspeed | 49.4 | 4 | 100% |
| 91 | Alibaba kimi-k2.5 | 48.8 | 4 | 75% |
| 92 | Minimax MiniMax-M2.5-highspeed | 48.1 | 4 | 100% |
| 93 | Alibaba glm-5 | 40.6 | 4 | 75% |
| 94 | Alibaba qwen3-max-2026-01-23 | 32.1 | 4 | 75% |
| 95 | OpenRouter nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free | 29.1 | 1 | 100% |
| 96 | Synthetic hf:nvidia/Kimi-K2.5-NVFP4 | 26.7 | 4 | 50% |
| 97 | Minimax MiniMax-M2.7 | 26.4 | 4 | 100% |
| 98 | Synthetic hf:moonshotai/Kimi-K2.5 | 25.6 | 4 | 50% |
| 99 | OpenRouter poolside/laguna-m.1:free | 3.5 | 1 | 100% |

A note on the DeepSeek numbers: we're still investigating whether they're real. We've run more tests than the ones published here and they all land in the same range, so it's not a one-off measurement glitch. More info next week. Bars use a log scale so the smaller subscription numbers stay legible alongside Cerebras' genuinely-sustained 1,000–2,600 tok/s.

Speed over time

tok/s progression by model

Best output tok/s recorded per model and provider, per measurement date. Only models measured on 2+ dates are shown.

All plans, side by side

Subscription matrix

Every coding subscription tier we know about, with measured peak speed and current availability. The cheapest available plan on each provider is highlighted.

Cerebras

⚠ Waitlist only
  • Code Pro $50/mo · ~24M tokens/day
  • Code Max $200/mo · ~120M tokens/day
Peak 1500.7 tok/s
Wire OpenAI-compatible
Value — PAYG only
Heads up — waitlist only
Code Pro and Code Max have not been re-opened to new customers — Cerebras has kept them on an indefinite waitlist since the initial rollout sold out. The PAYG free tier (1M tokens/day) and pay-per-token usage are unaffected.
CrofAI

  • Free / PAYG $0/mo · Pay-per-token, no recurring charge
  • Hobby (BEST VALUE) $5/mo · 500 daily requests · access to all models
  • Pro $10/mo · 1,000 daily requests · priority support
  • Intermediate $20/mo · 2,500 daily requests
  • Scale $50/mo · 7,500 daily requests
  • Max $100/mo · 15,000 daily requests
Peak 614.4 tok/s
Wire OpenAI-compatible
Value 122.9 / $

Minimax

⚠ Highspeed gating
  • Starter (CHEAPEST) $10/mo · 100 prompts / 5h, M2.5
  • Plus $20/mo · 300 prompts / 5h, M2.5
  • Max $50/mo · 1,000 prompts / 5h, M2.5
  • Plus-Highspeed $40/mo · 300 prompts / 5h, Lightning
  • Max-Highspeed $80/mo · 1,000 prompts / 5h, Lightning
  • Ultra-Highspeed $150/mo · 2,000 prompts / 5h, Lightning
Peak 185.3 tok/s
Wire Anthropic-compatible
Value 18.5 / $
Heads up — Lightning gated to -Highspeed tiers
The Lightning (high-speed, ~100 tok/s) variant of M2.5 is gated to the -Highspeed tiers only (Plus-Highspeed $40/mo, Max-Highspeed $80/mo, Ultra-Highspeed $150/mo). The cheaper Starter / Plus / Max plans give you regular M2.5 at ~50 tok/s — make sure you subscribe to a -Highspeed tier if you specifically need Lightning.

z.ai

⚠ Price hikes
  • Lite (CHEAPEST) $18/mo · 400 prompts / 5h, 2,000 / week
  • Pro $36/mo · 2,000 prompts / 5h, unlimited weekly
  • Max $96/mo · No practical cap, peak-hour SLA
Peak 100.9 tok/s
Wire OpenAI-compatible · Anthropic-compatible
Value 5.6 / $
Heads up — aggressive 2026 price hikes
The Lite plan launched in February 2026 around $3/mo and has been moved up several times since — currently $18/mo (≈$30/quarter). That puts it within a few dollars of Claude Pro ($20/mo). Marketing still claims '3× Claude Pro usage', but that figure is vendor-supplied and based on z.ai's own quota model, not an apples-to-apples measurement. Verify the latest pricing on z.ai/subscribe before subscribing.
OpenAI

  • Plus (CHEAPEST) $20/mo · Standard ChatGPT + Codex access
  • Pro $200/mo · Higher quotas + research tier
Peak 146.8 tok/s
Wire Codex OAuth (SSE)
Value 7.3 / $
Claude

  • Pro (CHEAPEST) $20/mo · Claude Code with shared Pro limits
  • Max $100/mo · 5× Pro quota
  • Max+ $200/mo · 20× Pro quota + Guest Passes
Peak 119.8 tok/s
Wire Claude Code OAuth (Anthropic /v1/messages + oauth-2025-04-20)
Value 6.0 / $

Copilot

⚠ Pro signups paused
  • Free $0/mo · 2,000 completions / 50 premium requests per month
  • Pro $10/mo · Higher limits, Sonnet/GPT-5 (no Opus)
  • Pro+ $39/mo · Includes Claude Opus 4.7 + premium models
  • Business (CHEAPEST) $19/mo · Per seat — admin controls, audit logs
  • Enterprise $39/mo · Per seat (+ $21 GH Enterprise Cloud)
Peak 113.3 tok/s
Wire Copilot Bearer (OpenAI-compat /chat/completions)
Value 6.0 / $
Heads up — Pro sign-ups paused
GitHub paused new sign-ups for the Pro, Pro+ and Student tiers on 2026-04-20 — citing that agentic workloads consume far more compute than the original pricing assumed. Existing Pro/Pro+ subscribers keep their plan; new individual users can only pick Free, Business or Enterprise. Opus 4.x has also been removed from Pro — only Pro+ keeps it.

Alibaba

⚠ Interactive use only (Token Plan)
  • Token Plan · Standard Seat (CHEAPEST) $30/mo · 25,000 credits/mo · text, vision, image gen — interactive use only
  • Token Plan · Pro Seat $100/mo · 100,000 credits/mo (4× Standard) — Alibaba's recommended tier
  • Token Plan · Max Seat $200/mo · 250,000 credits/mo (10× Standard)
  • Coding Plan · Lite (legacy) $10/mo · 18,000 requests/mo — closed to new subs since 2026-03-20
  • Coding Plan · Pro (legacy) $50/mo · 90,000 requests/mo — effectively impossible to buy
Peak 146.6 tok/s
Wire OpenAI-compatible
Value 4.9 / $
Heads up — Token Plan is interactive-use only
Token Plan ToS restrict use to interactive AI coding/agent tools only. Alibaba's FAQ forbids automated scripts, application backends, benchmarking and research scripts, with API key revocation as the stated penalty. Long agentic sessions inside a compatible tool (Claude Code, opencode, Qwen Code) with a human supervising are still allowed — the line is "is there a person driving a compatible tool?", not how long the session lasts. Our weekly bench against Alibaba runs against a legacy Coding Plan subscription where this clause didn't exist.
OpenRouter

  • Free tier $0/mo · ~50 `:free` requests/day (rises to ~1000/day with $10+ credit)
  • Pay-as-you-go $0/mo · Per-token pricing on non-free models · deposit any amount
Peak 4177.4 tok/s
Wire OpenAI-compatible
Value — PAYG only
Synthetic

  • Subscription (CHEAPEST) $30/mo · 500 messages / 5h · all models included · 1 concurrent req/model
  • Usage-based $0/mo · Pay-per-token · all models
Peak 224.2 tok/s
Wire OpenAI-compatible
Value 7.5 / $

PAYG-only providers

No subscription tier — pay per token. Fast for sporadic use, harder to budget for daily agent loops.

DeepSeek

Chinese AI lab known for open-weight MoE models that punch above their price.

Peak 7,394.8 tok/s
How to read this

What we measured (and what we didn't)

Throughput is end-to-end — output tokens divided by wall-clock time, including network and queueing. Quality is not measured — that needs evals, not stopwatches. Tail latency is not measured — averages dominate; p99 would need many more samples. Same prompt, every model the provider exposes, weekly refresh. Read the full methodology →
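The throughput definition above fits in a few lines — a sketch of the metric, not the actual harness. Timestamps bracket the whole request, so network and queueing time count against the model.

```python
def end_to_end_tok_s(output_tokens: int, t_start: float, t_end: float) -> float:
    """Output tokens divided by wall-clock seconds, end to end.
    t_start is taken just before the request is sent; t_end after the
    final token arrives — network and queueing are included by design."""
    elapsed = t_end - t_start
    if elapsed <= 0:
        raise ValueError("t_end must be after t_start")
    return output_tokens / elapsed

# e.g. 3,072 tokens streamed over 5 wall-clock seconds:
assert end_to_end_tok_s(3072, 100.0, 105.0) == 614.4
```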