Which LLM coding subscription is actually fast?

Real-world throughput for every model on every coding-agent subscription we could get our hands on. Same prompt, same client, every model the provider exposes. Updated weekly.

11 providers · 95 models on subscriptions · 320 successful runs · 409,550 output tokens measured

Peak tok/s: 7,394 (DeepSeek deepseek-v4-flash)

Best tok/s per $/mo: 122.9 (CrofAI Hobby, cheapest available coding sub)

Providers with availability caveats: 5 / 11

★ Editor's pick · 2026-W19

Millaguie's pick of the week

CrofAI · Minimax · DeepSeek

CrofAI takes the top spot. Flat pricing, no promotional windows to time, and an eighteen-model catalogue that grew again this week (all the Kimi K2 variants plus the GLM 5.x family are in). Two ways to play it.

CrofAI + MiniMax if you want the absolute strongest pair for the price: lean on MiniMax for the heavy, hairy problems where raw capability matters, fall back to CrofAI for everything else, and you get latency that beats most official providers.

CrofAI alone on a larger plan if you'd rather skip the second invoice: the lightweight models (kimi-k2.5-lightning, glm-4.7-flash) absorb the bulk-throughput slot at speeds the bigger official endpoints don't match, and the precision variants are there when you need them.

And if your use is genuinely occasional (a query here, a script there), DeepSeek pay-as-you-go wins outright: $0.14/$0.28 per Mtok across the chat/reasoner lineup is so cheap that committing to any plan would be over-engineering.

Previous picks

2026-W18 · DeepSeek V4-pro + MiniMax. DeepSeek ran a 75% launch discount and MiniMax's pricing was already competitive: the best cost-per-quality pairing on the board until the discount expired.

2026-W17 · DeepSeek + MiniMax for the win. Even on pay-as-you-go, DeepSeek's pricing is low enough to use it as a reasoning escape hatch when MiniMax gets stuck in a rabbit hole, without committing to a premium subscription.

Weekly notes · 2026-W19

Catalog grew, OpenRouter joined the party

Three things landed this week.

OpenRouter as a new front door. 187 models in their catalogue, 25 of them :free text models. With $10+ in credit you cross into the higher daily quota tier (~1000 free-model requests/day), making them viable for both speed and quality benches. Two surprises in the speed pass: nvidia/nemotron-3-nano-30b-a3b:free clocked 4177 tok/s and poolside/laguna-xs.2:free hit 3746 tok/s — both in Cerebras territory, unusual for OpenRouter-routed inference.

Catalogue refresh. Most providers shipped new SKUs since W18: Claude added Sonnet 4.5; Copilot opened up the gpt-5.x family plus grok-code-fast-1; Alibaba's Qwen3.6 lineup is in; Cerebras now serves zai-glm-4.7; CrofAI gained five Kimi and qwen3.5 variants; OpenAI's gpt-5.5 family and several gpt-5.4 variants are tested for the first time.

Alibaba opened a second door — the Token Plan (Team Edition). After months of the legacy Coding Plan being effectively unobtainable, Alibaba shipped a parallel Token Plan that you can actually buy. Three tiers per seat: Standard $30/mo (25k Credits), Pro $100/mo (100k Credits, their recommended tier), Max $200/mo (250k Credits). Floor is cheaper than the legacy $50 Pro, but the closest equivalent is now $100 — a 2× hike at the volume most users actually want.
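As a quick sanity check on the three Token Plan tiers, the sketch below works out how many Credits each dollar buys. It uses only the prices and Credit allowances quoted above; the function name `credits_per_dollar` is ours, and since Alibaba does not publish what a Credit is worth in tokens, this compares tiers with each other and nothing more.

```python
# Token Plan tiers as quoted in the text: (name, USD per seat per month, Credits per month).
TIERS = [
    ("Standard", 30, 25_000),
    ("Pro", 100, 100_000),
    ("Max", 200, 250_000),
]


def credits_per_dollar(price_usd, credits):
    """Credits bought per dollar of monthly spend (hypothetical helper)."""
    return credits / price_usd


for name, price, credits in TIERS:
    print(f"{name}: {credits_per_dollar(price, credits):.0f} Credits per $")
```

On these figures the per-dollar allowance rises with each tier, so the $30 floor is the cheapest way in but not the cheapest way to buy Credits.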

Credits vs requests: what really changed. The legacy Coding Plan billed by request count (one HTTP call to the endpoint). A single user query in an agent like opencode or Claude Code typically expands into 5-30 requests internally (planning, tool calls, re-prompts), so 90k requests/mo at $50 actually meant roughly 3,000 to 18,000 user-visible queries. The Token Plan bills by Credits derived from input + cached + output tokens, modulated by model, thinking mode and tool calls. Alibaba doesn't publish the credit-to-token ratio, so budgeting precisely is impossible without measuring your own workload first. Practical effect: request-based billing subsidised thinking-mode and long-output models (they cost the same as a one-shot reply); credit-based billing makes you pay for those tokens proportionally. Agents that issue many short, tool-heavy requests are better off on Credits; long-context, deep-reasoning, terse-output workloads were better off on the old request quota, which you can't buy anymore.
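The request-to-query expansion is simple arithmetic, sketched here for the legacy Pro quota. The 90k quota and the 5-30 requests-per-query range come from the paragraph above; the function name and the use of integer division are our own choices for illustration.

```python
def user_queries(request_quota, requests_per_query):
    """User-visible queries a request quota covers at a given expansion factor."""
    return request_quota // requests_per_query


LEGACY_PRO_REQUESTS = 90_000  # requests per month on the legacy $50 Pro plan

worst_case = user_queries(LEGACY_PRO_REQUESTS, 30)  # chatty agent, 30 requests per query
best_case = user_queries(LEGACY_PRO_REQUESTS, 5)    # lean agent, 5 requests per query

print(f"{worst_case:,} to {best_case:,} user-visible queries per month")
```

Measuring your own agent's expansion factor for a day is the only way to know where in that range you land.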

Catch: the Token Plan ToS restrict it to interactive use with compatible AI coding/agent tools only. What the FAQ explicitly forbids: automated scripts, application backends, benchmarking and research scripts — violations may trigger API key revocation. What stays allowed: long agentic sessions inside a compatible tool (Claude Code, opencode, Qwen Code, etc.) with a human in the loop, even if the agent itself runs autonomously for hours. The line is "is there a person driving a compatible tool?", not session duration. Our weekly bench against Alibaba runs against a legacy Coding Plan subscription where this clause didn't exist; new subscribers on the Token Plan should keep usage inside one of those compatible agents.

A note on :free quotas. OpenRouter's free tier resets daily at midnight UTC, but the per-model bottleneck is usually the upstream provider, not OpenRouter itself. Several Meta-llama and Mistral-derived variants returned HTTP 429s all week and never recovered across 11 retry attempts — those are upstream pool limits, not your account's. Cross-provider drift comparisons (openai/gpt-oss-120b OpenRouter vs Synthetic; z-ai/glm-4.5-air OpenRouter vs zai) are now possible and arguably the real new value here.

Speed per dollar

Best value subscriptions

Peak tok/s ÷ cheapest available plan price. Pay-as-you-go-only providers are excluded — speed-per-subscription-dollar is undefined for them.

| Provider · best model on plan | Peak (tok/s) | Plan | tok/s per $/mo |
| --- | --- | --- | --- |
| CrofAI · kimi-k2.5-lightning | 614.4 | $5 Hobby | 122.9 |
| Minimax · MiniMax-M2.5 | 185.3 | $10 Starter | 18.5 |
| Synthetic · hf:moonshotai/Kimi-K2.6 | 224.2 | $30 Subscription | 7.5 |
| OpenAI · gpt-5.4-mini | 146.8 | $20 Plus | 7.3 |
| Claude · claude-haiku-4-5-20251001 | 119.8 | $20 Pro | 6.0 |
| Copilot · gpt-5-mini | 113.3 | $19 Business | 6.0 |
| z.ai · glm-4.5-air | 100.9 | $18 Lite | 5.6 |
| Alibaba · qwen3-coder-next | 146.6 | $30 Token Plan · Standard Seat | 4.9 |
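The value figure is peak tok/s divided by the monthly price of the cheapest available plan. A minimal sketch of that calculation, using the peak and price figures from this section; the function name `value_per_dollar` and the choice to return `None` for providers without a paid plan are our assumptions, not the site's actual code.

```python
# (provider, peak tok/s, cheapest available plan price in USD per month)
ROWS = [
    ("CrofAI", 614.4, 5),
    ("Minimax", 185.3, 10),
    ("Synthetic", 224.2, 30),
    ("OpenAI", 146.8, 20),
    ("Claude", 119.8, 20),
    ("Copilot", 113.3, 19),
    ("z.ai", 100.9, 18),
    ("Alibaba", 146.6, 30),
]


def value_per_dollar(peak_tok_s, plan_price_usd):
    """Peak tok/s per subscription dollar; None when there is no paid plan (PAYG only)."""
    if plan_price_usd <= 0:
        return None
    return round(peak_tok_s / plan_price_usd, 1)


for provider, peak, price in sorted(ROWS, key=lambda r: r[1] / r[2], reverse=True):
    print(f"{provider}: {value_per_dollar(peak, price)} tok/s per $/mo")
```

Note that the metric rewards a low entry price as much as raw speed: CrofAI's lead comes from a $5 plan, not from the fastest model on the board.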

Pure throughput

Models by tok/s

Output tokens per wall-clock second. Filter by provider, change the sort, or expand to all measured models.

| # | Provider · model | Best tok/s | Runs |
| --- | --- | --- | --- |
| 1 | DeepSeek deepseek-v4-flash | 7394.8 | 100% |
| 2 | DeepSeek deepseek-chat | 7329.9 | 100% |
| 3 | DeepSeek deepseek-v4-pro | 7056.0 | 100% |
| 4 | DeepSeek deepseek-reasoner | 6136.8 | 100% |
| 5 | OpenRouter nvidia/nemotron-3-nano-30b-a3b:free | 4177.4 | 100% |
| 6 | OpenRouter poolside/laguna-xs.2:free | 3746.4 | 100% |
| 7 | OpenRouter liquid/lfm-2.5-1.2b-thinking:free | 2416.9 | 100% |
| 8 | OpenRouter openai/gpt-oss-20b:free | 1737.2 | 100% |
| 9 | Cerebras llama3.1-8b | 1500.7 | 100% |
| 10 | OpenRouter liquid/lfm-2.5-1.2b-instruct:free | 1115.9 | 100% |
| 11 | OpenRouter nvidia/nemotron-nano-9b-v2:free | 1018.1 | 100% |
| 12 | OpenRouter nvidia/nemotron-nano-12b-v2-vl:free | 946.8 | 100% |
| 13 | OpenRouter google/gemma-4-31b-it:free | 945.2 | 50% |
| 14 | OpenRouter minimax/minimax-m2.5:free | 857.1 | 50% |
| 15 | Cerebras qwen-3-235b-a22b-instruct-2507 | 794.2 | 75% |
| 16 | CrofAI kimi-k2.5-lightning | 614.4 | 100% |
| 17 | OpenRouter openai/gpt-oss-120b:free | 332.9 | 100% |
| 18 | OpenRouter google/gemma-4-26b-a4b-it:free | 284.5 | 33% |
| 19 | Synthetic hf:moonshotai/Kimi-K2.6 | 224.2 | 100% |
| 20 | Synthetic hf:Qwen/Qwen3.5-397B-A17B | 190.7 | 100% |
| 21 | OpenRouter z-ai/glm-4.5-air:free | 187.8 | 100% |
| 22 | Minimax MiniMax-M2.5 | 185.3 | 100% |
| 23 | CrofAI qwen3.5-9b | 174.7 | 100% |
| 24 | Synthetic hf:zai-org/GLM-4.7 | 173.7 | 100% |
| 25 | CrofAI qwen3.6-27b | 167.3 | 100% |
| 26 | Minimax MiniMax-M2.1 | 167.0 | 100% |
| 27 | Synthetic hf:zai-org/GLM-5.1 | 164.1 | 100% |
| 28 | CrofAI qwen3.5-9b-chat | 158.0 | 100% |
| 29 | Synthetic hf:zai-org/GLM-4.7-Flash | 150.3 | 100% |
| 30 | OpenAI gpt-5.4-mini | 146.8 | 100% |
| 31 | Alibaba qwen3-coder-next | 146.6 | 75% |
| 32 | Synthetic hf:nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 | 142.2 | 100% |
| 33 | OpenAI gpt-5.4-mini-fast | 141.2 | 100% |
| 34 | Synthetic hf:deepseek-ai/DeepSeek-R1-0528 | 135.2 | 100% |
| 35 | Synthetic hf:openai/gpt-oss-120b | 132.6 | 100% |
| 36 | Synthetic hf:deepseek-ai/DeepSeek-R1 | 131.2 | 100% |
| 37 | Synthetic hf:MiniMaxAI/MiniMax-M2.5 | 129.5 | 100% |
| 38 | Claude claude-haiku-4-5-20251001 | 119.8 | 100% |
| 39 | CrofAI kimi-k2.5 | 115.1 | 100% |
| 40 | Copilot gpt-5-mini | 113.3 | 100% |
| 41 | CrofAI glm-4.7-flash | 113.2 | 50% |
| 42 | Synthetic hf:zai-org/GLM-5 | 110.8 | 100% |
| 43 | CrofAI glm-5.1 | 108.0 | 100% |
| 44 | Copilot gemini-3-flash-preview | 107.9 | 100% |
| 45 | CrofAI kimi-k2.6 | 105.8 | 100% |
| 46 | CrofAI glm-5 | 105.1 | 50% |
| 47 | CrofAI gemma-4-31b-it | 104.9 | 100% |
| 48 | Copilot claude-haiku-4.5 | 104.8 | 100% |
| 49 | CrofAI greg | 103.8 | 50% |
| 50 | CrofAI minimax-m2.5 | 102.8 | 100% |
| 51 | z.ai glm-4.5-air | 100.9 | 100% |
| 52 | CrofAI glm-4.7 | 99.5 | 100% |
| 53 | Synthetic hf:meta-llama/Llama-3.3-70B-Instruct | 93.9 | 100% |
| 54 | Copilot gpt-4o | 93.1 | 100% |
| 55 | CrofAI deepseek-v3.2 | 93.0 | 100% |
| 56 | Copilot gpt-4.1 | 92.5 | 100% |
| 57 | Synthetic hf:Qwen/Qwen3-Coder-480B-A35B-Instruct | 90.8 | 100% |
| 58 | Synthetic hf:deepseek-ai/DeepSeek-V3 | 88.0 | 100% |
| 59 | CrofAI kimi-k2.6-precision | 80.7 | 100% |
| 60 | Synthetic hf:deepseek-ai/DeepSeek-V3.2 | 80.6 | 100% |
| 61 | z.ai glm-5-turbo | 77.9 | 100% |
| 62 | CrofAI glm-5.1-precision | 77.7 | 100% |
| 63 | Copilot gemini-2.5-pro | 75.3 | 100% |
| 64 | z.ai glm-4.7 | 74.2 | 100% |
| 65 | Alibaba glm-4.7 | 74.1 | 75% |
| 66 | Copilot grok-code-fast-1 | 73.5 | 100% |
| 67 | Minimax MiniMax-M2 | 72.8 | 100% |
| 68 | CrofAI deepseek-v4-pro | 70.0 | 100% |
| 69 | Copilot gpt-5.2 | 68.9 | 100% |
| 70 | Alibaba qwen3.5-plus | 64.1 | 75% |
| 71 | Copilot gemini-3.1-pro-preview | 62.4 | 100% |
| 72 | Alibaba MiniMax-M2.5 | 62.2 | 75% |
| 73 | OpenAI gpt-5.5-fast | 61.0 | 100% |
| 74 | Alibaba qwen3-coder-plus | 59.5 | 75% |
| 75 | Copilot claude-sonnet-4.5 | 59.2 | 100% |
| 76 | Claude claude-opus-4-7 | 58.3 | 100% |
| 77 | Claude claude-sonnet-4-6 | 56.7 | 100% |
| 78 | Alibaba qwen3.6-plus | 53.4 | 75% |
| 79 | OpenAI gpt-5.2 | 53.2 | 100% |
| 80 | CrofAI qwen3.5-397b-a17b | 52.7 | 50% |
| 81 | OpenAI gpt-5.3-codex | 52.7 | 100% |
| 82 | OpenAI gpt-5.4-fast | 52.4 | 100% |
| 83 | Copilot claude-sonnet-4.6 | 52.2 | 100% |
| 84 | OpenAI gpt-5.4 | 51.9 | 100% |
| 85 | Copilot claude-sonnet-4 | 51.2 | 100% |
| 86 | Claude claude-sonnet-4-5 | 51.1 | 100% |
| 87 | Claude claude-opus-4-5 | 50.9 | 100% |
| 88 | OpenAI gpt-5.5 | 50.5 | 100% |
| 89 | z.ai glm-5.1 | 49.9 | 100% |
| 90 | Minimax MiniMax-M2.7-highspeed | 49.4 | 100% |
| 91 | Alibaba kimi-k2.5 | 48.8 | 75% |
| 92 | Minimax MiniMax-M2.5-highspeed | 48.1 | 100% |
| 93 | Alibaba glm-5 | 40.6 | 75% |
| 94 | Alibaba qwen3-max-2026-01-23 | 32.1 | 75% |
| 95 | OpenRouter nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free | 29.1 | 100% |
| 96 | Synthetic hf:nvidia/Kimi-K2.5-NVFP4 | 26.7 | 50% |
| 97 | Minimax MiniMax-M2.7 | 26.4 | 100% |
| 98 | Synthetic hf:moonshotai/Kimi-K2.5 | 25.6 | 50% |
| 99 | OpenRouter poolside/laguna-m.1:free | 3.5 | 100% |

A note on the DeepSeek numbers: we're still investigating whether they're real. We've run more tests than the ones published here and they all land in the same range, so it's not a one-off measurement glitch. More info next week.

Bars use a log scale so the smaller subscription numbers stay legible alongside Cerebras' genuinely sustained 1,000–2,600 tok/s.
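To see why a log scale helps here, the sketch below compares the bar width a value would get on a linear axis with the width on a logarithmic one. The axis floor of 1 tok/s and the function names are assumptions made for illustration; the page's real chart code may choose its bounds differently.

```python
import math


def linear_fraction(value, hi):
    """Bar width as a fraction of the full bar on a linear axis starting at 0."""
    return value / hi


def log_fraction(value, lo, hi):
    """Bar width as a fraction of the full bar on a log axis running from lo to hi."""
    return (math.log10(value) - math.log10(lo)) / (math.log10(hi) - math.log10(lo))


AXIS_FLOOR = 1.0   # assumed lower bound of the log axis, in tok/s
AXIS_TOP = 7394.8  # fastest measured value in the table

for tok_s in (7394.8, 1500.7, 146.8, 50.5):
    lin = linear_fraction(tok_s, AXIS_TOP)
    log = log_fraction(tok_s, AXIS_FLOOR, AXIS_TOP)
    print(f"{tok_s:>7.1f} tok/s  linear {lin:6.1%}  log {log:6.1%}")
```

On a linear axis a 50 tok/s subscription model gets a bar well under 1% of the full width; on the log axis it still gets a visible share, which is the point of the design choice.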

Speed over time

tok/s progression by model

Best output tok/s recorded per model and provider, per measurement date. Only models measured on 2+ dates are shown.
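The selection rule for this chart (best tok/s per model and provider per date, and only series with two or more dates) could be sketched as follows. The record layout, the function name and the sample rows are hypothetical and exist only to show the grouping; they are not measurements from this page.

```python
from collections import defaultdict


def progression(runs):
    """Group runs into {(provider, model): {date: best tok/s}}, keeping series with 2+ dates.

    `runs` is an iterable of (provider, model, date, tok_s) tuples (assumed layout).
    """
    best = defaultdict(dict)
    for provider, model, date, tok_s in runs:
        series = best[(provider, model)]
        series[date] = max(tok_s, series.get(date, 0.0))
    return {
        key: dict(sorted(series.items()))
        for key, series in best.items()
        if len(series) >= 2
    }


# Made-up sample data, for illustration only.
sample = [
    ("provider-a", "model-x", "2026-05-01", 100.0),
    ("provider-a", "model-x", "2026-05-01", 120.0),  # second run on the same date
    ("provider-a", "model-x", "2026-05-08", 90.0),
    ("provider-b", "model-y", "2026-05-08", 300.0),  # only one date, so dropped
]

print(progression(sample))
```

ISO-formatted date strings sort chronologically, which is why a plain `sorted` is enough to order each series.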

All plans, side by side

Subscription matrix

Every coding subscription tier we know about, with measured peak speed and current availability. The cheapest available plan on each provider is highlighted.


Cerebras

⚠ Waitlist only

| Plan | Price | Includes |
| --- | --- | --- |
| Code Pro | $50/mo | ~24M tokens/day |
| Code Max | $200/mo | ~120M tokens/day |

Peak: 1500.7 tok/s

Wire: OpenAI-compatible

Value: n/a (PAYG only)

Heads up: waitlist only. Code Pro and Code Max have not been re-opened to new customers; Cerebras has kept them on an indefinite waitlist since the initial rollout sold out. The PAYG free tier (1M tokens/day) and pay-per-token usage are unaffected.

CrofAI

| Plan | Price | Includes |
| --- | --- | --- |
| Free / PAYG | $0/mo | Pay-per-token, no recurring charge |
| Hobby (BEST VALUE) | $5/mo | 500 daily requests · access to all models |
| Pro | $10/mo | 1,000 daily requests · priority support |
| Intermediate | $20/mo | 2,500 daily requests |
| Scale | $50/mo | 7,500 daily requests |
| Max | $100/mo | 15,000 daily requests |

Peak: 614.4 tok/s

Wire: OpenAI-compatible

Value: 122.9 tok/s per $/mo

Minimax

⚠ Highspeed gating

| Plan | Price | Includes |
| --- | --- | --- |
| Starter (CHEAPEST) | $10/mo | 100 prompts / 5h, M2.5 |
| Plus | $20/mo | 300 prompts / 5h, M2.5 |
| Max | $50/mo | 1,000 prompts / 5h, M2.5 |
| Plus-Highspeed | $40/mo | 300 prompts / 5h, Lightning |
| Max-Highspeed | $80/mo | 1,000 prompts / 5h, Lightning |
| Ultra-Highspeed | $150/mo | 2,000 prompts / 5h, Lightning |

Peak: 185.3 tok/s

Wire: Anthropic-compatible

Value: 18.5 tok/s per $/mo

Heads up: the Lightning (high-speed, ~100 tok/s) variant of M2.5 is gated to the -Highspeed tiers only (Plus-Highspeed $40/mo, Max-Highspeed $80/mo, Ultra-Highspeed $150/mo). The cheaper Starter / Plus / Max plans give you regular M2.5 at ~50 tok/s; make sure you subscribe to a -Highspeed tier if you specifically need Lightning.

z.ai

⚠ Price hikes

| Plan | Price | Includes |
| --- | --- | --- |
| Lite (CHEAPEST) | $18/mo | 400 prompts / 5h, 2,000 / week |
| Pro | $36/mo | 2,000 prompts / 5h, unlimited weekly |
| Max | $96/mo | No practical cap, peak-hour SLA |

Peak: 100.9 tok/s

Wire: OpenAI-compatible · Anthropic-compatible

Value: 5.6 tok/s per $/mo

Heads up: aggressive 2026 price hikes. The Lite plan launched in February 2026 around $3/mo and has been moved up several times since; it is currently $18/mo (≈$30/quarter). That puts it within a few dollars of Claude Pro ($20/mo). Marketing still claims '3× Claude Pro usage', but that figure is vendor-supplied and based on z.ai's own quota model, not an apples-to-apples measurement. Verify the latest pricing on z.ai/subscribe before subscribing.

OpenAI

| Plan | Price | Includes |
| --- | --- | --- |
| Plus (CHEAPEST) | $20/mo | Standard ChatGPT + Codex access |
| Pro | $200/mo | Higher quotas + research tier |

Peak: 146.8 tok/s

Wire: Codex OAuth (SSE)

Value: 7.3 tok/s per $/mo

Claude

| Plan | Price | Includes |
| --- | --- | --- |
| Pro (CHEAPEST) | $20/mo | Claude Code with shared Pro limits |
| Max | $100/mo | 5× Pro quota |
| Max+ | $200/mo | 20× Pro quota + Guest Passes |

Peak: 119.8 tok/s

Wire: Claude Code OAuth (Anthropic /v1/messages + oauth-2025-04-20)

Value: 6.0 tok/s per $/mo

Copilot

⚠ Pro signups paused

| Plan | Price | Includes |
| --- | --- | --- |
| Free | $0/mo | 2,000 completions / 50 premium requests per month |
| Pro | $10/mo | Higher limits, Sonnet/GPT-5 (no Opus) |
| Pro+ | $39/mo | Includes Claude Opus 4.7 + premium models |
| Business (CHEAPEST) | $19/mo | Per seat, with admin controls and audit logs |
| Enterprise | $39/mo | Per seat (+ $21 GH Enterprise Cloud) |

Peak: 113.3 tok/s

Wire: Copilot Bearer (OpenAI-compat /chat/completions)

Value: 6.0 tok/s per $/mo

Heads up: GitHub paused new sign-ups for the Pro, Pro+ and Student tiers on 2026-04-20, citing that agentic workloads consume far more compute than the original pricing assumed. Existing Pro/Pro+ subscribers keep their plan; new individual users can only pick Free, Business or Enterprise. Opus 4.x has also been removed from Pro; only Pro+ keeps it.

Alibaba

⚠ Interactive use only (Token Plan)

| Plan | Price | Includes |
| --- | --- | --- |
| Token Plan · Standard Seat (CHEAPEST) | $30/mo | 25,000 credits/mo · text, vision, image gen (interactive use only) |
| Token Plan · Pro Seat | $100/mo | 100,000 credits/mo (4× Standard), Alibaba's recommended tier |
| Token Plan · Max Seat | $200/mo | 250,000 credits/mo (10× Standard) |
| Coding Plan · Lite (legacy) | $10/mo | 18,000 requests/mo, closed to new subs since 2026-03-20 |
| Coding Plan · Pro (legacy) | $50/mo | 90,000 requests/mo, effectively impossible to buy |

Peak: 146.6 tok/s

Wire: OpenAI-compatible

Value: 4.9 tok/s per $/mo

Heads up: Token Plan ToS restrict use to interactive AI coding/agent tools only. Alibaba's FAQ forbids automated scripts, application backends, benchmarking and research scripts, with API key revocation as the stated penalty. Long agentic sessions inside a compatible tool (Claude Code, opencode, Qwen Code) with a human supervising are still allowed: the line is "is there a person driving a compatible tool?", not how long the session lasts. Our weekly bench against Alibaba runs against a legacy Coding Plan subscription where this clause didn't exist.

OpenRouter

| Plan | Price | Includes |
| --- | --- | --- |
| Free tier | $0/mo | ~50 `:free` requests/day (rises to ~1000/day with $10+ credit) |
| Pay-as-you-go | $0/mo | Per-token pricing on non-free models · deposit any amount |

Peak: 4177.4 tok/s

Wire: OpenAI-compatible

Value: n/a (PAYG only)

Synthetic

| Plan | Price | Includes |
| --- | --- | --- |
| Subscription (CHEAPEST) | $30/mo | 500 messages / 5h · all models included · 1 concurrent req/model |
| Usage-based | $0/mo | Pay-per-token · all models |

Peak: 224.2 tok/s

Wire: OpenAI-compatible

Value: 7.5 tok/s per $/mo

PAYG-only providers

No subscription tier — pay per token. Fast for sporadic use, harder to budget for daily agent loops.

DeepSeek

Chinese AI lab known for open-weight MoE models that punch above their price.

Peak: 7394.8 tok/s
