CrofAI Hobby
Aggregator that resells open-source models behind one cheap key. Cheapest available plan at $5/mo (Hobby); peak 614.4 tok/s on kimi-k2.5-lightning.
Real-world throughput for every model on every coding-agent subscription we could get our hands on. Same prompt, same client, every model the provider exposes. Updated weekly.
The fastest available subscription tier in each price bracket — based on the peak tok/s we measured on each provider's plan. Buckets with no available subscription are skipped.
Run open-source AI models privately — flat $30/mo or pay-per-token. Cheapest available plan at $30/mo (Subscription); peak 224.2 tok/s on hf:moonshotai/Kimi-K2.6.
Three things landed this week.
OpenRouter as a new front door. 187 models in their catalogue, 25 of them :free text models. With $10+ in credit you cross into the higher daily quota tier (~1000 free-model requests/day), making them viable for both speed and quality benches. Two surprises in the speed pass: nvidia/nemotron-3-nano-30b-a3b:free clocked 4177 tok/s and poolside/laguna-xs.2:free hit 3746 tok/s — both in Cerebras territory, unusual for OpenRouter-routed inference.
Catalogue refresh. Most providers shipped new SKUs since W18: Claude added Sonnet 4.5; Copilot opened up the gpt-5.x family plus grok-code-fast-1; Alibaba's Qwen3.6 lineup is in; Cerebras now serves zai-glm-4.7; CrofAI gained five Kimi and qwen3.5 variants; OpenAI's gpt-5.5 family and several gpt-5.4 variants are tested for the first time.
Alibaba opened a second door: the Token Plan (Team Edition). After months of the legacy Coding Plan being effectively unobtainable, Alibaba shipped a parallel Token Plan that you can actually buy. Three tiers per seat: Standard $30/mo (25k Credits), Pro $100/mo (100k Credits, their recommended tier), Max $200/mo (250k Credits). The floor is cheaper than the legacy $50 Pro, but the closest equivalent is now $100, a 2× hike at the volume most users actually want.
Credits vs requests: what really changed. The legacy Coding Plan billed by request count (one HTTP call to the endpoint). A single user query in an agent like opencode or Claude Code typically expands into 5-30 requests internally (planning, tool calls, re-prompts), so 90k requests/mo at $50 really meant a few thousand user-visible queries. The Token Plan bills by Credits derived from input, cached, and output tokens, modulated by model, thinking mode, and tool calls. Alibaba doesn't publish the credit-to-token ratio, so precise budgeting is impossible without measuring your own workload first. The practical effect: request-based billing subsidised thinking-mode and long-output models (a long reply cost the same as a one-shot answer), while credit-based billing charges for those tokens proportionally. Short, verbose, tool-heavy agent workloads come out ahead on Credits; long-context, deep-reasoning, terse-output workloads were better off on the old request quota, which you can no longer buy.
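To make the asymmetry concrete, here's a back-of-the-envelope sketch. Every constant is an assumption for illustration (the fan-out per query, and especially the credit-to-token ratio, which Alibaba doesn't publish); only the shape of the comparison is the point.

```python
# All constants are illustrative assumptions, not published figures.
REQUESTS_PER_QUERY = 15      # agent fan-out: one user query -> ~5-30 requests
LEGACY_QUOTA = 90_000        # requests/mo on the legacy $50 Coding Plan

# Under request billing, quota translates to user-visible queries directly:
user_queries = LEGACY_QUOTA // REQUESTS_PER_QUERY   # 6,000 queries/mo

# Under credit billing, cost tracks tokens instead. Assume, hypothetically,
# 100 Credits per million weighted tokens:
CREDITS_PER_MTOK = 100

def monthly_credits(queries: int, tokens_per_query: int) -> float:
    """Credits consumed per month under the assumed ratio."""
    return queries * tokens_per_query / 1e6 * CREDITS_PER_MTOK

# Same 6,000 queries/mo, two workload shapes:
terse = monthly_credits(user_queries, 20_000)    # short tool-using agent
deep = monthly_credits(user_queries, 120_000)    # long-context reasoning

print(user_queries, terse, deep)  # 6000 12000.0 72000.0
```

Under request billing both workloads cost the same $50. Under the assumed ratio, the terse workload fits the $30 Standard tier (25k Credits) while the long-context one needs the $100 Pro tier (100k Credits): the constants are made up, but that 6× spread between identical query counts is exactly what changed.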
The catch: the Token Plan's ToS restrict it to interactive use with compatible AI coding/agent tools only. The FAQ explicitly forbids automated scripts, application backends, and benchmarking or research scripts; violations may trigger API-key revocation. What stays allowed: long agentic sessions inside a compatible tool (Claude Code, opencode, Qwen Code, etc.) with a human in the loop, even if the agent itself runs autonomously for hours. The line is whether a person is driving a compatible tool, not session duration. Our weekly Alibaba bench runs on a legacy Coding Plan subscription, where this clause didn't exist; new subscribers on the Token Plan should keep usage inside one of those compatible agents.
A note on :free quotas. OpenRouter's free tier resets daily at midnight UTC, but the per-model bottleneck is usually the upstream provider, not OpenRouter itself. Several Meta-llama and Mistral-derived variants returned HTTP 429s all week and never recovered across 11 retry attempts — those are upstream pool limits, not your account's. Cross-provider drift comparisons (openai/gpt-oss-120b OpenRouter vs Synthetic; z-ai/glm-4.5-air OpenRouter vs zai) are now possible and arguably the real new value here.
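A minimal sketch of the kind of capped-backoff loop behind the "11 retry attempts" figure above. This is not the bench's actual client code; the exception name, delays, and injectable sleep function are illustrative choices.

```python
import time

class RateLimited(Exception):
    """Stand-in for an HTTP 429 from the upstream provider pool."""

def with_retries(call, max_attempts=11, base_delay=1.0, sleep=time.sleep):
    """Retry `call` on RateLimited, with exponential backoff capped at 60s.

    Re-raises the final RateLimited once the attempt budget is exhausted,
    which is what a saturated upstream pool looks like from the outside.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise
            sleep(min(base_delay * 2 ** attempt, 60.0))
```

A model whose upstream free pool is exhausted all week simply burns through all 11 attempts and surfaces the final 429, regardless of your own account's quota.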
Peak tok/s ÷ cheapest available plan price. Pay-as-you-go-only providers are excluded — speed-per-subscription-dollar is undefined for them.
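The metric itself is a one-line division; plugging in the two cheapest-plan figures quoted above:

```python
def speed_per_dollar(peak_toks: float, cheapest_plan_usd: float) -> float:
    """Peak measured tok/s divided by the cheapest plan's monthly price."""
    return peak_toks / cheapest_plan_usd

print(speed_per_dollar(614.4, 5))    # ~122.9 for the $5/mo plan at 614.4 tok/s
print(speed_per_dollar(224.2, 30))   # ~7.5 for the $30/mo plan at 224.2 tok/s
```

With no subscription price in the denominator, the ratio is undefined, which is why pay-as-you-go-only providers don't appear in this chart.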
Output tokens per wall-clock second. Filter by provider, change the sort, or expand to all measured models.
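In sketch form, here's how a number like that can be measured. This is not the bench's actual client: it assumes a token iterator from a streaming completion and times the full window from request start to last token, a plausible but assumed definition of "wall-clock".

```python
import time

def output_toks_per_sec(token_stream, clock=time.perf_counter):
    """Count streamed output tokens against wall-clock time.

    `token_stream` is any iterable of output tokens (e.g. chunks from a
    streaming chat completion); `clock` is injectable so the timing logic
    can be tested deterministically.
    """
    start = clock()
    n_tokens = sum(1 for _ in token_stream)
    elapsed = clock() - start
    return n_tokens / elapsed if elapsed > 0 else float("inf")
```

Per the table notes, the peak figures shown are the best such measurement recorded per model and provider on a date, not an average across runs.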
A note on the DeepSeek numbers: we're still investigating whether they're real. We've run more tests than the ones published here and they all land in the same range, so it's not a one-off measurement glitch. More info next week. Bars use a log scale so the smaller subscription numbers stay legible alongside Cerebras' genuinely sustained 1,000–2,600 tok/s.
Best output tok/s recorded per model and provider, per measurement date. Only models measured on 2+ dates are shown.
Every coding subscription tier we know about, with measured peak speed and current availability. The cheapest available plan on each provider is highlighted.
No subscription tier — pay per token. Fast for sporadic use, harder to budget for daily agent loops.
Chinese AI lab known for open-weight MoE models that punch above their price.