About this leaderboard
How the numbers are produced and what they mean.
Methodology
Every benchmark session sends an identical prompt to every model each provider exposes, recording end-to-end latency and output tokens per second. Results refresh weekly.
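The core measurement is simple: output tokens divided by the wall-clock seconds between sending the request and receiving the last token. A minimal sketch (the function name and simulated timestamps are illustrative, not the site's actual code):

```python
def throughput_tok_s(output_tokens: int, started_s: float, finished_s: float) -> float:
    """Output tokens divided by wall-clock seconds; network and queueing both count."""
    elapsed = finished_s - started_s
    if elapsed <= 0:
        raise ValueError("finished_s must be after started_s")
    return output_tokens / elapsed

# Simulated run: 512 output tokens streamed over 4.0 s of wall-clock time.
print(round(throughput_tok_s(512, 100.0, 104.0), 1))  # 128.0 tok/s
```

Because the clock spans the whole request, a congested route or a long queue shows up directly in the number.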
Why we don't compare list prices
Most providers covered here sell flat-rate coding subscriptions (Claude Code, ChatGPT/Codex, GLM Coding Plan, MiniMax Token Plan, Alibaba Coding Plan). Comparing per-token list prices across pay-as-you-go and subscription tiers is misleading — the real cost is the monthly fee, and what you get back depends on your usage pattern. We focus on throughput, reliability, and speed-per-subscription-dollar.
What we measure
- Throughput (tok/s) — output tokens divided by wall-clock time. Network and queueing both count.
- Best vs avg tok/s — peak observed and mean across all successful runs.
- Success rate — fraction of runs that completed without HTTP / parsing errors.
- Output length — characters in the longest successful response.
- Speed per $/mo — peak tok/s ÷ cheapest available plan price for each provider.
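The metrics above can all be derived from a list of raw run records. A hedged sketch of that aggregation (the `Run` shape and sample numbers are hypothetical, chosen only to show the formulas):

```python
from dataclasses import dataclass

@dataclass
class Run:
    ok: bool           # completed without HTTP / parsing errors
    tok_s: float       # output tokens per wall-clock second
    output_chars: int  # response length in characters

def summarize(runs: list[Run], plan_usd_per_month: float) -> dict:
    good = [r for r in runs if r.ok]
    best = max(r.tok_s for r in good)
    return {
        "best_tok_s": best,                                   # peak observed
        "avg_tok_s": sum(r.tok_s for r in good) / len(good),  # mean over successes
        "success_rate": len(good) / len(runs),                # fraction without errors
        "output_length": max(r.output_chars for r in good),   # longest success
        "speed_per_dollar": best / plan_usd_per_month,        # peak tok/s per $/mo
    }

runs = [Run(True, 120.0, 8000), Run(True, 100.0, 9500), Run(False, 0.0, 0)]
stats = summarize(runs, plan_usd_per_month=20.0)
print(stats["best_tok_s"], stats["success_rate"])  # 120.0 0.666...
```

Note that failed runs count against the success rate but are excluded from every speed and length statistic.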
Why provider matters
The same model is often available on multiple providers. Speed depends on the provider's hardware (Cerebras WSE-3 vs commodity GPUs), routing, and congestion. We always tag each row as provider · model so you can distinguish, say, GLM-4.7 on Cerebras (≈1,000+ tok/s) from GLM-4.7 on z.ai (subscription throughput).
Caveats and disclaimers
- Quality is not measured. A fast wrong answer is still wrong.
- Tail latency is not measured. We report averages; reliable p99 numbers would need many more samples per model.
- Time-to-first-token is not measured. We capture end-to-end latency only — streaming UX may feel different.
- Suspiciously high tok/s (above ~3,000) on short prompts often indicates cache hits or warmed routes, not sustained agentic performance.
- Subscription "value" uses the cheapest available plan. A $10 plan with 100 prompts/5h beats a $50 plan only if that quota fits your workload.
Acknowledgements
Part of the data behind this leaderboard is provided by forkline.dev — thanks for the access.