About this leaderboard

How the numbers are produced and what they mean.

Methodology

Every benchmark session sends the same prompt to every model a provider exposes and records latency and output tokens per second. The site refreshes weekly.
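
As a rough sketch, one session looks like the loop below, assuming "latency" means time to first token. `stream_completion()` is a hypothetical stand-in for each provider's streaming client, and the prompt and token counting are placeholders; only the timing logic mirrors what gets recorded.

```python
# A minimal sketch of one benchmark session (assumption: latency =
# time to first token). stream_completion() is a hypothetical stub,
# not any provider's real SDK.
import time
from typing import Iterator

PROMPT = "the shared benchmark prompt"  # same prompt for every model

def stream_completion(provider: str, model: str, prompt: str) -> Iterator[str]:
    """Fake token stream so the sketch runs; swap in a real client."""
    for _ in range(64):
        time.sleep(0.001)  # simulate generation delay
        yield "tok"

def run_session(provider: str, model: str) -> dict:
    start = time.monotonic()
    first = None
    tokens = 0
    for _ in stream_completion(provider, model, PROMPT):
        if first is None:
            first = time.monotonic()  # first token arrived
        tokens += 1
    elapsed = max(time.monotonic() - first, 1e-9)
    return {
        "provider": provider,
        "model": model,
        "latency_s": first - start,     # time to first token
        "tok_per_s": tokens / elapsed,  # output throughput
    }

print(run_session("example-provider", "example-model"))
```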

Why we don't compare list prices

Most providers covered here sell flat-rate coding subscriptions (Claude Code, ChatGPT/Codex, GLM Coding Plan, MiniMax Token Plan, Alibaba Coding Plan). Comparing per-token list prices across pay-as-you-go and subscription tiers is misleading — the real cost is the monthly fee, and what you get back depends on your usage pattern. We focus on throughput, reliability, and speed-per-subscription-dollar.
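
To make speed-per-subscription-dollar concrete, here is the arithmetic with invented figures; neither the throughputs nor the monthly fees below are real plan prices or benchmark results.

```python
# Speed-per-subscription-dollar: measured throughput divided by the
# plan's monthly fee. All numbers are made up for illustration.
plans = {
    "provider-a · model-x": {"tok_per_s": 120.0, "monthly_usd": 20.0},
    "provider-b · model-y": {"tok_per_s": 900.0, "monthly_usd": 200.0},
}

for row, p in plans.items():
    per_dollar = p["tok_per_s"] / p["monthly_usd"]
    print(f"{row}: {per_dollar:.1f} tok/s per subscription dollar")
# provider-a · model-x: 6.0; provider-b · model-y: 4.5
```

Note that the faster provider is not automatically the better value once the monthly fee sits in the denominator.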

What we measure

For each provider · model row: latency, output tokens per second, and reliability across sessions; for subscription providers we also derive speed per subscription dollar.

Why provider matters

The same model is often available from multiple providers, and speed depends on the provider's hardware (Cerebras WSE-3 vs. commodity GPUs), routing, and congestion. Every row is therefore tagged provider · model, so you can distinguish, say, GLM-4.7 on Cerebras (≈1,000+ tok/s) from GLM-4.7 on z.ai (subscription throughput).
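
A small sketch of the row identity this implies (the class and field names are ours, not the site's): rows are keyed by the (provider, model) pair, never by model alone.

```python
# Rows are identified by (provider, model), never by model alone, so the
# same model on two providers stays distinct. Names are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class LeaderboardRow:
    provider: str  # e.g. "Cerebras" or "z.ai"
    model: str     # e.g. "GLM-4.7"

    @property
    def label(self) -> str:
        return f"{self.provider} · {self.model}"

a = LeaderboardRow("Cerebras", "GLM-4.7")
b = LeaderboardRow("z.ai", "GLM-4.7")
assert a != b  # same model, different rows
print(a.label, "|", b.label)
```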

Caveats and disclaimers

Throughput varies with provider routing and congestion, so numbers can shift between weekly refreshes; treat each figure as a snapshot, not a guarantee. A single shared prompt also may not reflect the behavior of your own workload.

Acknowledgements

Part of the data behind this leaderboard is provided by forkline.dev; thanks for the access.