Which model gives the best value?
DeepSeek: DeepSeek V4 Flash has the strongest useful intelligence per blended dollar today.
Pick the right model for each task. win.sh scores AI models by benchmark quality, coding strength, reasoning strength, speed, and price so your app stops overpaying for easy work.
Start here: discovery, specs, model data, raw routes, and the skill file.
Route by task
Best model for writing code?
DeepSeek: DeepSeek V4 Flash has the strongest useful intelligence per blended dollar today.
Instant answers
DeepSeek: DeepSeek V4 Flash has the strongest useful intelligence per blended dollar today.
Ministral 3 3B has the lowest useful blended price at $0.1/1M tokens.
Ministral 3 3B has the highest estimated generation speed at 320 tokens/sec.
OpenAI: GPT-5.5 Pro has the highest blended intelligence score in today's index.
Compact leaderboard
| Model | Provider | Intel | Code | Reason | $/1M tokens | Speed (tokens/s) | Value |
|---|---|---|---|---|---|---|---|
| DeepSeek: DeepSeek V4 Flash | DeepSeek | 72 | 73 | 71 | $0.117 | 210 | 1675.2 |
| DeepSeek: DeepSeek V4 Pro | DeepSeek | 88 | 89 | 88 | $0.566 | 78 | 1590.1 |
| Google: Gemini 3.1 Flash Lite | Google | 76 | 72 | 75 | $0.625 | 168 | 518.4 |
| Z.ai: GLM 5.2 | Z.ai | 86 | 84 | 86 | $1.56 | 92 | 503.2 |
| OpenAI: GPT-5.4 Nano | OpenAI | 74 | 70 | 73 | $0.515 | 154 | 497.1 |
| Qwen: Qwen3.6 Flash | Qwen | 73 | 70 | 72 | $0.469 | 190 | 479.7 |
| MoonshotAI: Kimi K2.7 Code | MoonshotAI | 85 | 92 | 84 | $1.57 | 74 | 464.9 |
| Qwen: Qwen3.7 Max | Qwen | 87 | 86 | 87 | $2.00 | 86 | 420.5 |
| Mistral Medium 3.5 | Mistral | 82 | 80 | 82 | $1.90 | 104 | 303.2 |
| Google: Gemini 3.1 Pro Preview | Google | 92 | 90 | 92 | $5.00 | 64 | 231.2 |
| Anthropic: Claude Sonnet 4.6 | Anthropic | 94 | 95 | 93 | $6.60 | 52 | 196.4 |
| Anthropic: Claude Opus 4.8 | Anthropic | 99 | 96 | 99 | $11.00 | 28 | 152.8 |
| OpenAI: GPT-5.5 | OpenAI | 96 | 94 | 96 | $12.50 | 44 | 115.5 |
| OpenAI: GPT-5.5 Pro | OpenAI | 100 | 97 | 100 | $75.00 | 24 | 23.5 |
| Ministral 3 3B | Mistral | 58 | 52 | 56 | $0.100 | 320 | 10 |
Method
Each verified model has an intelligence score plus separate coding and reasoning scores. Configured benchmark feeds can extend the seed table.
Prices are normalized into a single blended price in dollars per million tokens, then compared against the useful quality floor.
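As a rough sketch of how this normalization might work: the page does not state the blend ratio, the quality floor, or the exact value formula, so the 3:1 input-to-output weighting, the floor of 60, and the plain score-over-price ratio below are all assumptions for illustration. They will not reproduce the Value column in the leaderboard.

```python
# Assumed blend: 3 input tokens for every 1 output token (not stated by the source).
INPUT_WEIGHT = 3.0
OUTPUT_WEIGHT = 1.0
# Assumed "useful quality floor"; the real threshold is not published here.
QUALITY_FLOOR = 60


def blended_price(input_per_million: float, output_per_million: float) -> float:
    """Collapse separate input/output prices into one $/1M-token figure."""
    total = INPUT_WEIGHT + OUTPUT_WEIGHT
    return (INPUT_WEIGHT * input_per_million + OUTPUT_WEIGHT * output_per_million) / total


def value_score(intelligence: float, price_per_million: float) -> float:
    """Intelligence per blended dollar, or 0.0 for models below the floor."""
    if intelligence < QUALITY_FLOOR or price_per_million <= 0:
        return 0.0
    return intelligence / price_per_million


# Hypothetical prices, not taken from the index.
price = blended_price(input_per_million=0.10, output_per_million=0.50)
print(f"blended ${price:.2f}/1M, value {value_score(72, price):.1f}")
```

Filtering before dividing matters: a very cheap model with a low score would otherwise dominate any per-dollar ranking.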
Code favors coding score. Planning and hard analysis favor reasoning. Summaries and extraction favor cheap reliable execution.
Every route returns a backup model so applications can retry without sending the same task back through guesswork.
Index updated Jun 29, 2026.
1. Normalize model price, speed, context, and benchmark-style scores into one comparable table.
2. Calculate value as useful intelligence per blended dollar after filtering out weak models.
3. Apply the task, latency, budget, and quality settings from the request.
4. Return the top model, fallback model, policy, and plain-English reason.
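The steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the router's actual implementation: the task weights, the quality floor of 60, and the `route` function and its parameters are hypothetical, and the sketch returns only the top and fallback names, leaving out the policy and the plain-English reason. The model rows are copied from the leaderboard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    intel: int
    code: int
    reason: int
    price: float  # blended $ per 1M tokens
    speed: int    # tokens per second


# A few rows copied from the leaderboard.
MODELS = [
    Model("DeepSeek V4 Flash", 72, 73, 71, 0.117, 210),
    Model("DeepSeek V4 Pro", 88, 89, 88, 0.566, 78),
    Model("Kimi K2.7 Code", 85, 92, 84, 1.57, 74),
    Model("Claude Opus 4.8", 99, 96, 99, 11.00, 28),
    Model("Ministral 3 3B", 58, 52, 56, 0.100, 320),
]

# Assumed per-task weights over (intel, code, reason); not the site's values.
TASK_WEIGHTS = {
    "code": (0.2, 0.7, 0.1),
    "reasoning": (0.2, 0.1, 0.7),
    "summarize": (1.0, 0.0, 0.0),
}
QUALITY_FLOOR = 60  # assumed cutoff for "weak" models


def route(task, max_price=None, min_speed=None):
    """Return (top, fallback) model names ranked by task quality per dollar."""
    w_intel, w_code, w_reason = TASK_WEIGHTS[task]
    ranked = []
    for m in MODELS:
        quality = w_intel * m.intel + w_code * m.code + w_reason * m.reason
        if quality < QUALITY_FLOOR:
            continue  # step 2: drop weak models before ranking
        if max_price is not None and m.price > max_price:
            continue  # step 3: budget setting from the request
        if min_speed is not None and m.speed < min_speed:
            continue  # step 3: speed setting from the request
        ranked.append((quality / m.price, m.name))
    ranked.sort(reverse=True)
    if not ranked:
        raise ValueError("no model satisfies the request")
    top = ranked[0][1]
    fallback = ranked[1][1] if len(ranked) > 1 else None
    return top, fallback


print(route("code"))
```

Returning the runner-up alongside the winner is what lets a caller retry on a different model without recomputing the ranking.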
FAQ
An AI model router chooses the best model for a task by weighing benchmark quality, coding or reasoning strength, speed, context, and token price.
Yes. The public GET endpoints for route recommendations, raw model ids, category winners, OpenAPI, llms.txt, and the model index are free to read.
Use /llms.txt for discovery, /openapi.json for the contract, /router/models for the full table, and /router/raw when the agent only needs a model id.
Yes. Add providers=anthropic,openai or regions=us,eu,china to /router or /router/raw. The router only chooses from matching models and returns 400 if none match.
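A filtered request could be assembled as in the sketch below. Only the `providers` and `regions` parameters and the `/router` and `/router/raw` paths come from this page; the host `https://win.sh`, the `task` parameter name, and the `router_url` helper are assumptions made for illustration, so check `/openapi.json` for the real contract.

```python
from urllib.parse import urlencode

BASE_URL = "https://win.sh"  # assumed host; adjust to the real deployment


def router_url(task, providers=(), regions=(), raw=False):
    """Build a /router or /router/raw URL with optional provider/region filters."""
    params = {"task": task}  # "task" is a hypothetical parameter name
    if providers:
        params["providers"] = ",".join(providers)
    if regions:
        params["regions"] = ",".join(regions)
    path = "/router/raw" if raw else "/router"
    # safe="," keeps the comma-separated lists readable instead of %2C-encoded.
    return f"{BASE_URL}{path}?{urlencode(params, safe=',')}"


print(router_url("code", providers=["anthropic", "openai"], regions=["us", "eu"]))
```

Because a 400 means no model matched the filters, a caller would normally loosen the filters rather than retry the same URL.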
The index keeps separate intelligence, coding, reasoning, speed, latency, context, and blended price signals. Task routing changes the weights before returning a model and fallback.
Use raw endpoints in scripts, CI jobs, or agents that want a plain text model id without parsing JSON.
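In a script, reading the raw endpoint might look like the following sketch. The host and the `task` query parameter are assumptions, and `fetch_model_id` is a hypothetical helper. The `opener` argument exists only so the function can be exercised without network access.

```python
import urllib.error
import urllib.request

RAW_URL = "https://win.sh/router/raw?task=code"  # assumed host and "task" parameter


def fetch_model_id(url=RAW_URL, opener=urllib.request.urlopen, default=None):
    """Return the plain-text model id, or `default` if the router answers 400."""
    try:
        with opener(url, timeout=10) as response:
            # The raw endpoint returns plain text, so no JSON parsing is needed.
            return response.read().decode("utf-8").strip()
    except urllib.error.HTTPError as err:
        if err.code == 400:  # per the FAQ: no model matches the filters
            return default
        raise
```

A CI job could call `fetch_model_id(default="some-pinned-model")` so that a filter mismatch falls back to a known id instead of failing the build.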
Yes. The public SKILL.md explains when to call the router, which endpoint to use, and how to validate the selected model.