Code Arena 🏆 Overall
View overall rankings across AI models on agentic coding tasks involving multi-step reasoning and tool use.
Apr 9, 2026
231,158 votes
13 labs
| Lab Rank | Model | Score | Rank | Rank Spread |
|---|---|---|---|---|
| 1 | Anthropic claude-opus-4-6-thinking · Proprietary | 1548 +11/-11 | 1 | 1–3 |
| 2 | Z.ai glm-5.1 | 1530 +20/-20 | 3 | 1–4 |
| 3 | OpenAI gpt-5.4-high (codex-harness) · Proprietary | 1457 +17/-17 | 7 | 6–15 |
| 4 | Google gemini-3.1-pro-preview · Proprietary | 1456 +9/-9 | 8 | 6–12 |
| 5 | Alibaba qwen3.6-plus-preview · Proprietary | 1453 +14/-14 | 9 | 6–15 |
| 6 | Xiaomi mimo-v2-pro · Proprietary | 1433 +12/-12 | 15 | 8–17 |
| 7 | Moonshot kimi-k2.5-thinking | 1429 +8/-8 | 16 | 10–17 |
| 8 | MiniMax minimax-m2.7 | 1425 +12/-12 | 17 | 10–20 |
| 9 | xAI grok-4.20-beta-0309-reasoning · Proprietary | 1393 +11/-11 | 21 | 18–31 |
| 10 | DeepSeek deepseek-v3.2-thinking | 1368 +8/-8 | 32 | 31–34 |
| 11 | KwaiKAT KAT-Coder-Pro-V1 · Proprietary | 1257 +15/-15 | 47 | 47–52 |
| 12 | Mistral mistral-large-3 | 1222 +20/-20 | 53 | 48–56 |
| 13 | Inception AI mercury-2 · Proprietary | 1166 +23/-23 | 57 | 55–59 |
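
The score cells in the table pair a point score with a `+x/-y` margin. As a rough sketch, the snippet below parses such a cell into a numeric range and checks whether two ranges overlap, which is one informal way to see why closely scored models can share a wide rank spread. It assumes the margins are upper and lower bounds around the score; the page itself does not say how they or the rank spread are computed. The names `ScoreInterval`, `parse_score`, and `intervals_overlap` are hypothetical helpers, not part of any leaderboard API.

```python
import re
from typing import NamedTuple


class ScoreInterval(NamedTuple):
    """Point score with the range implied by its +x/-y margin (assumed meaning)."""
    score: int
    low: int
    high: int


# Accepts cells such as "1548+11/-11" or "1548 +11/-11".
_SCORE_RE = re.compile(r"^\s*(\d+)\s*\+(\d+)\s*/\s*-(\d+)\s*$")


def parse_score(cell: str) -> ScoreInterval:
    """Parse a score cell into (score, score - minus, score + plus)."""
    match = _SCORE_RE.match(cell)
    if match is None:
        raise ValueError(f"unrecognized score cell: {cell!r}")
    score, plus, minus = (int(group) for group in match.groups())
    return ScoreInterval(score, score - minus, score + plus)


def intervals_overlap(a: ScoreInterval, b: ScoreInterval) -> bool:
    """Return True when the two assumed score ranges share at least one point."""
    return a.low <= b.high and b.low <= a.high


# Score cells copied from the table above.
opus = parse_score("1548+11/-11")
glm = parse_score("1530+20/-20")
gpt = parse_score("1457+17/-17")

print(intervals_overlap(opus, glm))  # True: 1537..1559 and 1510..1550 overlap
print(intervals_overlap(glm, gpt))   # False: 1510..1550 and 1440..1474 do not
```

Under this reading, the top two entries are not cleanly separated by score alone, while the gap between the second and third entries exceeds both margins.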