Code Arena 🏆 Overall

View overall rankings across AI models on agentic coding tasks involving multi-step reasoning and tool use.

Apr 1, 2026

224,709 votes

59 models

Rank	Rank Spread	Model	Organization · License	Score (+/- CI)	Votes	Price ($/1M tokens, input / output)	Context
1	1-2	claude-opus-4-6-thinking	Anthropic · Proprietary	1546 +12/-12	3,698	$5 / $25	1M
2	1-2	claude-opus-4-6	Anthropic · Proprietary	1543 +11/-11	4,479	$5 / $25	1M
3	3-3	claude-sonnet-4-6	Anthropic · Proprietary	1521 +9/-9	7,086	$3 / $15	1M
4	4-4	claude-opus-4-5-20251101-thinking-32k	Anthropic · Proprietary	1491 +7/-7	13,254	$5 / $25	200K
5	5-8	claude-opus-4-5-20251101	Anthropic · Proprietary	1465 +7/-7	14,248	$5 / $25	200K
6	5-15	gpt-5.4-high (codex-harness)	OpenAI · Proprietary	1457 +17/-17	1,495	N/A	N/A
7	5-10	gemini-3.1-pro-preview	Google · Proprietary	1456 +9/-9	5,467	$2 / $12	1M
8	5-16	qwen3.6-plus-preview	Alibaba · Proprietary	1454 +19/-19	1,125	N/A	N/A
9	6-16	glm-5	Z.ai · MIT	1441 +10/-10	4,536	$1 / $3.20	202.8K
10	6-16	glm-4.7	Z.ai · MIT	1439 +10/-10	4,876	$0.39 / $1.75	202.8K
11	7-16	gemini-3-pro	Google · Proprietary	1438 +7/-7	17,165	$2 / $12	1M
12	7-16	gemini-3-flash	Google · Proprietary	1436 +7/-7	13,282	$0.50 / $3	1M
13	7-16	mimo-v2-pro	Xiaomi · Proprietary	1433 +12/-12	2,903	$1 / $3	1M
14	8-16	kimi-k2.5-thinking	Moonshot · Modified MIT	1429 +8/-8	6,694	$0.60 / $3	N/A
15	7-19	minimax-m2.7	MiniMax · Proprietary	1428 +12/-12	2,716	$0.30 / $1.20	204.8K
16	7-19	gpt-5.4-medium (codex-harness)	OpenAI · Proprietary	1427 +16/-16	1,579	N/A	N/A
17	15-26	kimi-k2.5-instant	Moonshot · Modified MIT	1408 +11/-11	3,610	$0.38 / $1.72	262.1K
18	15-28	gpt-5.3-codex (codex-harness)	OpenAI · Proprietary	1407 +12/-12	2,974	$1.75 / $14	400K
19	15-30	gpt-5.2	OpenAI · Proprietary	1403 +17/-17	1,460	$1.75 / $14	400K
20	17-30	minimax-m2.5	MiniMax · Modified MIT	1396 +8/-8	6,716	$0.12 / $0.99	196.6K
21	17-30	gpt-5-medium	OpenAI · Proprietary	1392 +13/-13	3,753	$1.25 / $10	400K
22	17-30	minimax-m2.1-preview	MiniMax · MIT	1391 +8/-8	9,275	$0.27 / $0.95	196.6K
23	17-30	gemini-3-flash (thinking-minimal)	Google · Proprietary	1391 +7/-7	12,208	$0.50 / $3	1M
24	17-30	gpt-5.1-medium	OpenAI · Proprietary	1390 +9/-9	6,124	$1.25 / $10	400K
25	18-30	claude-sonnet-4-5-20250929-thinking-32k	Anthropic · Proprietary	1388 +6/-6	15,916	$3 / $15	200K
26	18-30	qwen3.5-397b-a17b	Alibaba · Apache 2.0	1386 +9/-9	5,559	$0.39 / $2.34	262.1K
27	19-30	claude-sonnet-4-5-20250929	Anthropic · Proprietary	1386 +6/-6	18,512	$3 / $15	200K
28	17-31	grok-4.20-beta-0309-reasoning	xAI · Proprietary	1386 +12/-12	3,030	$2 / $6	2M
29	17-32	gpt-5.4-mini-high	OpenAI · Proprietary	1385 +18/-18	1,198	$0.75 / $4.50	400K
30	19-31	claude-opus-4-1-20250805	Anthropic · Proprietary	1384 +9/-9	8,570	$15 / $75	200K
31	28-33	deepseek-v3.2-thinking	DeepSeek · MIT	1368 +8/-8	8,118	$0.26 / $0.38	163.8K
32	30-34	qwen3.5-122b-a10b	Alibaba · Apache 2.0	1362 +10/-10	4,272	$0.26 / $2.08	262.1K
33	31-35	glm-4.6	Z.ai · MIT	1354 +9/-9	8,345	$0.39 / $1.90	204.8K
34	32-40	qwen3.5-27b	Alibaba · Apache 2.0	1344 +10/-10	3,958	$0.20 / $1.56	262.1K
35	33-40	gpt-5.1	OpenAI · Proprietary	1339 +7/-7	12,868	$1.25 / $10	400K
36	34-40	mimo-v2-flash (non-thinking)	Xiaomi · MIT	1337 +8/-8	6,737	$0.09 / $0.29	262.1K
37	34-40	gpt-5.2-codex	OpenAI · Proprietary	1335 +8/-8	7,956	$1.75 / $14	400K
38	34-40	kimi-k2-thinking-turbo	Moonshot · Modified MIT	1329 +6/-6	15,230	$1.15 / $8	262.1K
39	34-40	gpt-5.1-codex	OpenAI · Proprietary	1328 +9/-9	6,225	$1.25 / $10	400K
40	34-40	deepseek-v3.2	DeepSeek · MIT	1327 +7/-7	9,603	$0.26 / $0.38	163.8K
41	41-43	claude-haiku-4-5-20251001	Anthropic · Proprietary	1312 +6/-6	16,594	$1 / $5	200K
42	41-44	minimax-m2	MiniMax · Apache 2.0	1303 +9/-9	8,400	$0.26 / $1	196.6K
43	41-45	mimo-v2-flash (thinking)	Xiaomi · MIT	1300 +14/-14	2,096	$0.09 / $0.29	262.1K
44	42-45	deepseek-v3.2-exp	DeepSeek · MIT	1285 +11/-11	4,869	$0.27 / $0.41	163.8K
45	43-45	qwen3-coder-480b-a35b-instruct	Alibaba · Apache 2.0	1280 +6/-6	15,380	$0.40 / $1.60	262.1K
46	46-51	KAT-Coder-Pro-V1	KwaiKAT · Proprietary	1257 +15/-15	1,883	$0.21 / $0.83	256K
47	46-52	qwen3.5-35b-a3b	Alibaba · Apache 2.0	1247 +16/-16	1,817	$0.16 / $1.30	262.1K
48	46-52	gemini-3.1-flash-lite-preview	Google · Proprietary	1238 +10/-10	5,276	$0.25 / $1.50	1M
49	46-53	gpt-5.1-codex-mini	OpenAI · Proprietary	1238 +17/-17	1,443	$0.25 / $2	400K
50	46-53	qwen3.5-flash	Alibaba · Proprietary	1235 +17/-17	1,562	N/A	N/A
51	46-53	grok-4-1-fast-reasoning	xAI · Proprietary	1233 +9/-9	6,917	$0.20 / $0.50	2M
52	47-56	mistral-large-3	Mistral · Apache 2.0	1221 +20/-20	1,031	$0.50 / $1.50	N/A
53	49-56	grok-4.1-thinking	xAI · Proprietary	1207 +20/-20	1,209	N/A	N/A
54	52-56	gemini-2.5-pro	Google · Proprietary	1202 +13/-13	3,295	$1.25 / $10	1M
55	52-56	devstral-2	Mistral · Modified MIT	1198 +17/-17	1,585	N/A	N/A
56	52-57	mercury-2	Inception AI · Proprietary	1182 +21/-21	1,107	$0.25 / $0.75	128K
57	56-58	grok-4-fast-reasoning	xAI · Proprietary	1148 +23/-23	934	$0.20 / $0.50	2M
58	57-58	grok-code-fast-1	xAI · Proprietary	1138 +22/-22	983	$0.20 / $1.50	256K
59	59-59	devstral-medium-2507	Mistral · Proprietary	1090 +23/-23	993	$0.40 / $2	128K

Leaderboard Plots

Confidence Intervals on Model Strength (via Bootstrapping)

Battle Count for Each Combination of Models (without Ties)

Average Win Rate Against All Other Models (Uniform Sampling and No Ties)

Fraction of Model A Wins for All Non-tied A vs. B Battles
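
The plot titles above refer to pairwise win fractions over non-tied battles, an average win rate against all other models, and bootstrapped confidence intervals. The sketch below illustrates those three ideas on a tiny made-up dataset. Everything in it is an assumption for illustration: the battle records, the model names (model-x, model-y, model-z), and the function names are hypothetical, and the page does not describe the rating model behind its scores, so the bootstrap here is applied to the average win rate rather than to a model-strength score.

```python
import random
from collections import defaultdict

# Hypothetical battle records: (model_a, model_b, winner),
# where winner is "a", "b", or "tie". Not real leaderboard data.
BATTLES = [
    ("model-x", "model-y", "a"),
    ("model-x", "model-y", "a"),
    ("model-x", "model-y", "b"),
    ("model-x", "model-y", "tie"),
    ("model-y", "model-z", "a"),
    ("model-y", "model-z", "b"),
    ("model-x", "model-z", "a"),
    ("model-z", "model-x", "b"),
]


def pairwise_win_fraction(battles):
    """Return {(a, b): fraction of non-tied a-vs-b battles won by a}."""
    wins = defaultdict(int)  # wins[(a, b)] counts how often a beat b
    for a, b, winner in battles:
        if winner == "tie":
            continue  # ties are excluded, as in the plot title
        if winner == "a":
            wins[(a, b)] += 1
        else:
            wins[(b, a)] += 1
    # Cover both orderings of every pair that met at least once.
    pairs = set(wins)
    pairs |= {(b, a) for a, b in pairs}
    frac = {}
    for a, b in pairs:
        total = wins.get((a, b), 0) + wins.get((b, a), 0)
        frac[(a, b)] = wins.get((a, b), 0) / total
    return frac


def average_win_rate(frac):
    """Mean win fraction per model, weighting each opponent equally."""
    per_model = defaultdict(list)
    for (a, _b), f in frac.items():
        per_model[a].append(f)
    return {m: sum(v) / len(v) for m, v in per_model.items()}


def bootstrap_interval(battles, model, rounds=200, seed=0, alpha=0.05):
    """Percentile bootstrap interval for one model's average win rate."""
    rng = random.Random(seed)
    samples = []
    for _ in range(rounds):
        # Resample battles with replacement, then recompute the statistic.
        resampled = rng.choices(battles, k=len(battles))
        rate = average_win_rate(pairwise_win_fraction(resampled)).get(model)
        if rate is not None:  # model may be absent from a resample
            samples.append(rate)
    samples.sort()
    n = len(samples)
    lo = samples[int((alpha / 2) * n)]
    hi = samples[min(n - 1, int((1 - alpha / 2) * n))]
    return lo, hi


if __name__ == "__main__":
    frac = pairwise_win_fraction(BATTLES)
    print(average_win_rate(frac))
    print(bootstrap_interval(BATTLES, "model-x"))
```

With so few battles the interval is very wide, which is consistent with the table's pattern of larger +/- values for models with fewer votes.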