Code Arena 🏆 WebDev

View overall rankings across AI models on agentic coding tasks involving multi-step reasoning and tool use.

Apr 23, 2026

251,161 votes

65 models

Rank	Rank Spread	Model	Organization · License	Score (+/- CI)	Votes	Price (input / output, $ per 1M tokens)	Context (tokens)
1	1-3	claude-opus-4-7-thinking	Anthropic · Proprietary	1572 +18/-18	1,393	$5 / $25	1M
2	1-4	claude-opus-4-7	Anthropic · Proprietary	1566 +16/-16	1,782	$5 / $25	1M
3	1-6	claude-opus-4-6-thinking	Anthropic · Proprietary	1552 +10/-10	4,922	$5 / $25	1M
4	2-6	claude-opus-4-6	Anthropic · Proprietary	1545 +9/-9	5,821	$5 / $25	1M
5	3-8	glm-5.1	Z.ai · MIT	1534 +13/-13	2,478	$1.05 / $3.50	202.8K
6	3-8	kimi-k2.6	Moonshot · Modified MIT	1529 +17/-17	1,408	$0.74 / $4.66	256K
7	5-8	claude-sonnet-4-6	Anthropic · Proprietary	1527 +9/-9	7,751	$3 / $15	1M
8	5-9	muse-spark	Meta · Proprietary	1512 +16/-16	1,611	N/A	N/A
9	8-9	claude-opus-4-5-20251101-thinking-32k	Anthropic · Proprietary	1491 +7/-7	13,065	$5 / $25	200K
10	10-14	qwen3.6-plus	Alibaba · Proprietary	1471 +12/-12	2,854	$0.33 / $1.95	1M
11	10-14	claude-opus-4-5-20251101	Anthropic · Proprietary	1468 +6/-6	15,283	$5 / $25	200K
12	10-16	gemini-3.1-pro-preview	Google · Proprietary	1457 +8/-8	6,813	$2 / $12	1M
13	10-19	gpt-5.4-high (codex-harness)	OpenAI · Proprietary	1457 +17/-17	1,482	$2.50 / $15	1.1M
14	10-20	deepseek-v4-pro-thinking	DeepSeek · MIT	1456 +19/-19	1,066	$0.43 / $0.87	1M
15	12-21	glm-4.7	Z.ai · MIT	1440 +10/-10	4,880	$0.38 / $1.74	202.8K
16	13-21	gemini-3-pro	Google · Proprietary	1438 +7/-7	17,169	$2 / $12	1M
17	12-23	gpt-5.4-medium (codex-harness)	OpenAI · Proprietary	1437 +16/-16	1,448	$2.50 / $15	1.1M
18	13-21	gemini-3-flash	Google · Proprietary	1437 +7/-7	13,276	$0.50 / $3	1M
19	13-22	glm-5	Z.ai · MIT	1435 +9/-9	5,640	$1 / $3.20	202.8K
20	14-22	kimi-k2.5-thinking	Moonshot · Modified MIT	1430 +8/-8	7,913	$0.60 / $3	N/A
21	15-25	mimo-v2-pro	Xiaomi · Proprietary	1426 +10/-10	4,021	$1 / $3	1M
22	18-28	minimax-m2.7	MiniMax · Modified MIT	1416 +10/-10	3,785	$0.30 / $1.20	196.6K
23	21-30	kimi-k2.5-instant	Moonshot · Modified MIT	1408 +11/-11	3,611	$0.44 / $2	262.1K
24	21-33	gpt-5.3-codex (codex-harness)	OpenAI · Proprietary	1407 +12/-12	2,965	$1.75 / $14	400K
25	20-36	gpt-5.2	OpenAI · Proprietary	1404 +17/-17	1,459	$1.75 / $14	400K
26	22-34	grok-4.20-beta-0309-reasoning	xAI · Proprietary	1403 +10/-10	4,209	$2 / $6	2M
27	22-36	gpt-5.4-mini-high	OpenAI · Proprietary	1399 +13/-13	2,541	$2.50 / $15	1.1M
28	22-36	gpt-5-medium	OpenAI · Proprietary	1393 +13/-13	3,755	$1.25 / $10	400K
29	23-36	minimax-m2.1-preview	MiniMax · MIT	1392 +8/-8	9,274	$0.29 / $0.95	196.6K
30	23-36	gpt-5.1-medium	OpenAI · Proprietary	1391 +9/-9	6,122	$1.25 / $10	400K
31	24-36	gemini-3-flash (thinking-minimal)	Google · Proprietary	1389 +6/-6	13,385	$0.50 / $3	1M
32	24-36	qwen3.5-397b-a17b	Alibaba · Apache 2.0	1388 +8/-8	6,715	$0.39 / $2.34	262.1K
33	24-36	claude-sonnet-4-5-20250929-thinking-32k	Anthropic · Proprietary	1388 +6/-6	15,740	$3 / $15	200K
34	26-36	claude-sonnet-4-5-20250929	Anthropic · Proprietary	1386 +6/-6	18,382	$3 / $15	200K
35	25-36	claude-opus-4-1-20250805	Anthropic · Proprietary	1385 +9/-9	8,568	$15 / $75	200K
36	26-37	minimax-m2.5	MiniMax · Modified MIT	1384 +8/-8	7,834	$0.15 / $1.15	196.6K
37	36-39	deepseek-v3.2-thinking	DeepSeek · MIT	1368 +8/-8	7,904	$0.25 / $0.38	131.1K
38	37-40	qwen3.5-122b-a10b	Alibaba · Apache 2.0	1363 +9/-9	5,506	$0.26 / $2.08	262.1K
39	37-41	glm-4.6	Z.ai · MIT	1355 +9/-9	8,349	$0.39 / $1.90	204.8K
40	38-46	qwen3.5-27b	Alibaba · Apache 2.0	1345 +9/-9	5,083	$0.20 / $1.56	262.1K
41	39-46	gpt-5.1	OpenAI · Proprietary	1339 +7/-7	12,872	$1.25 / $10	400K
42	40-46	mimo-v2-flash (non-thinking)	Xiaomi · MIT	1337 +8/-8	6,736	$0.09 / $0.29	262.1K
43	40-46	gpt-5.2-codex	OpenAI · Proprietary	1335 +8/-8	7,763	$1.75 / $14	400K
44	40-46	deepseek-v3.2	DeepSeek · MIT	1332 +7/-7	10,427	$0.25 / $0.38	131.1K
45	40-46	kimi-k2-thinking-turbo	Moonshot · Modified MIT	1329 +6/-6	15,346	$1.15 / $8	262.1K
46	40-47	gpt-5.1-codex	OpenAI · Proprietary	1329 +9/-9	6,228	$1.25 / $10	400K
47	46-49	claude-haiku-4-5-20251001	Anthropic · Proprietary	1316 +6/-6	17,855	$1 / $5	200K
48	47-50	minimax-m2	MiniMax · Apache 2.0	1304 +9/-9	8,401	$0.26 / $1	196.6K
49	47-51	mimo-v2-flash (thinking)	Xiaomi · MIT	1301 +14/-14	2,096	$0.09 / $0.29	262.1K
50	48-51	deepseek-v3.2-exp	DeepSeek · MIT	1286 +11/-11	4,870	$0.27 / $0.41	163.8K
51	49-51	qwen3-coder-480b-a35b-instruct	Alibaba · Apache 2.0	1281 +7/-7	15,208	$0.40 / $1.60	262.1K
52	52-58	KAT-Coder-Pro-V1	KwaiKAT · Proprietary	1258 +15/-15	1,882	$0.21 / $0.83	256K
53	52-58	qwen3.5-35b-a3b	Alibaba · Apache 2.0	1248 +16/-16	1,814	$0.16 / $1.30	262.1K
54	52-59	gpt-5.1-codex-mini	OpenAI · Proprietary	1239 +17/-17	1,444	$0.25 / $2	400K
55	52-59	qwen3.5-flash	Alibaba · Proprietary	1236 +17/-17	1,560	N/A	N/A
56	52-59	gemini-3.1-flash-lite-preview	Google · Proprietary	1235 +9/-9	6,233	$0.25 / $1.50	1M
57	52-59	grok-4-1-fast-reasoning	xAI · Proprietary	1234 +9/-9	6,916	$0.20 / $0.50	2M
58	52-61	mistral-large-3	Mistral · Apache 2.0	1223 +20/-20	1,031	$0.50 / $1.50	N/A
59	54-61	grok-4.1-thinking	xAI · Proprietary	1208 +20/-20	1,209	N/A	N/A
60	58-61	gemini-2.5-pro	Google · Proprietary	1203 +13/-13	3,300	$1.25 / $10	1M
61	58-62	devstral-2	Mistral · Modified MIT	1199 +17/-17	1,582	N/A	N/A
62	61-64	mercury-2	Inception AI · Proprietary	1165 +23/-23	947	$0.25 / $0.75	128K
63	62-64	grok-4-fast-reasoning	xAI · Proprietary	1149 +23/-23	936	$0.20 / $0.50	2M
64	62-64	grok-code-fast-1	xAI · Proprietary	1139 +22/-22	984	$0.20 / $1.50	256K
65	65-65	devstral-medium-2507	Mistral · Proprietary	1091 +23/-23	993	$0.40 / $2	128K
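
Many adjacent rows differ by less than their stated +/- margins, so a quick way to read the table is to check whether two models' score intervals overlap. The sketch below assumes the +N/-N value next to each score is an interval around that score (consistent with the "Confidence Intervals on Model Strength" plot title). The helper names `parse_score` and `intervals_overlap` are hypothetical, and the overlap test is only a rough reading aid, not the method the leaderboard uses to compute Rank Spread.

```python
import re

# Score cells copied from three rows of the table above. The parser
# tolerates optional whitespace between the score and its interval.
SCORE_CELLS = {
    "claude-opus-4-7-thinking": "1572+18/-18",
    "claude-opus-4-6": "1545+9/-9",
    "glm-5.1": "1534+13/-13",
}

_SCORE_RE = re.compile(r"^\s*(\d+)\s*\+(\d+)\s*/\s*-(\d+)\s*$")


def parse_score(cell):
    """Return (lower, score, upper) for a cell such as '1572+18/-18'."""
    match = _SCORE_RE.match(cell)
    if match is None:
        raise ValueError(f"unrecognized score cell: {cell!r}")
    score, plus, minus = (int(group) for group in match.groups())
    return score - minus, score, score + plus


def intervals_overlap(cell_a, cell_b):
    """True if the two score intervals share at least one point."""
    low_a, _, high_a = parse_score(cell_a)
    low_b, _, high_b = parse_score(cell_b)
    return low_a <= high_b and low_b <= high_a


if __name__ == "__main__":
    top = SCORE_CELLS["claude-opus-4-7-thinking"]
    for name, cell in SCORE_CELLS.items():
        print(name, parse_score(cell), intervals_overlap(top, cell))
```

Under this reading, the intervals for claude-opus-4-7-thinking (1554 to 1590) and claude-opus-4-6 (1536 to 1554) touch at 1554, while glm-5.1 (1521 to 1547) falls entirely below the top model's interval.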

Remove Style Control Leaderboard Plots

Fraction of Model A Wins for All Non-tied A vs. B Battles

Confidence Intervals on Model Strength (via Bootstrapping)

Average Win Rate Against All Other Models (Uniform Sampling and No Ties)

Battle Count for Each Combination of Models (without Ties)
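
The plot titles above name three pairwise quantities: the fraction of wins in non-tied battles, the battle count per model pair without ties, and an average win rate with every opponent weighted equally. A minimal sketch of how such quantities could be computed from raw battle records follows. The record layout `(model_a, model_b, outcome)`, the model names `m1` to `m3`, and the helper names are all hypothetical, and the toy records are not leaderboard data. The page does not describe its actual pipeline, so this is an illustration under those assumptions.

```python
from collections import Counter

# Hypothetical battle records: (model_a, model_b, outcome), where outcome
# is "a", "b", or "tie". Toy values only, not leaderboard data.
BATTLES = [
    ("m1", "m2", "a"),
    ("m1", "m2", "b"),
    ("m1", "m2", "a"),
    ("m2", "m1", "a"),
    ("m1", "m2", "tie"),
    ("m1", "m3", "a"),
]


def pairwise_stats(battles):
    """Return (win_fraction, battle_count), both keyed by (model, opponent).

    Ties are dropped before counting, and the order in which the two
    models were presented is ignored.
    """
    wins = Counter()
    counts = Counter()
    for model_a, model_b, outcome in battles:
        if outcome == "tie":
            continue
        winner, loser = (model_a, model_b) if outcome == "a" else (model_b, model_a)
        wins[(winner, loser)] += 1
        counts[(model_a, model_b)] += 1
        counts[(model_b, model_a)] += 1
    fractions = {pair: wins[pair] / counts[pair] for pair in counts}
    return fractions, counts


def average_win_rate(fractions):
    """Mean win fraction per model, weighting each faced opponent equally."""
    per_model = {}
    for (model, _opponent), fraction in fractions.items():
        per_model.setdefault(model, []).append(fraction)
    return {model: sum(vals) / len(vals) for model, vals in per_model.items()}


if __name__ == "__main__":
    fractions, counts = pairwise_stats(BATTLES)
    print(fractions)
    print(dict(counts))
    print(average_win_rate(fractions))
```

Averaging the per-opponent fractions, rather than pooling all battles, keeps a heavily sampled pairing from dominating a model's average, which is one plausible reading of "Uniform Sampling" in the plot title.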