Code Arena 🏆 WebDev

View overall rankings across AI models on agentic coding tasks involving multi-step reasoning and tool use.

Apr 22, 2026

249,561 votes

64 models

Rank	Rank Spread	Model	Organization · License	Score (+/- CI)	Votes	Price (input / output per 1M tokens)	Context
1	1-3	claude-opus-4-7-thinking	Anthropic · Proprietary	1576 +19/-19	1,286	$5 / $25	1M
2	1-4	claude-opus-4-7	Anthropic · Proprietary	1566 +16/-16	1,684	$5 / $25	1M
3	1-6	claude-opus-4-6-thinking	Anthropic · Proprietary	1552 +10/-10	4,815	$5 / $25	1M
4	2-6	claude-opus-4-6	Anthropic · Proprietary	1545 +9/-9	5,723	$5 / $25	1M
5	3-8	glm-5.1	Z.ai · MIT	1534 +14/-14	2,398	$1.05 / $3.50	202.8K
6	3-8	kimi-k2.6	Moonshot · Modified MIT	1529 +18/-18	1,232	$0.75 / $3.50	262.1K
7	5-8	claude-sonnet-4-6	Anthropic · Proprietary	1526 +9/-9	7,660	$3 / $15	1M
8	5-9	muse-spark	Meta · Proprietary	1513 +16/-16	1,608	N/A	N/A
9	8-9	claude-opus-4-5-20251101-thinking-32k	Anthropic · Proprietary	1491 +7/-7	13,066	$5 / $25	200K
10	10-13	qwen3.6-plus	Alibaba · Proprietary	1471 +12/-12	2,753	$0.33 / $1.95	1M
11	10-13	claude-opus-4-5-20251101	Anthropic · Proprietary	1468 +6/-6	15,287	$5 / $25	200K
12	10-18	gpt-5.4-high (codex-harness)	OpenAI · Proprietary	1457 +17/-17	1,482	$2.50 / $15	1.1M
13	10-15	gemini-3.1-pro-preview	Google · Proprietary	1456 +8/-8	6,727	$2 / $12	1M
14	12-20	glm-4.7	Z.ai · MIT	1440 +10/-10	4,879	$0.38 / $1.74	202.8K
15	13-20	gemini-3-pro	Google · Proprietary	1438 +7/-7	17,169	$2 / $12	1M
16	12-22	gpt-5.4-medium (codex-harness)	OpenAI · Proprietary	1437 +16/-16	1,448	$2.50 / $15	1.1M
17	13-20	gemini-3-flash	Google · Proprietary	1437 +7/-7	13,276	$0.50 / $3	1M
18	13-21	glm-5	Z.ai · MIT	1436 +9/-9	5,576	$1 / $3.20	202.8K
19	14-21	kimi-k2.5-thinking	Moonshot · Modified MIT	1430 +8/-8	7,854	$0.60 / $3	N/A
20	14-24	mimo-v2-pro	Xiaomi · Proprietary	1426 +10/-10	3,946	$1 / $3	1M
21	17-26	minimax-m2.7	MiniMax · Modified MIT	1417 +10/-10	3,714	$0.30 / $1.20	196.6K
22	20-29	kimi-k2.5-instant	Moonshot · Modified MIT	1408 +11/-11	3,611	$0.44 / $2	262.1K
23	20-31	gpt-5.3-codex (codex-harness)	OpenAI · Proprietary	1407 +12/-12	2,970	$1.75 / $14	400K
24	19-35	gpt-5.2	OpenAI · Proprietary	1404 +17/-17	1,459	$1.75 / $14	400K
25	21-33	grok-4.20-beta-0309-reasoning	xAI · Proprietary	1403 +10/-10	4,115	$2 / $6	2M
26	21-35	gpt-5.4-mini-high	OpenAI · Proprietary	1399 +13/-13	2,460	$2.50 / $15	1.1M
27	22-35	gpt-5-medium	OpenAI · Proprietary	1393 +13/-13	3,755	$1.25 / $10	400K
28	22-35	minimax-m2.1-preview	MiniMax · MIT	1392 +8/-8	9,274	$0.29 / $0.95	196.6K
29	22-35	gpt-5.1-medium	OpenAI · Proprietary	1391 +9/-9	6,122	$1.25 / $10	400K
30	23-35	qwen3.5-397b-a17b	Alibaba · Apache 2.0	1389 +8/-8	6,624	$0.39 / $2.34	262.1K
31	23-35	gemini-3-flash (thinking-minimal)	Google · Proprietary	1389 +6/-6	13,308	$0.50 / $3	1M
32	24-35	claude-sonnet-4-5-20250929-thinking-32k	Anthropic · Proprietary	1388 +6/-6	15,742	$3 / $15	200K
33	25-35	claude-sonnet-4-5-20250929	Anthropic · Proprietary	1386 +6/-6	18,405	$3 / $15	200K
34	24-35	claude-opus-4-1-20250805	Anthropic · Proprietary	1385 +9/-9	8,568	$15 / $75	200K
35	25-36	minimax-m2.5	MiniMax · Modified MIT	1384 +8/-8	7,847	$0.15 / $1.20	196.6K
36	35-38	deepseek-v3.2-thinking	DeepSeek · MIT	1368 +8/-8	7,911	$0.25 / $0.38	131.1K
37	36-39	qwen3.5-122b-a10b	Alibaba · Apache 2.0	1363 +9/-9	5,428	$0.26 / $2.08	262.1K
38	36-40	glm-4.6	Z.ai · MIT	1354 +9/-9	8,348	$0.39 / $1.90	204.8K
39	37-44	qwen3.5-27b	Alibaba · Apache 2.0	1346 +9/-9	4,991	$0.20 / $1.56	262.1K
40	38-45	gpt-5.1	OpenAI · Proprietary	1339 +7/-7	12,872	$1.25 / $10	400K
41	39-45	mimo-v2-flash (non-thinking)	Xiaomi · MIT	1337 +8/-8	6,736	$0.09 / $0.29	262.1K
42	39-45	gpt-5.2-codex	OpenAI · Proprietary	1335 +8/-8	7,765	$1.75 / $14	400K
43	39-45	deepseek-v3.2	DeepSeek · MIT	1331 +7/-7	10,360	$0.25 / $0.38	131.1K
44	40-45	kimi-k2-thinking-turbo	Moonshot · Modified MIT	1330 +6/-6	15,363	$1.15 / $8	262.1K
45	39-46	gpt-5.1-codex	OpenAI · Proprietary	1329 +9/-9	6,228	$1.25 / $10	400K
46	45-48	claude-haiku-4-5-20251001	Anthropic · Proprietary	1315 +6/-6	17,784	$1 / $5	200K
47	46-49	minimax-m2	MiniMax · Apache 2.0	1304 +9/-9	8,401	$0.26 / $1	196.6K
48	46-50	mimo-v2-flash (thinking)	Xiaomi · MIT	1300 +14/-14	2,096	$0.09 / $0.29	262.1K
49	47-50	deepseek-v3.2-exp	DeepSeek · MIT	1286 +11/-11	4,870	$0.27 / $0.41	163.8K
50	48-50	qwen3-coder-480b-a35b-instruct	Alibaba · Apache 2.0	1281 +7/-7	15,214	$0.40 / $1.60	262.1K
51	51-57	KAT-Coder-Pro-V1	KwaiKAT · Proprietary	1258 +15/-15	1,882	$0.21 / $0.83	256K
52	51-57	qwen3.5-35b-a3b	Alibaba · Apache 2.0	1248 +16/-16	1,814	$0.16 / $1.30	262.1K
53	51-58	gpt-5.1-codex-mini	OpenAI · Proprietary	1239 +17/-17	1,444	$0.25 / $2	400K
54	51-58	qwen3.5-flash	Alibaba · Proprietary	1236 +17/-17	1,560	N/A	N/A
55	51-58	gemini-3.1-flash-lite-preview	Google · Proprietary	1235 +9/-9	6,155	$0.25 / $1.50	1M
56	51-58	grok-4-1-fast-reasoning	xAI · Proprietary	1234 +9/-9	6,916	$0.20 / $0.50	2M
57	51-60	mistral-large-3	Mistral · Apache 2.0	1223 +20/-20	1,031	$0.50 / $1.50	N/A
58	53-60	grok-4.1-thinking	xAI · Proprietary	1208 +20/-20	1,209	N/A	N/A
59	57-60	gemini-2.5-pro	Google · Proprietary	1203 +13/-13	3,300	$1.25 / $10	1M
60	57-61	devstral-2	Mistral · Modified MIT	1199 +17/-17	1,582	N/A	N/A
61	60-63	mercury-2	Inception AI · Proprietary	1165 +23/-23	947	$0.25 / $0.75	128K
62	61-63	grok-4-fast-reasoning	xAI · Proprietary	1149 +23/-23	936	$0.20 / $0.50	2M
63	61-63	grok-code-fast-1	xAI · Proprietary	1139 +22/-22	984	$0.20 / $1.50	256K
64	64-64	devstral-medium-2507	Mistral · Proprietary	1091 +23/-23	993	$0.40 / $2	128K
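
The page does not say how the Rank Spread column is derived. One plausible reading, sketched below as an assumption rather than the leaderboard's documented method, is interval overlap: a model's best possible rank is one plus the number of models whose lower bound lies above its upper bound, and its worst possible rank is one plus the number of models whose upper bound lies above its lower bound. The `Entry` type and `rank_spread` function are hypothetical names; the scores and bounds are the top seven rows of the table.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    model: str
    score: float
    ci: float  # half-width of the interval (the table's +/- value)


def rank_spread(entries):
    """Return {model: (best_rank, worst_rank)} using interval overlap.

    Assumption: this overlap rule is illustrative, not confirmed by the page.
    """
    spreads = {}
    for e in entries:
        lo, hi = e.score - e.ci, e.score + e.ci
        # Models that sit entirely above this model's interval.
        surely_better = sum(1 for o in entries if o is not e and o.score - o.ci > hi)
        # Models whose interval reaches above this model's lower bound.
        possibly_better = sum(1 for o in entries if o is not e and o.score + o.ci > lo)
        spreads[e.model] = (1 + surely_better, 1 + possibly_better)
    return spreads


# Top seven rows of the table (score, +/- bound).
TOP7 = [
    Entry("claude-opus-4-7-thinking", 1576, 19),
    Entry("claude-opus-4-7", 1566, 16),
    Entry("claude-opus-4-6-thinking", 1552, 10),
    Entry("claude-opus-4-6", 1545, 9),
    Entry("glm-5.1", 1534, 14),
    Entry("kimi-k2.6", 1529, 18),
    Entry("claude-sonnet-4-6", 1526, 9),
]

spreads = rank_spread(TOP7)
for model, (best, worst) in spreads.items():
    print(f"{model}: {best}-{worst}")
```

Because only seven rows are included, the worst-case rank for the lower rows of this subset is cut short by the missing models; the first four rows do not depend on anything outside the subset.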

Remove Style Control Leaderboard Plots

Fraction of Model A Wins for All Non-tied A vs. B Battles
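
To relate the scores in the table to head-to-head win fractions, a rough sketch is given here. It assumes an Elo-style logistic curve with base 10 and a scale of 400 points; the page does not state the scale it uses, so both the formula and the hypothetical helper `expected_win_probability` are illustrative only.

```python
def expected_win_probability(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Expected fraction of non-tied battles won by A over B.

    Assumption: Elo-style base-10 logistic with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((score_b - score_a) / scale))


# Rank 1 vs. rank 2 from the table (1576 vs. 1566): close to a coin flip.
p_top_two = expected_win_probability(1576, 1566)

# Rank 1 vs. rank 64 (1576 vs. 1091): heavily favours the higher score.
p_top_bottom = expected_win_probability(1576, 1091)

print(f"{p_top_two:.3f} {p_top_bottom:.3f}")
```

Under this assumption a 10-point gap is worth only about one and a half percentage points of win rate, which is why adjacent rows with overlapping bounds are hard to separate.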

Confidence Intervals on Model Strength (via Bootstrapping)
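
The general idea of a bootstrap interval can be sketched as follows. This is a simplified stand-in: it resamples the battles of a single model pair and takes a percentile interval on the win fraction, whereas the leaderboard presumably refits model strengths on each resample. The function name `bootstrap_win_rate_ci` and the 60-wins-in-100 data are hypothetical.

```python
import random


def bootstrap_win_rate_ci(outcomes, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a win fraction.

    outcomes: list of 1 (win) / 0 (loss) for non-tied battles.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(outcomes)
    # Resample the battles with replacement and recompute the win fraction.
    rates = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples))
    lo = rates[int((alpha / 2) * n_resamples)]
    hi = rates[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi


# Hypothetical data: 60 wins and 40 losses in 100 non-tied battles.
battles = [1] * 60 + [0] * 40
low, high = bootstrap_win_rate_ci(battles)
print(f"win rate 0.60, interval [{low:.2f}, {high:.2f}]")
```

The interval narrows as the number of battles grows, which is consistent with the table: models with more votes tend to show smaller +/- bounds.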

Average Win Rate Against All Other Models (Uniform Sampling and No Ties)

Battle Count for Each Combination of Models (without Ties)