Code Arena | WebDev 🏆 Overall

View overall rankings of AI models on front-end web development tasks, including agentic coding workflows that require multi-step reasoning and tool use.

Jul 26, 2026 · 485,660 votes · 19 labs

Lab Rank	Lab	Model	License	Score	Model Rank	Rank Spread
1	Moonshot	kimi-k3	Proprietary	1682 ±13	1	1–2
2	Anthropic	claude-opus-5-high	Proprietary	1673 ±14	2	1–2
3	OpenAI	gpt-5.6-sol-xhigh (codex-harness)	Proprietary	1625 ±10	4	3–4
4	Z.ai	glm-5.2 (max)	MIT	1587 ±9	5	5–5
5	SpaceXAI	grok-4.5	Proprietary	1549 ±11	9	6–15
6	Meta	muse-spark-1.1	Proprietary	1536 ±11	14	9–20
7	Bytedance	seed-2.1-pro-preview	Proprietary	1529 ±10	15	10–20
8	Google	gemini-3.6-flash	Proprietary	1524 ±13	16	11–22
9	Tencent	hy3	Apache 2.0	1518 ±17	19	11–25
10	Alibaba	qwen3.7-max-20260517	Proprietary	1517 ±8	20	14–22
11	MiniMax	minimax-m3	MiniMax Community License	1493 ±8	24	21–29
12	Xiaomi	mimo-v2.5-pro	MIT	1474 ±7	29	25–34
13	DeepSeek	deepseek-v4-pro-thinking	MIT	1464 ±7	32	28–37
14	Thinky	inkling	Apache 2.0	1419 ±10	49	41–53
15	Poolside	laguna-m.1	Apache 2.0	1349 ±10	73	65–78
16	Mistral	mistral-medium-3.5	Modified MIT	1267 ±16	87	84–94
17		KAT-Coder-Pro-V1	Proprietary	1255 ±20	89	84–97
18	IBM	granite-4.1-8b	Apache 2.0	1194 ±19	98	95–102
19	Inception AI	mercury-2	Proprietary	1166 ±26	100	97–102
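If the scores are on an Elo-style scale (an assumption; the page does not state the scale), a score gap can be read as an expected head-to-head win rate. The minimal sketch below assumes a logistic curve with base 10 and a 400-point scale. It also shows one hypothetical rank-spread rule: a model's best rank is 1 plus the number of models whose interval lies entirely above its own, and its worst rank excludes only the models whose interval lies entirely below it. The sketch uses only the 19 lab-leading rows in the table above, whereas the table's Rank Spread ranks every model in the arena, so the two will not generally match.

```python
# Lab-leading models from the table: (model, score, symmetric CI half-width).
ROWS = [
    ("kimi-k3", 1682, 13),
    ("claude-opus-5-high", 1673, 14),
    ("gpt-5.6-sol-xhigh (codex-harness)", 1625, 10),
    ("glm-5.2 (max)", 1587, 9),
    ("grok-4.5", 1549, 11),
    ("muse-spark-1.1", 1536, 11),
    ("seed-2.1-pro-preview", 1529, 10),
    ("gemini-3.6-flash", 1524, 13),
    ("hy3", 1518, 17),
    ("qwen3.7-max-20260517", 1517, 8),
    ("minimax-m3", 1493, 8),
    ("mimo-v2.5-pro", 1474, 7),
    ("deepseek-v4-pro-thinking", 1464, 7),
    ("inkling", 1419, 10),
    ("laguna-m.1", 1349, 10),
    ("mistral-medium-3.5", 1267, 16),
    ("KAT-Coder-Pro-V1", 1255, 20),
    ("granite-4.1-8b", 1194, 19),
    ("mercury-2", 1166, 26),
]


def win_probability(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Expected share of non-tied battles A wins against B.

    Assumes an Elo-style logistic curve with base 10 and a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / scale))


def rank_spread(rows):
    """Hypothetical rule: map each model to (best_rank, worst_rank) from interval overlap."""
    bounds = {name: (score - ci, score + ci) for name, score, ci in rows}
    spreads = {}
    for name, (lo, hi) in bounds.items():
        # Clearly better: the other model's lower bound is above our upper bound.
        better = sum(1 for o, (o_lo, _) in bounds.items() if o != name and o_lo > hi)
        # Clearly worse: the other model's upper bound is below our lower bound.
        worse = sum(1 for o, (_, o_hi) in bounds.items() if o != name and o_hi < lo)
        spreads[name] = (1 + better, len(bounds) - worse)
    return spreads


print(f"{win_probability(1682, 1673):.3f}")  # kimi-k3 vs claude-opus-5-high -> 0.513
spreads = rank_spread(ROWS)
print(spreads["kimi-k3"], spreads["gpt-5.6-sol-xhigh (codex-harness)"])  # (1, 2) (3, 3)
```

Under this rule two models whose intervals overlap can share a best rank, which is why kimi-k3 and claude-opus-5-high both come out as 1 to 2 within this subset.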

Plots: Confidence Intervals on Model Strength (via Bootstrapping); Fraction of Model A Wins for All Non-tied A vs. B Battles; Battle Count for Each Combination of Models (without Ties); Average Win Rate Against All Other Models (Uniform Sampling and No Ties).
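The plot titles above name the usual pairwise-battle statistics, but the page does not say how they are computed. The sketch below is one plausible reading under stated assumptions: "model strength" is taken to be a Bradley-Terry strength fitted with Hunter's minorization-maximization (MM) iterations, the confidence intervals come from a percentile bootstrap over battles, and the average win rate weights every opponent equally and ignores ties. The battle counts, the model names A, B and C, the 1000-point anchor and the 0.5 pseudo-count are all hypothetical.

```python
import math
import random

# Hypothetical non-tied battle counts among three hypothetical models A, B, C:
# (winner, loser) -> number of battles. Not real arena data.
HYPOTHETICAL_COUNTS = {
    ("A", "B"): 60, ("B", "A"): 40,
    ("A", "C"): 75, ("C", "A"): 25,
    ("B", "C"): 65, ("C", "B"): 35,
}
BATTLES = [pair for pair, n in HYPOTHETICAL_COUNTS.items() for _ in range(n)]
MODELS = ["A", "B", "C"]


def fit_bradley_terry(battles, models, iters=200, prior=0.5):
    """Fit Bradley-Terry strengths with MM updates.

    `prior` (hypothetical) adds a pseudo-win in each direction for every pair,
    so a model with no wins in a resample still gets a finite score. Returns
    Elo-style scores (base 10, 400-point scale) centred on a mean of 1000.
    """
    wins = {(i, j): (prior if i != j else 0.0) for i in models for j in models}
    for winner, loser in battles:
        wins[(winner, loser)] += 1
    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        updated = {}
        for i in models:
            total_wins = sum(wins[(i, j)] for j in models)
            denom = sum(
                (wins[(i, j)] + wins[(j, i)]) / (strength[i] + strength[j])
                for j in models
                if j != i
            )
            updated[i] = total_wins / denom
        # Rescale so the geometric mean strength is 1 (mean score 1000).
        log_mean = sum(math.log(v) for v in updated.values()) / len(updated)
        strength = {m: v / math.exp(log_mean) for m, v in updated.items()}
    return {m: 1000 + 400 * math.log10(s) for m, s in strength.items()}


def bootstrap_intervals(battles, models, rounds=200, seed=0):
    """95% percentile intervals on each score from battles resampled with replacement."""
    rng = random.Random(seed)
    samples = {m: [] for m in models}
    for _ in range(rounds):
        resample = rng.choices(battles, k=len(battles))
        for m, score in fit_bradley_terry(resample, models).items():
            samples[m].append(score)
    intervals = {}
    for m, vals in samples.items():
        vals.sort()
        intervals[m] = (vals[int(0.025 * (rounds - 1))], vals[int(0.975 * (rounds - 1))])
    return intervals


def average_win_rate(scores):
    """Mean predicted win rate against every other model (uniform opponents, no ties)."""
    rates = {}
    for i, s_i in scores.items():
        probs = [1 / (1 + 10 ** ((s_j - s_i) / 400)) for j, s_j in scores.items() if j != i]
        rates[i] = sum(probs) / len(probs)
    return rates


scores = fit_bradley_terry(BATTLES, MODELS)
intervals = bootstrap_intervals(BATTLES, MODELS)
rates = average_win_rate(scores)
for m in MODELS:
    lo, hi = intervals[m]
    print(f"{m}: {scores[m]:.0f} [{lo:.0f}, {hi:.0f}], avg win rate {rates[m]:.3f}")
```

Because ties are excluded and each opponent is weighted equally, the average win rates of N models always sum to N/2 (1.5 for the three models here), which makes a quick sanity check on any implementation.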
