Code Arena | WebDev🏆Overall

View overall rankings across AI models on front-end web development tasks, including agentic coding workflows that require multi-step reasoning and tool use.

Jul 13, 2026

469,677 votes

19 labs


Lab Rank	Lab	Model	Score	Rank	Rank Spread
1	OpenAI	gpt-5.6-sol-xhigh (codex-harness) · Proprietary	1631 +17/-17	1	1-2
2	Anthropic	claude-fable-5 · Proprietary	1630 +14/-14	2	1-2
3	Z.ai	glm-5.2 (max)	1581 +10/-10	3	3-4
4	SpaceXAI	grok-4.5 · Proprietary	1557 +15/-15	6	3-12
5	Meta	muse-spark-1.1 · Proprietary	1536 +15/-15	10	4-16
6	Bytedance	seed-2.1-pro-preview · Proprietary	1536 +11/-11	11	7-16
7	Alibaba	qwen3.7-max-20260517 · Proprietary	1518 +9/-9	16	11-17
8	Moonshot	kimi-k2.6	1513 +7/-7	17	14-19
9	Google	gemini-3.5-flash-medium · Proprietary	1497 +9/-9	19	17-22
10	MiniMax	minimax-m3	1495 +9/-9	20	18-23
11	Xiaomi	mimo-v2.5-pro	1475 +7/-7	24	22-27
12	DeepSeek	deepseek-v4-pro-thinking	1458 +7/-7	27	25-32
13	Tencent	hunyuan-hy3-preview	1361 +17/-17	64	54-69
14	Poolside	laguna-m.1	1358 +11/-11	65	58-67
15	Mistral	mistral-medium-3.5	1267 +15/-15	80	78-86
16	KwaiKAT	KAT-Coder-Pro-V1 · Proprietary	1259 +16/-16	81	80-87
17	Arcee AI	trinity-large-thinking	1243 +19/-19	84	80-89
18	IBM	granite-4.1-8b	1200 +17/-17	92	88-93
19	Inception AI	mercury-2 · Proprietary	1165 +23/-23	93	91-95
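
The score cells combine a point estimate with an upper and lower margin (for example 1631+17/-17). As a rough sketch of how to read them, the Python below parses such a cell and checks whether two models' intervals overlap. It assumes the margins are interval bounds around the score; the page does not state how Rank Spread is derived, so this only illustrates why two adjacent models can be hard to separate. The helper names `parse_score` and `intervals_overlap` are made up for this example.

```python
import re


def parse_score(cell: str) -> tuple[int, int, int]:
    """Parse a cell like '1631+17/-17' into (score, lower, upper).

    Assumption: '+x/-y' means the interval [score - y, score + x].
    """
    m = re.fullmatch(r"\s*(\d+)\s*\+(\d+)/-(\d+)\s*", cell)
    if m is None:
        raise ValueError(f"unrecognized score cell: {cell!r}")
    score, plus, minus = (int(g) for g in m.groups())
    return score, score - minus, score + plus


def intervals_overlap(a: str, b: str) -> bool:
    """Return True if the two score intervals share at least one point."""
    _, a_lo, a_hi = parse_score(a)
    _, b_lo, b_hi = parse_score(b)
    return a_lo <= b_hi and b_lo <= a_hi


# Rows 1 and 2 of the table: the intervals overlap.
print(intervals_overlap("1631+17/-17", "1630+14/-14"))
# Rows 2 and 3 of the table: the intervals do not overlap.
print(intervals_overlap("1630+14/-14", "1581+10/-10"))
```

Under this reading, the top two entries cannot be cleanly ordered, while the gap between the second and third entries is wider than either margin.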

Leaderboard Plots

Confidence Intervals on Model Strength (via Bootstrapping)

Average Win Rate Against All Other Models (Uniform Sampling and No Ties)

Fraction of Model A Wins for All Non-tied A vs. B Battles

Battle Count for Each Combination of Models (without Ties)
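
The last three plot titles describe quantities that can be computed directly from pairwise battle records. The sketch below shows one way to do that in Python. Everything in it is illustrative: the record layout `(model_a, model_b, winner)`, the model names `m1` to `m3`, and the function names are assumptions, and "uniform sampling" is interpreted here as an unweighted mean over opponents, which may differ from how the leaderboard computes it.

```python
from collections import defaultdict

# Hypothetical battle records: (model_a, model_b, winner),
# where winner is "a", "b", or "tie". Not real leaderboard data.
battles = [
    ("m1", "m2", "a"),
    ("m1", "m2", "b"),
    ("m1", "m2", "a"),
    ("m2", "m3", "tie"),
    ("m2", "m3", "a"),
    ("m1", "m3", "a"),
    ("m3", "m1", "a"),
    ("m3", "m1", "b"),
]


def pairwise_stats(records):
    """Return (models, battle counts, win fractions), ignoring ties."""
    wins = defaultdict(int)  # wins[(x, y)] = number of times x beat y
    for a, b, winner in records:
        if winner == "tie":
            continue
        if winner == "a":
            wins[(a, b)] += 1
        else:
            wins[(b, a)] += 1

    models = sorted({m for a, b, _ in records for m in (a, b)})
    counts, fractions = {}, {}
    for x in models:
        for y in models:
            if x == y:
                continue
            n = wins.get((x, y), 0) + wins.get((y, x), 0)
            counts[(x, y)] = n  # non-tied battles between x and y
            if n:
                fractions[(x, y)] = wins.get((x, y), 0) / n
    return models, counts, fractions


def average_win_rate(models, fractions):
    """Unweighted mean of each model's win fraction over its opponents."""
    avg = {}
    for x in models:
        rates = [fractions[(x, y)] for y in models
                 if y != x and (x, y) in fractions]
        if rates:
            avg[x] = sum(rates) / len(rates)
    return avg


models, counts, fractions = pairwise_stats(battles)
avg = average_win_rate(models, fractions)
for m in models:
    print(m, round(avg[m], 3))
```

Averaging per-opponent fractions rather than pooling all battles keeps a heavily sampled pairing from dominating a model's average.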
