Code Arena | WebDev 🏆 Overall

View overall rankings across AI models on front-end web development tasks, including agentic coding workflows that require multi-step reasoning and tool use.

Jun 10, 2026 · 365,577 votes · 17 labs
| Lab Rank | Lab | Model | License | Score | Overall Rank | Rank Spread |
|---|---|---|---|---|---|---|
| 1 | Anthropic | claude-fable-5 | Proprietary | 1665 ±20 | 1 | 1–1 |
| 2 | Alibaba | qwen3.7-max-20260517 | Proprietary | 1534 ±12 | 8 | 4–13 |
| 3 | Z.ai | glm-5.1 | | 1532 ±11 | 9 | 5–13 |
| 4 | MiniMax | minimax-m3 | Proprietary | 1521 ±14 | 11 | 5–15 |
| 5 | Moonshot | kimi-k2.6 | | 1515 ±9 | 12 | 8–15 |
| 6 | Meta | muse-spark | Proprietary | 1508 ±16 | 13 | 8–17 |
| 7 | Google | gemini-3.5-flash | Proprietary | 1506 ±13 | 14 | 10–17 |
| 8 | OpenAI | gpt-5.5-xhigh (codex-harness) | Proprietary | 1501 ±9 | 15 | 11–17 |
| 9 | Xiaomi | mimo-v2.5-pro | | 1468 ±9 | 19 | 17–23 |
| 10 | DeepSeek | deepseek-v4-pro-thinking | | 1462 ±9 | 22 | 19–25 |
| 11 | xAI | grok-4.20-beta-0309-reasoning | Proprietary | 1390 ±7 | 44 | 35–51 |
| 12 | Tencent | hunyuan-hy3-preview | | 1364 ±17 | 55 | 46–59 |
| 13 | Mistral | mistral-medium-3.5 | | 1268 ±16 | 70 | 68–76 |
| 14 | Kwai | KAT-Coder-Pro-V1 | Proprietary | 1259 ±16 | 71 | 70–77 |
| 15 | Arcee AI | trinity-large-thinking | | 1245 ±19 | 74 | 70–79 |
| 16 | IBM | granite-4.1-8b | | 1201 ±18 | 81 | 78–83 |
| 17 | Inception AI | mercury-2 | Proprietary | 1165 ±23 | 83 | 81–85 |
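The Rank Spread column gives a range of plausible ranks rather than a single position. The page does not document its exact rule, so the sketch below assumes one common interval-overlap convention: a model's best rank counts only rivals whose whole interval sits above its own, and its worst rank counts every rival whose interval reaches its lower bound. `Entry` and `rank_spread` are hypothetical names. Applied to just five rows it will generally not reproduce the page's values, because the page ranks each model against its full model pool, not only the top model per lab.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    score: float
    ci: float  # half-width of a symmetric interval, e.g. 20 for "1665 ±20"


def rank_spread(entries: list[Entry]) -> dict[str, tuple[int, int]]:
    """Return (best, worst) plausible rank per model under an assumed overlap rule."""
    spreads: dict[str, tuple[int, int]] = {}
    for i, e in enumerate(entries):
        lower, upper = e.score - e.ci, e.score + e.ci
        others = [o for j, o in enumerate(entries) if j != i]
        # Rivals whose lower bound clears our upper bound are ahead in every scenario.
        clearly_ahead = sum(1 for o in others if o.score - o.ci > upper)
        # Rivals whose upper bound reaches our lower bound could be ahead in some scenario.
        possibly_ahead = sum(1 for o in others if o.score + o.ci >= lower)
        spreads[e.name] = (1 + clearly_ahead, 1 + possibly_ahead)
    return spreads


# First five rows of the table above (top model per lab only).
entries = [
    Entry("claude-fable-5", 1665, 20),
    Entry("qwen3.7-max-20260517", 1534, 12),
    Entry("glm-5.1", 1532, 11),
    Entry("minimax-m3", 1521, 14),
    Entry("kimi-k2.6", 1515, 9),
]
spreads = rank_spread(entries)
for name, (best, worst) in spreads.items():
    print(f"{name}: {best}-{worst}")
```

With these five rows, claude-fable-5 gets a spread of 1 to 1, while the other four overlap each other and share a spread of 2 to 5.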

Leaderboard Plots

Confidence Intervals on Model Strength (via Bootstrapping)
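Bootstrapped intervals like the ± values in the table come from refitting the ratings on many resamples of the battle log and reading off percentiles. The page does not publish its fitting code, so the sketch below assumes a Bradley-Terry model fitted with the classic MM (Zermelo) iteration and mapped onto an Elo-style scale (400 points per factor of 10 in odds, offset 1000). The anchor pseudo-opponent used as a regularizer, the function names, and the three-model battle log are all hypothetical, chosen only to keep the example self-contained.

```python
import math
import random
from collections import defaultdict

Battle = tuple[str, str, str]  # (model_a, model_b, winner); ties already removed


def fit_bradley_terry(battles: list[Battle], iters: int = 200) -> dict[str, float]:
    """Fit Bradley-Terry strengths with the MM (Zermelo) iteration; return Elo-scale scores."""
    wins: dict[str, int] = defaultdict(int)
    games: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for a, b, winner in battles:
        games[a][b] += 1
        games[b][a] += 1
        wins[winner] += 1
    # Assumed regularizer: every model gets one virtual win and one virtual loss
    # against a fixed anchor of strength 1, so no strength collapses to zero.
    strength = {m: 1.0 for m in games}
    for _ in range(iters):
        # The comprehension reads the previous `strength` throughout (simultaneous update).
        strength = {
            i: (wins[i] + 1)
            / (sum(n / (strength[i] + strength[j]) for j, n in games[i].items())
               + 2 / (strength[i] + 1.0))
            for i in strength
        }
    return {m: 1000 + 400 * math.log10(s) for m, s in strength.items()}


def bootstrap_intervals(battles: list[Battle], rounds: int = 100,
                        alpha: float = 0.05, seed: int = 0) -> dict[str, tuple[float, float]]:
    """Percentile bootstrap interval for each model's Elo-scale score."""
    rng = random.Random(seed)
    samples: dict[str, list[float]] = defaultdict(list)
    for _ in range(rounds):
        resample = rng.choices(battles, k=len(battles))  # resample battles with replacement
        for model, score in fit_bradley_terry(resample).items():
            samples[model].append(score)
    intervals = {}
    for model, scores in samples.items():
        scores.sort()
        lo = scores[int(alpha / 2 * (len(scores) - 1))]
        hi = scores[int((1 - alpha / 2) * (len(scores) - 1))]
        intervals[model] = (lo, hi)
    return intervals


# Hypothetical battle log: x beats y 60-40, x beats z 65-35, y beats z 55-45.
battles = (
    [("model-x", "model-y", "model-x")] * 60 + [("model-x", "model-y", "model-y")] * 40
    + [("model-y", "model-z", "model-y")] * 55 + [("model-y", "model-z", "model-z")] * 45
    + [("model-x", "model-z", "model-x")] * 65 + [("model-x", "model-z", "model-z")] * 35
)
point = fit_bradley_terry(battles)
intervals = bootstrap_intervals(battles, rounds=100)
for model in sorted(point, key=point.get, reverse=True):
    lo, hi = intervals[model]
    print(f"{model}: {point[model]:.0f} (95% interval {lo:.0f} to {hi:.0f})")
```

More battles per model shrink the spread of the resampled fits, which is why heavily voted models tend to show narrower ± values.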

Battle Count for Each Combination of Models (without Ties)
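The battle-count matrix tallies how many decisive (non-tied) votes each pair of models received. Assuming a hypothetical log of `(model_a, model_b, winner)` records, where `winner` is a model name or the string `"tie"`, the tally can be sketched as:

```python
from collections import Counter, defaultdict


def battle_count_matrix(battles: list[tuple[str, str, str]]) -> dict[str, Counter]:
    """Symmetric count of non-tied battles for every pair of models."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for model_a, model_b, winner in battles:
        if winner == "tie":  # ties are excluded, matching the plot title
            continue
        counts[model_a][model_b] += 1
        counts[model_b][model_a] += 1  # mirror so counts[a][b] == counts[b][a]
    return counts


battles = [
    ("m1", "m2", "m1"),
    ("m2", "m1", "tie"),
    ("m1", "m3", "m3"),
    ("m2", "m1", "m2"),
]
counts = battle_count_matrix(battles)
print(counts["m1"]["m2"], counts["m1"]["m3"], counts["m2"]["m3"])  # prints: 2 1 0
```

Pairs that never met simply read as 0, because `Counter` returns 0 for missing keys.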

Average Win Rate Against All Other Models (Uniform Sampling and No Ties)
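Under uniform sampling, a model's average win rate is the mean of its predicted win probability against each other model, with every opponent weighted equally. The sketch below assumes the scores sit on the conventional Elo scale, where a gap of d points gives the higher-rated model a win probability of 1 / (1 + 10^(-d/400)); the page does not state its scale, so treat that mapping as an assumption. The function names are hypothetical.

```python
def expected_win_prob(r_a: float, r_b: float) -> float:
    """P(A beats B) on an assumed Elo scale: 400 points per factor of 10 in odds."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def average_win_rate(ratings: dict[str, float]) -> dict[str, float]:
    """Mean predicted win rate of each model against every other model, uniformly weighted."""
    result = {}
    for a, r_a in ratings.items():
        probs = [expected_win_prob(r_a, r_b) for b, r_b in ratings.items() if b != a]
        result[a] = sum(probs) / len(probs)
    return result


# Scores from the first five rows of the table above.
ratings = {
    "claude-fable-5": 1665,
    "qwen3.7-max-20260517": 1534,
    "glm-5.1": 1532,
    "minimax-m3": 1521,
    "kimi-k2.6": 1515,
}
rates = average_win_rate(ratings)
for model, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {rate:.3f}")
```

Because P(A beats B) + P(B beats A) = 1, the averages over n models always sum to n/2, which makes a quick sanity check on any such plot.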

Fraction of Model A Wins for All Non-tied A vs. B Battles
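The win-fraction matrix reports, for each pair, the share of decisive battles the row model won. The sketch below reads "Model A" as the row model regardless of which side of the battle it was shown on; that reading, the `(model_a, model_b, winner)` record format with `"tie"` for ties, and the function name are assumptions rather than the page's documented method.

```python
from collections import Counter, defaultdict


def pairwise_win_fraction(battles: list[tuple[str, str, str]]) -> dict[str, dict[str, float]]:
    """frac[a][b] = share of non-tied a-vs-b battles that a won, on either side."""
    wins: dict[str, Counter] = defaultdict(Counter)
    totals: dict[str, Counter] = defaultdict(Counter)
    for model_a, model_b, winner in battles:
        if winner not in (model_a, model_b):  # skip ties (and malformed rows)
            continue
        loser = model_b if winner == model_a else model_a
        wins[winner][loser] += 1
        totals[model_a][model_b] += 1
        totals[model_b][model_a] += 1
    return {
        a: {b: wins[a][b] / n for b, n in opponents.items()}
        for a, opponents in totals.items()
    }


battles = [
    ("m1", "m2", "m1"),
    ("m1", "m2", "m1"),
    ("m2", "m1", "m2"),
    ("m1", "m2", "tie"),
]
frac = pairwise_win_fraction(battles)
print(round(frac["m1"]["m2"], 3), round(frac["m2"]["m1"], 3))  # prints: 0.667 0.333
```

Since ties are dropped, frac[a][b] and frac[b][a] always sum to 1 for any pair that met at least once.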