Code Arena | WebDev 🏆 Overall

View overall rankings across AI models on front-end web development tasks, including agentic coding workflows that require multi-step reasoning and tool use.

May 24, 2026
328,594 votes
17 labs
| Lab Rank | Lab | Model | Score | Rank | Rank Spread |
|---|---|---|---|---|---|
| 1 | Anthropic | claude-opus-4-7-thinking · Proprietary | 1567 +10/-10 | 1 | 1-2 |
| 2 | Alibaba | qwen3.7-max-20260517 · Proprietary | 1541 +16/-16 | 4 | 2-8 |
| 3 | Z.ai | glm-5.1 | 1533 +11/-11 | 6 | 3-9 |
| 4 | Moonshot | kimi-k2.6 | 1518 +10/-10 | 8 | 5-11 |
| 5 | Meta | muse-spark · Proprietary | 1508 +16/-16 | 9 | 6-13 |
| 6 | Google | gemini-3.5-flash · Proprietary | 1506 +13/-13 | 10 | 7-13 |
| 7 | OpenAI | gpt-5.5-xhigh (codex-harness) · Proprietary | 1505 +10/-10 | 11 | 8-13 |
| 8 | Xiaomi | mimo-v2.5-pro | 1471 +9/-9 | 15 | 13-19 |
| 9 | DeepSeek | deepseek-v4-pro-thinking | 1464 +10/-10 | 17 | 13-22 |
| 10 | MiniMax | minimax-m2.7 | 1401 +8/-8 | 34 | 30-43 |
| 11 | xAI | grok-4.20-beta-0309-reasoning · Proprietary | 1395 +8/-8 | 35 | 30-46 |
| 12 | Tencent | hunyuan-hy3-preview | 1365 +17/-17 | 50 | 38-54 |
| 13 | KwaiKAT (Kwai) | KAT-Coder-Pro-V1 · Proprietary | 1259 +15/-15 | 66 | 66-73 |
| 14 | Arcee AI | trinity-large-thinking | 1245 +19/-19 | 69 | 66-74 |
| 15 | Mistral | mistral-large-3 | 1223 +20/-20 | 73 | 66-77 |
| 16 | IBM | granite-4.1-8b | 1202 +18/-18 | 76 | 73-78 |
| 17 | Inception AI | mercury-2 · Proprietary | 1165 +23/-23 | 78 | 76-80 |
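
Each score above is reported as a point estimate with an upper and lower margin, for example 1567 with +10/-10. As a minimal sketch, the snippet below parses such a cell into an interval and checks whether two intervals overlap. The names `ScoreInterval`, `parse_score`, and `intervals_overlap` are hypothetical helpers introduced for illustration. The page does not state how its rank spread is derived, so the overlap check should not be read as the leaderboard's ranking rule.

```python
import re
from typing import NamedTuple


class ScoreInterval(NamedTuple):
    """Point estimate plus the bounds implied by the +x/-y margins."""
    score: int
    lower: int
    upper: int


# Matches cells such as "1567+10/-10" or "1567 +10/-10".
_SCORE_RE = re.compile(r"^\s*(\d+)\s*\+(\d+)\s*/\s*-(\d+)\s*$")


def parse_score(cell: str) -> ScoreInterval:
    """Parse a score cell into (score, lower bound, upper bound)."""
    match = _SCORE_RE.match(cell)
    if match is None:
        raise ValueError(f"unrecognized score cell: {cell!r}")
    score, plus, minus = (int(group) for group in match.groups())
    return ScoreInterval(score, score - minus, score + plus)


def intervals_overlap(a: ScoreInterval, b: ScoreInterval) -> bool:
    """True when the two closed intervals share at least one point."""
    return a.lower <= b.upper and b.lower <= a.upper


# Values copied from the table above.
anthropic = parse_score("1567+10/-10")
alibaba = parse_score("1541+16/-16")
zai = parse_score("1533+11/-11")

print(anthropic)                           # interval for rank 1
print(intervals_overlap(anthropic, alibaba))
print(intervals_overlap(anthropic, zai))
```

Overlapping intervals suggest that the ordering of two models is not settled by the reported margins alone, which is one reason to read the rank spread column alongside the rank.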

Leaderboard Plots

Fraction of Model A Wins for All Non-tied A vs. B Battles
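
A minimal sketch of how such a win fraction could be computed from raw battle records. The record layout `(left_model, right_model, outcome)` with outcomes `"left"`, `"right"`, and `"tie"` is an assumption, as are the model names and the helper `pairwise_win_fractions`. The page does not describe its data format.

```python
from collections import defaultdict


def pairwise_win_fractions(battles):
    """Map (model_a, model_b) to the share of non-tied battles that model_a won."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for left, right, outcome in battles:
        if outcome == "tie":
            continue  # ties are excluded, as the plot title says
        if outcome not in ("left", "right"):
            raise ValueError(f"unknown outcome: {outcome!r}")
        winner, loser = (left, right) if outcome == "left" else (right, left)
        wins[(winner, loser)] += 1
        totals[(winner, loser)] += 1
        totals[(loser, winner)] += 1
    return {pair: wins.get(pair, 0) / count for pair, count in totals.items()}


# Hypothetical battle log, not data from the leaderboard.
sample_battles = [
    ("model-x", "model-y", "left"),   # model-x wins
    ("model-x", "model-y", "right"),  # model-y wins
    ("model-y", "model-x", "right"),  # model-x wins
    ("model-x", "model-y", "tie"),    # ignored
    ("model-x", "model-z", "left"),   # model-x wins
]

fractions = pairwise_win_fractions(sample_battles)
for pair in sorted(fractions):
    print(pair, round(fractions[pair], 3))
```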

Confidence Intervals on Model Strength (via Bootstrapping)
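
To illustrate the bootstrapping idea named in this title, the sketch below resamples a battle log with replacement and takes percentile bounds of a statistic. It deliberately swaps in a plain win rate as the statistic. The page does not say which rating model produces its scores, so this is not the leaderboard's own procedure. The helpers `win_rate` and `bootstrap_interval` and all data are hypothetical.

```python
import random


def win_rate(results, model):
    """Share of battles involving `model` that it won; None if it never appears."""
    involved = [(winner, loser) for winner, loser in results if model in (winner, loser)]
    if not involved:
        return None
    return sum(1 for winner, _ in involved if winner == model) / len(involved)


def bootstrap_interval(results, model, rounds=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the win rate of `model`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = []
    for _ in range(rounds):
        # Resample the whole battle log with replacement.
        resample = rng.choices(results, k=len(results))
        rate = win_rate(resample, model)
        if rate is not None:
            estimates.append(rate)
    if not estimates:
        raise ValueError("model never appeared in any resample")
    estimates.sort()
    lower_index = int((alpha / 2) * len(estimates))
    upper_index = min(len(estimates) - 1, int((1 - alpha / 2) * len(estimates)))
    return estimates[lower_index], estimates[upper_index]


# Hypothetical non-tied results as (winner, loser) pairs.
sample_results = [("model-x", "model-y")] * 60 + [("model-y", "model-x")] * 40

point_estimate = win_rate(sample_results, "model-x")
lower, upper = bootstrap_interval(sample_results, "model-x")
print(point_estimate, lower, upper)
```

The same resampling loop applies to any statistic computed from the battle log: replace `win_rate` with a rating fit to obtain intervals on a rating scale.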

Battle Count for Each Combination of Models (without Ties)
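
A minimal sketch of how a battle count matrix of this kind could be tabulated, assuming records of the form `(left_model, right_model, outcome)` where the outcome is `"left"`, `"right"`, or `"tie"`. The record layout, the model names, and the helper `battle_count_matrix` are illustrative assumptions.

```python
from collections import Counter


def battle_count_matrix(battles):
    """Return (models, matrix); matrix[i][j] counts non-tied battles between models i and j."""
    pair_counts = Counter()
    models = set()
    for left, right, outcome in battles:
        models.update((left, right))
        if outcome == "tie":
            continue  # ties are left out of the count
        # Sort the pair so that display order (left or right) does not matter.
        pair_counts[tuple(sorted((left, right)))] += 1
    ordered = sorted(models)
    matrix = [
        [0 if a == b else pair_counts[tuple(sorted((a, b)))] for b in ordered]
        for a in ordered
    ]
    return ordered, matrix


# Hypothetical battle log, not data from the leaderboard.
sample_battles = [
    ("model-x", "model-y", "left"),
    ("model-y", "model-x", "left"),
    ("model-x", "model-y", "tie"),
    ("model-x", "model-z", "right"),
]

models, matrix = battle_count_matrix(sample_battles)
print(models)
for row in matrix:
    print(row)
```

The matrix is symmetric because a battle between two models counts once for the pair, whichever side each model appeared on.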

Average Win Rate Against All Other Models (Uniform Sampling and No Ties)
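
One plausible reading of "uniform sampling" is that every opponent carries equal weight, however many battles were actually played against it. The sketch below follows that reading using empirical win fractions. The page does not say whether it averages empirical or model-predicted win rates, so treat this as an assumption. The helper `average_win_rates` and the data are hypothetical.

```python
from collections import defaultdict


def average_win_rates(results):
    """Average each model's per-opponent win fraction, one equal weight per opponent."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for winner, loser in results:  # ties are assumed to be removed already
        wins[(winner, loser)] += 1
        totals[(winner, loser)] += 1
        totals[(loser, winner)] += 1
    per_model = defaultdict(list)
    for (model, opponent), count in totals.items():
        per_model[model].append(wins.get((model, opponent), 0) / count)
    return {model: sum(fracs) / len(fracs) for model, fracs in per_model.items()}


# Hypothetical results as (winner, loser) pairs with uneven battle counts.
sample_results = (
    [("model-x", "model-y")] * 8
    + [("model-y", "model-x")] * 2
    + [("model-x", "model-z")]
    + [("model-z", "model-x")]
)

rates = average_win_rates(sample_results)
pooled_x = sum(1 for winner, _ in sample_results if winner == "model-x") / len(sample_results)
print(round(rates["model-x"], 3), round(pooled_x, 3))
```

In this example model-x wins 0.8 of its battles against model-y and 0.5 against model-z, so the equally weighted average is 0.65, while pooling all twelve battles gives 0.75. The gap shows why the weighting choice matters when battle counts are uneven.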