Code Arena | WebDev

Compare the performance of AI models on agentic coding tasks involving multi-step reasoning and tool use

Feb 12, 2026
151,146 votes
41 models
Rank  Rank Spread  Organization · License     Score          Votes
1     1-2          Anthropic · Proprietary    1567 +17/-17   1,625
2     1-2          Anthropic · Proprietary    1560 +15/-15   2,113
3     3-3          Anthropic · Proprietary    1503 +8/-8     9,892
4     4-7          OpenAI · Proprietary       1473 +16/-16   1,691
5     4-6          Anthropic · Proprietary    1469 +8/-8     10,054
6     4-11         Z.ai · MIT                 1449 +16/-16   1,643
7     6-10         Google · Proprietary       1449 +8/-8     16,009
8     5-11         Moonshot · Modified MIT    1447 +12/-12   2,916
9     6-11         Google · Proprietary       1444 +8/-8     11,623
10    6-11         Z.ai · MIT                 1442 +10/-10   5,130
11    7-14         Moonshot · Modified MIT    1423 +15/-15   1,880
12    11-17        MiniMax · MIT              1407 +8/-8     8,867
13    11-18        Google · Proprietary       1404 +9/-9     7,690
14    11-20        OpenAI · Proprietary       1398 +16/-16   1,633
15    12-20        OpenAI · Proprietary       1395 +12/-12   3,926
16    12-20        Anthropic · Proprietary    1391 +8/-8     8,979
17    12-20        OpenAI · Proprietary       1390 +9/-9     6,437
18    13-20        Anthropic                  1390 +7/-7     13,158
19    14-20        Anthropic · Proprietary    1386 +7/-7     14,778
20    14-21        DeepSeek · MIT             1375 +10/-10   5,123
21    20-23        Z.ai · MIT                 1358 +8/-8     8,744
22    21-25        OpenAI · Proprietary       1348 +7/-7     12,075
23    21-26        -                          1343 +9/-9     5,960
24    22-26        OpenAI · Proprietary       1338 +10/-10   4,693
25    22-26        Moonshot · Modified MIT    1336 +7/-7     11,535
26    23-28        OpenAI · Proprietary       1331 +9/-9     6,502
27    26-29        MiniMax · Apache 2.0       1314 +9/-9     8,832
28    26-29        DeepSeek · MIT             1314 +9/-9     6,408
29    27-29        Anthropic · Proprietary    1307 +7/-7     12,865
30    30-31        DeepSeek · MIT             1289 +10/-10   5,130
31    30-31        Alibaba · Apache 2.0       1284 +7/-7     12,607
32    32-34        KwaiKAT · Proprietary      1261 +15/-15   1,954
33    32-35        OpenAI · Proprietary       1245 +17/-17   1,537
34    32-35        xAI · Proprietary          1237 +9/-9     7,167
35    33-38        Mistral · Apache 2.0       1225 +20/-20   1,037
36    35-38        Google · Proprietary       1208 +13/-13   3,453
37    35-38        xAI · Proprietary          1206 +19/-19   1,266
38    35-38        Mistral · Modified MIT     1201 +16/-16   1,681
39    39-40        xAI · Proprietary          1155 +22/-22   968
40    39-41        xAI · Proprietary          1143 +21/-21   1,016
41    40-41        Mistral · Proprietary      1101 +22/-22   1,021

Leaderboard Plots

Confidence Intervals on Model Strength (via Bootstrapping)

Battle Count for Each Combination of Models (without Ties)

Fraction of Model A Wins for All Non-tied A vs. B Battles

Average Win Rate Against All Other Models (Uniform Sampling and No Ties)
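
The last three plot titles name quantities that can be derived directly from raw battle records. The following is a minimal sketch of one way they might be computed, not the leaderboard's actual pipeline. The record format `(model_a, model_b, winner)`, the model names `m1`, `m2`, `m3`, and all outcomes are hypothetical. It also assumes that "uniform sampling" means every opponent is weighted equally, and that a battle counts for a pair regardless of which side each model appeared on.

```python
from collections import defaultdict

# Hypothetical battle records: (model_a, model_b, winner), where winner is
# "a", "b", or "tie". Names and outcomes are made up for illustration.
battles = [
    ("m1", "m2", "a"),
    ("m1", "m2", "b"),
    ("m1", "m2", "a"),
    ("m1", "m3", "a"),
    ("m2", "m3", "tie"),
    ("m3", "m2", "a"),
    ("m3", "m1", "b"),
]


def pairwise_stats(records):
    """Count non-tied battles and wins for every ordered pair of models."""
    counts = defaultdict(int)  # counts[(x, y)]: non-tied battles between x and y
    wins = defaultdict(int)    # wins[(x, y)]: battles in which x beat y
    for a, b, winner in records:
        if winner == "tie":
            continue  # ties are excluded, as in the plot titles
        counts[(a, b)] += 1
        counts[(b, a)] += 1
        if winner == "a":
            wins[(a, b)] += 1
        else:
            wins[(b, a)] += 1
    return counts, wins


def win_fraction(counts, wins, x, y):
    """Fraction of non-tied x-vs-y battles won by x, or None if they never met."""
    n = counts.get((x, y), 0)
    return wins.get((x, y), 0) / n if n else None


def average_win_rate(counts, wins, model, models):
    """Mean win fraction over all opponents, each opponent weighted equally."""
    fracs = [win_fraction(counts, wins, model, other)
             for other in models if other != model]
    fracs = [f for f in fracs if f is not None]
    return sum(fracs) / len(fracs) if fracs else None


models = ["m1", "m2", "m3"]
counts, wins = pairwise_stats(battles)
for m in models:
    print(m, average_win_rate(counts, wins, m, models))
```

Averaging per-opponent fractions, rather than pooling all battles, keeps a model's figure from being dominated by whichever opponent it happened to face most often.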