Code Arena 🏆 WebDev

View overall rankings across AI models on agentic coding tasks involving multi-step reasoning and tool use.

Apr 23, 2026
251,161 votes
65 models
Rank | Rank Spread | Organization · License | Score | Votes | Price | Context
1 | 1-3 | Anthropic · Proprietary | 1572 +18/-18 | 1,393 | $5 / $25 | 1M
2 | 1-4 | Anthropic · Proprietary | 1566 +16/-16 | 1,782 | $5 / $25 | 1M
3 | 1-6 | Anthropic · Proprietary | 1552 +10/-10 | 4,922 | $5 / $25 | 1M
4 | 2-6 | Anthropic · Proprietary | 1545 +9/-9 | 5,821 | $5 / $25 | 1M
5 | 3-8 | Z.ai · MIT | 1534 +13/-13 | 2,478 | $1.05 / $3.50 | 202.8K
6 | 3-8 | Moonshot · Modified MIT | 1529 +17/-17 | 1,408 | $0.74 / $4.66 | 256K
7 | 5-8 | Anthropic · Proprietary | 1527 +9/-9 | 7,751 | $3 / $15 | 1M
8 | 5-9 | Meta · Proprietary | 1512 +16/-16 | 1,611 | N/A | N/A
9 | 8-9 | Anthropic | 1491 +7/-7 | 13,065 | $5 / $25 | 200K
10 | 10-14 | Alibaba · Proprietary | 1471 +12/-12 | 2,854 | $0.33 / $1.95 | 1M
11 | 10-14 | Anthropic · Proprietary | 1468 +6/-6 | 15,283 | $5 / $25 | 200K
12 | 10-16 | Google · Proprietary | 1457 +8/-8 | 6,813 | $2 / $12 | 1M
13 | 10-19 | OpenAI · Proprietary | 1457 +17/-17 | 1,482 | $2.50 / $15 | 1.1M
14 | 10-20 | | 1456 +19/-19 | 1,066 | $0.43 / $0.87 | 1M
15 | 12-21 | Z.ai · MIT | 1440 +10/-10 | 4,880 | $0.38 / $1.74 | 202.8K
16 | 13-21 | Google · Proprietary | 1438 +7/-7 | 17,169 | $2 / $12 | 1M
17 | 12-23 | OpenAI · Proprietary | 1437 +16/-16 | 1,448 | $2.50 / $15 | 1.1M
18 | 13-21 | Google · Proprietary | 1437 +7/-7 | 13,276 | $0.50 / $3 | 1M
19 | 13-22 | Z.ai · MIT | 1435 +9/-9 | 5,640 | $1 / $3.20 | 202.8K
20 | 14-22 | Moonshot · Modified MIT | 1430 +8/-8 | 7,913 | $0.60 / $3 | N/A
21 | 15-25 | Xiaomi · Proprietary | 1426 +10/-10 | 4,021 | $1 / $3 | 1M
22 | 18-28 | MiniMax · Modified MIT | 1416 +10/-10 | 3,785 | $0.30 / $1.20 | 196.6K
23 | 21-30 | Moonshot · Modified MIT | 1408 +11/-11 | 3,611 | $0.44 / $2 | 262.1K
24 | 21-33 | OpenAI · Proprietary | 1407 +12/-12 | 2,965 | $1.75 / $14 | 400K
25 | 20-36 | OpenAI · Proprietary | 1404 +17/-17 | 1,459 | $1.75 / $14 | 400K
26 | 22-34 | | 1403 +10/-10 | 4,209 | $2 / $6 | 2M
27 | 22-36 | OpenAI · Proprietary | 1399 +13/-13 | 2,541 | $2.50 / $15 | 1.1M
28 | 22-36 | OpenAI · Proprietary | 1393 +13/-13 | 3,755 | $1.25 / $10 | 400K
29 | 23-36 | MiniMax · MIT | 1392 +8/-8 | 9,274 | $0.29 / $0.95 | 196.6K
30 | 23-36 | OpenAI · Proprietary | 1391 +9/-9 | 6,122 | $1.25 / $10 | 400K
31 | 24-36 | | 1389 +6/-6 | 13,385 | $0.50 / $3 | 1M
32 | 24-36 | Alibaba · Apache 2.0 | 1388 +8/-8 | 6,715 | $0.39 / $2.34 | 262.1K
33 | 24-36 | Anthropic | 1388 +6/-6 | 15,740 | $3 / $15 | 200K
34 | 26-36 | Anthropic · Proprietary | 1386 +6/-6 | 18,382 | $3 / $15 | 200K
35 | 25-36 | Anthropic · Proprietary | 1385 +9/-9 | 8,568 | $15 / $75 | 200K
36 | 26-37 | MiniMax · Modified MIT | 1384 +8/-8 | 7,834 | $0.15 / $1.15 | 196.6K
37 | 36-39 | DeepSeek · MIT | 1368 +8/-8 | 7,904 | $0.25 / $0.38 | 131.1K
38 | 37-40 | Alibaba · Apache 2.0 | 1363 +9/-9 | 5,506 | $0.26 / $2.08 | 262.1K
39 | 37-41 | Z.ai · MIT | 1355 +9/-9 | 8,349 | $0.39 / $1.90 | 204.8K
40 | 38-46 | Alibaba · Apache 2.0 | 1345 +9/-9 | 5,083 | $0.20 / $1.56 | 262.1K
41 | 39-46 | OpenAI · Proprietary | 1339 +7/-7 | 12,872 | $1.25 / $10 | 400K
42 | 40-46 | | 1337 +8/-8 | 6,736 | $0.09 / $0.29 | 262.1K
43 | 40-46 | OpenAI · Proprietary | 1335 +8/-8 | 7,763 | $1.75 / $14 | 400K
44 | 40-46 | DeepSeek · MIT | 1332 +7/-7 | 10,427 | $0.25 / $0.38 | 131.1K
45 | 40-46 | Moonshot · Modified MIT | 1329 +6/-6 | 15,346 | $1.15 / $8 | 262.1K
46 | 40-47 | OpenAI · Proprietary | 1329 +9/-9 | 6,228 | $1.25 / $10 | 400K
47 | 46-49 | Anthropic · Proprietary | 1316 +6/-6 | 17,855 | $1 / $5 | 200K
48 | 47-50 | MiniMax · Apache 2.0 | 1304 +9/-9 | 8,401 | $0.26 / $1 | 196.6K
49 | 47-51 | | 1301 +14/-14 | 2,096 | $0.09 / $0.29 | 262.1K
50 | 48-51 | DeepSeek · MIT | 1286 +11/-11 | 4,870 | $0.27 / $0.41 | 163.8K
51 | 49-51 | Alibaba · Apache 2.0 | 1281 +7/-7 | 15,208 | $0.40 / $1.60 | 262.1K
52 | 52-58 | KwaiKAT · Proprietary | 1258 +15/-15 | 1,882 | $0.21 / $0.83 | 256K
53 | 52-58 | Alibaba · Apache 2.0 | 1248 +16/-16 | 1,814 | $0.16 / $1.30 | 262.1K
54 | 52-59 | OpenAI · Proprietary | 1239 +17/-17 | 1,444 | $0.25 / $2 | 400K
55 | 52-59 | Alibaba · Proprietary | 1236 +17/-17 | 1,560 | N/A | N/A
56 | 52-59 | Google · Proprietary | 1235 +9/-9 | 6,233 | $0.25 / $1.50 | 1M
57 | 52-59 | xAI · Proprietary | 1234 +9/-9 | 6,916 | $0.20 / $0.50 | 2M
58 | 52-61 | Mistral · Apache 2.0 | 1223 +20/-20 | 1,031 | $0.50 / $1.50 | N/A
59 | 54-61 | xAI · Proprietary | 1208 +20/-20 | 1,209 | N/A | N/A
60 | 58-61 | Google · Proprietary | 1203 +13/-13 | 3,300 | $1.25 / $10 | 1M
61 | 58-62 | Mistral · Modified MIT | 1199 +17/-17 | 1,582 | N/A | N/A
62 | 61-64 | Inception AI · Proprietary | 1165 +23/-23 | 947 | $0.25 / $0.75 | 128K
63 | 62-64 | xAI · Proprietary | 1149 +23/-23 | 936 | $0.20 / $0.50 | 2M
64 | 62-64 | xAI · Proprietary | 1139 +22/-22 | 984 | $0.20 / $1.50 | 256K
65 | 65-65 | Mistral · Proprietary | 1091 +23/-23 | 993 | $0.40 / $2 | 128K
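
Each entry pairs a score with a +/- margin, so adjacent ranks are not always clearly separated. The sketch below is one rough way to check whether two entries' score ranges overlap. It assumes the margins are symmetric bounds around the score; the `Entry` type and `intervals_overlap` helper are hypothetical names introduced for illustration, and this is not necessarily how the page derives its Rank Spread.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One leaderboard row, reduced to rank, score, and its +/- margin."""
    rank: int
    score: float
    plus: float
    minus: float

    @property
    def low(self) -> float:
        return self.score - self.minus

    @property
    def high(self) -> float:
        return self.score + self.plus


def intervals_overlap(a: Entry, b: Entry) -> bool:
    """True when the two score ranges share at least one point."""
    return a.low <= b.high and b.low <= a.high


# Values copied from the leaderboard rows for ranks 1, 3, and 9.
rank1 = Entry(1, 1572, 18, 18)  # range [1554, 1590]
rank3 = Entry(3, 1552, 10, 10)  # range [1542, 1562]
rank9 = Entry(9, 1491, 7, 7)    # range [1484, 1498]

print(intervals_overlap(rank1, rank3))  # True
print(intervals_overlap(rank1, rank9))  # False
```

Overlapping ranges suggest the ordering between two entries could plausibly flip with more votes; non-overlapping ranges suggest a more stable gap.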

Remove Style Control

Leaderboard Plots

Fraction of Model A Wins for All Non-tied A vs. B Battles

Confidence Intervals on Model Strength (via Bootstrapping)

Average Win Rate Against All Other Models (Uniform Sampling and No Ties)

Battle Count for Each Combination of Models (without Ties)
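
The plot titles above name three computations: the fraction of non-tied battles won per model pair, an average win rate with every opponent weighted equally, and bootstrapped confidence intervals. The following is a minimal sketch of each on a tiny, made-up battle log. The record layout `(model_a, model_b, winner)`, the model names, and all function names are assumptions for illustration. The bootstrap here resamples the average win rate rather than the model-strength score the page plots, purely to keep the example short.

```python
import random
from collections import defaultdict

# Hypothetical battle records: (model_a, model_b, winner),
# where winner is "a", "b", or "tie".
battles = [
    ("m1", "m2", "a"), ("m1", "m2", "a"), ("m1", "m2", "b"),
    ("m1", "m3", "a"), ("m3", "m1", "tie"),
    ("m2", "m3", "b"), ("m2", "m3", "a"), ("m3", "m2", "a"),
]


def pairwise_win_fraction(records):
    """Return {(x, y): fraction of non-tied x-vs-y battles won by x}."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for a, b, winner in records:
        if winner == "tie":
            continue  # ties are excluded, as in the plot title
        won, lost = (a, b) if winner == "a" else (b, a)
        wins[(won, lost)] += 1
        totals[(a, b)] += 1
        totals[(b, a)] += 1
    return {pair: wins[pair] / n for pair, n in totals.items()}


def average_win_rate(fractions):
    """Average each model's pairwise fractions, weighting opponents equally."""
    per_model = defaultdict(list)
    for (model, _opponent), frac in fractions.items():
        per_model[model].append(frac)
    return {m: sum(v) / len(v) for m, v in per_model.items()}


def bootstrap_interval(records, model, rounds=1000, seed=0):
    """Percentile interval (2.5% to 97.5%) for a model's average win rate."""
    rng = random.Random(seed)
    samples = []
    for _ in range(rounds):
        resample = rng.choices(records, k=len(records))
        rate = average_win_rate(pairwise_win_fraction(resample)).get(model)
        if rate is not None:  # model may be absent from a small resample
            samples.append(rate)
    samples.sort()
    low = samples[int(0.025 * len(samples))]
    high = samples[int(0.975 * len(samples)) - 1]
    return low, high


fractions = pairwise_win_fraction(battles)
averages = average_win_rate(fractions)
# m1 wins 2 of 3 against m2 and 1 of 1 against m3, so its average is
# (2/3 + 1) / 2 = 5/6.
print(averages["m1"])
print(bootstrap_interval(battles, "m1"))
```

Averaging per-opponent fractions, instead of pooling all battles, keeps a model from looking stronger just because it was matched more often against weak opponents.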