Code Arena · HTML

View overall rankings across AI models on agentic coding tasks involving multi-step reasoning and tool use.

Apr 14, 2026
126,136 votes
60 models
| Rank | Rank Spread | Organization · License | Score | +/- | Votes | Price | Context |
|---|---|---|---|---|---|---|---|
| 1 | 1-4 | Anthropic · Proprietary | 1556 | +23/-23 | 803 | $5 / $25 | 1M |
| 2 | 1-4 | Anthropic · Proprietary | 1546 | +24/-24 | 718 | $5 / $25 | 1M |
| 3 | 1-8 | Z.ai · MIT | 1530 | +47/-47 | 191 | $0.95 / $3.15 | 202.8K |
| 4 | 1-8 | Anthropic · Proprietary | 1523 | +20/-20 | 1,000 | $3 / $15 | 1M |
| 5 | 3-8 | Anthropic | 1503 | +10/-10 | 7,955 | $5 / $25 | 200K |
| 6 | 3-10 | Google · Proprietary | 1497 | +23/-23 | 739 | $2 / $12 | 1M |
| 7 | 3-22 | OpenAI · Proprietary | 1473 | +48/-48 | 165 | $2.50 / $15 | 1.1M |
| 8 | 6-14 | Anthropic · Proprietary | 1472 | +10/-10 | 8,285 | $5 / $25 | 200K |
| 9 | 7-16 | Google · Proprietary | 1462 | +9/-9 | 13,760 | $2 / $12 | 1M |
| 10 | 3-26 | OpenAI · Proprietary | 1458 | +49/-49 | 160 | $2.50 / $15 | 1.1M |
| 11 | 6-22 | Z.ai · MIT | 1450 | +25/-25 | 601 | $1 / $3.20 | 202.8K |
| 12 | 8-19 | Z.ai · MIT | 1449 | +11/-11 | 4,750 | $0.39 / $1.75 | 202.8K |
| 13 | 9-20 | Google · Proprietary | 1440 | +10/-10 | 9,252 | $0.50 / $3 | 1M |
| 14 | 7-31 | Xiaomi · Proprietary | 1433 | +32/-32 | 357 | $1 / $3 | 1M |
| 15 | 7-31 | MiniMax · Modified MIT | 1433 | +31/-31 | 364 | $0.30 / $1.20 | 196.6K |
| 16 | 8-32 | OpenAI · Proprietary | 1423 | +32/-32 | 364 | $1.75 / $14 | 400K |
| 17 | 7-33 | OpenAI · Proprietary | 1419 | +44/-44 | 207 | $2.50 / $15 | 1.1M |
| 18 | 10-31 | Moonshot · Modified MIT | 1417 | +18/-18 | 1,142 | $0.60 / $3 | N/A |
| 19 | 9-32 | Moonshot · Modified MIT | 1415 | +25/-25 | 589 | $0.38 / $1.72 | 262.1K |
| 20 | 11-31 | OpenAI · Proprietary | 1412 | +17/-17 | 1,461 | $1.75 / $14 | 400K |
| 21 | 11-32 | MiniMax · Modified MIT | 1408 | +20/-20 | 927 | $0.12 / $0.99 | 196.6K |
| 22 | 9-36 | Alibaba · Proprietary | 1405 | +35/-35 | 295 | $0.33 / $1.95 | 1M |
| 23 | 13-32 | MiniMax · MIT | 1404 | +10/-10 | 6,772 | $0.29 / $0.95 | 196.6K |
| 24 | 13-32 | | 1402 | +10/-10 | 6,038 | $0.50 / $3 | 1M |
| 25 | 13-32 | OpenAI · Proprietary | 1401 | +13/-13 | 3,755 | $1.25 / $10 | 400K |
| 26 | 14-32 | OpenAI · Proprietary | 1398 | +10/-10 | 6,099 | $1.25 / $10 | 400K |
| 27 | 13-33 | Alibaba · Apache 2.0 | 1398 | +22/-22 | 741 | $0.39 / $2.34 | 262.1K |
| 28 | 14-32 | Anthropic | 1396 | +9/-9 | 11,315 | $3 / $15 | 200K |
| 29 | 14-33 | Anthropic · Proprietary | 1392 | +9/-9 | 8,555 | $15 / $75 | 200K |
| 30 | 18-33 | Anthropic · Proprietary | 1386 | +8/-8 | 12,950 | $3 / $15 | 200K |
| 31 | 14-38 | Alibaba · Apache 2.0 | 1380 | +27/-27 | 503 | $0.20 / $1.56 | 262.1K |
| 32 | 26-38 | DeepSeek · MIT | 1372 | +11/-11 | 3,995 | $0.26 / $0.38 | 163.8K |
| 33 | 14-43 | | 1371 | +31/-31 | 375 | $2 / $6 | 2M |
| 34 | 30-38 | Z.ai · MIT | 1362 | +9/-9 | 8,319 | $0.39 / $1.90 | 204.8K |
| 35 | 30-40 | | 1361 | +11/-11 | 4,119 | $0.09 / $0.29 | 262.1K |
| 36 | 31-40 | OpenAI · Proprietary | 1360 | +9/-9 | 10,007 | $1.25 / $10 | 400K |
| 37 | 30-43 | Alibaba · Apache 2.0 | 1349 | +25/-25 | 545 | $0.26 / $2.08 | 262.1K |
| 38 | 31-43 | | 1349 | +18/-18 | 1,183 | $0.09 / $0.29 | 262.1K |
| 39 | 34-43 | OpenAI · Proprietary | 1340 | +12/-12 | 3,134 | $1.75 / $14 | 400K |
| 40 | 36-43 | OpenAI · Proprietary | 1336 | +10/-10 | 6,211 | $1.25 / $10 | 400K |
| 41 | 36-43 | Moonshot · Modified MIT | 1335 | +9/-9 | 9,973 | $1.15 / $8 | 262.1K |
| 42 | 34-50 | Alibaba · Apache 2.0 | 1315 | +37/-37 | 252 | $0.16 / $1.30 | 262.1K |
| 43 | 42-47 | MiniMax · Apache 2.0 | 1312 | +9/-10 | 8,371 | $0.26 / $1 | 196.6K |
| 44 | 42-47 | Anthropic · Proprietary | 1308 | +9/-9 | 11,184 | $1 / $5 | 200K |
| 45 | 42-48 | DeepSeek · MIT | 1307 | +10/-10 | 5,199 | $0.26 / $0.38 | 163.8K |
| 46 | 36-52 | Alibaba · Proprietary | 1306 | +43/-43 | 197 | N/A | N/A |
| 47 | 42-49 | DeepSeek · MIT | 1294 | +11/-11 | 4,870 | $0.27 / $0.41 | 163.8K |
| 48 | 44-49 | Alibaba · Apache 2.0 | 1291 | +9/-9 | 10,765 | $0.40 / $1.60 | 262.1K |
| 49 | 47-53 | KwaiKAT · Proprietary | 1266 | +16/-16 | 1,883 | $0.21 / $0.83 | 256K |
| 50 | 45-55 | Google · Proprietary | 1260 | +25/-25 | 648 | $0.25 / $1.50 | 1M |
| 51 | 48-56 | OpenAI · Proprietary | 1247 | +18/-18 | 1,444 | $0.25 / $2 | 400K |
| 52 | 49-56 | xAI · Proprietary | 1243 | +11/-11 | 5,455 | $0.20 / $0.50 | 2M |
| 53 | 50-57 | Mistral · Apache 2.0 | 1228 | +20/-20 | 1,032 | $0.50 / $1.50 | N/A |
| 54 | 50-57 | xAI · Proprietary | 1216 | +20/-20 | 1,209 | N/A | N/A |
| 55 | 51-57 | Mistral · Modified MIT | 1214 | +18/-18 | 1,340 | N/A | N/A |
| 56 | 53-57 | Google · Proprietary | 1210 | +13/-13 | 3,300 | $1.25 / $10 | 1M |
| 57 | 48-59 | Inception AI · Proprietary | 1207 | +70/-70 | 100 | $0.25 / $0.75 | 128K |
| 58 | 57-59 | xAI · Proprietary | 1156 | +23/-23 | 936 | $0.20 / $0.50 | 2M |
| 59 | 57-59 | xAI · Proprietary | 1146 | +22/-22 | 984 | $0.20 / $1.50 | 256K |
| 60 | 60-60 | Mistral · Proprietary | 1097 | +23/-23 | 993 | $0.40 / $2 | 128K |

Leaderboard Plots

Confidence Intervals on Model Strength (via Bootstrapping)
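
For readers who want to see how bootstrapped intervals of this kind can be produced, here is a minimal sketch. It assumes a Bradley-Terry model fitted with the classic iterative update and reported on a 400 * log10 scale; the page does not state the exact fitting procedure, so the function names (`fit_bradley_terry`, `bootstrap_intervals`), the scale, and the input format of (winner, loser) pairs are all illustrative assumptions rather than the leaderboard's actual pipeline.

```python
import math
import random


def fit_bradley_terry(battles, models, iters=100):
    """Fit Bradley-Terry strengths from (winner, loser) pairs (ties excluded).

    Returns ratings on an assumed 400 * log10 scale, centered so that the
    mean rating is 0. The scale is an assumption, not taken from the page.
    """
    wins = {m: 0 for m in models}
    pair_counts = {}
    for winner, loser in battles:
        wins[winner] += 1
        pair = frozenset((winner, loser))
        pair_counts[pair] = pair_counts.get(pair, 0) + 1

    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        updated = {}
        for m in models:
            denom = 0.0
            for pair, n in pair_counts.items():
                if m in pair:
                    (other,) = pair - {m}
                    denom += n / (strength[m] + strength[other])
            # Clamp so a model with zero wins in a resample keeps a finite log.
            updated[m] = max(wins[m] / denom, 1e-9) if denom > 0 else strength[m]
        # Normalize the geometric mean to 1 to keep the scale identifiable.
        log_mean = sum(math.log(v) for v in updated.values()) / len(updated)
        norm = math.exp(log_mean)
        strength = {m: v / norm for m, v in updated.items()}
    return {m: 400.0 * math.log10(v) for m, v in strength.items()}


def bootstrap_intervals(battles, rounds=200, alpha=0.05, seed=0):
    """Resample battles with replacement and return (low, median, high) per model."""
    rng = random.Random(seed)
    models = sorted({m for battle in battles for m in battle})
    samples = {m: [] for m in models}
    for _ in range(rounds):
        resampled = rng.choices(battles, k=len(battles))
        ratings = fit_bradley_terry(resampled, models)
        for m in models:
            samples[m].append(ratings[m])
    intervals = {}
    for m, values in samples.items():
        values.sort()
        low = values[int((alpha / 2) * (rounds - 1))]
        high = values[int((1 - alpha / 2) * (rounds - 1))]
        intervals[m] = (low, values[rounds // 2], high)
    return intervals
```

Under this sketch, models with few votes yield wider intervals because each resample shifts their fitted strength more, which is consistent with the pattern in the table where low-vote entries carry the largest +/- values.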

Average Win Rate Against All Other Models (Uniform Sampling and No Ties)
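
One way to approximate an average win rate from scores alone is sketched below. It assumes an Elo-style logistic curve with base 10 and scale 400, which the page does not confirm; the function names and the placeholder keys `rank_1` to `rank_3` are hypothetical, and the three scores are the top three from the table. "Uniform sampling" is read here as weighting every opponent equally, and "no ties" as every battle having a winner.

```python
def expected_win_prob(score_a, score_b, scale=400.0):
    """Probability that A beats B under an assumed Elo-style logistic curve."""
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / scale))


def average_win_rates(scores):
    """Average each model's expected win probability over all other models."""
    rates = {}
    for name, score in scores.items():
        others = [s for other, s in scores.items() if other != name]
        rates[name] = sum(expected_win_prob(score, s) for s in others) / len(others)
    return rates


if __name__ == "__main__":
    # Placeholder keys; the scores are the top three rows of the table.
    top_three = {"rank_1": 1556, "rank_2": 1546, "rank_3": 1530}
    for name, rate in average_win_rates(top_three).items():
        print(f"{name}: {rate:.3f}")
```

The plot on the page may instead be computed from observed battle outcomes, so treat this only as a way to build intuition for how score gaps translate into win probabilities.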

Battle Count for Each Combination of Models (without Ties)

Fraction of Model A Wins for All Non-tied A vs. B Battles
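
The battle-count and win-fraction plots can both be derived from raw battle records. The sketch below assumes a hypothetical record format of `(model_a, model_b, outcome)` with `outcome` being `"a"`, `"b"`, or `"tie"`; the actual data schema is not given on the page, and `pairwise_tables` is an illustrative name.

```python
from collections import defaultdict


def pairwise_tables(battles):
    """Return (counts, fractions) over non-tied battles.

    counts[(x, y)]    = number of non-tied battles between x and y
    fractions[(x, y)] = share of those battles won by x
    """
    counts = defaultdict(int)
    wins = defaultdict(int)
    for model_a, model_b, outcome in battles:
        if outcome == "tie":
            continue  # ties are excluded, as in the plot titles
        counts[(model_a, model_b)] += 1
        counts[(model_b, model_a)] += 1
        if outcome == "a":
            wins[(model_a, model_b)] += 1
        else:
            wins[(model_b, model_a)] += 1
    fractions = {pair: wins[pair] / n for pair, n in counts.items()}
    return dict(counts), fractions
```

Counting each battle under both orderings keeps the tables symmetric in count, while the fractions for `(x, y)` and `(y, x)` sum to 1.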