Code Arena | Overall

View overall rankings across AI models on agentic coding tasks involving multi-step reasoning and tool use.

Mar 11, 2026
196,662 votes
53 models
Rank  Rank Spread  Organization  License       Score          Votes   Price           Context
1     1-2          Anthropic     Proprietary   1552 +13/-13   2,891   $5 / $25        1M
2     1-2          Anthropic     Proprietary   1552 +12/-12   3,677   $5 / $25        1M
3     3-3          Anthropic     Proprietary   1524 +11/-11   4,322   $3 / $15        1M
4     4-4          Anthropic     -             1493 +7/-7     12,499  $5 / $25        200K
5     5-7          Anthropic     Proprietary   1472 +7/-7     12,651  $5 / $25        200K
6     5-13         OpenAI        Proprietary   1460 +17/-17   1,410   N/A             N/A
7     5-13         Google        Proprietary   1457 +11/-11   3,510   $2 / $12        1M
8     6-13         Z.ai          MIT           1447 +11/-11   3,608   $0.72 / $2.30   202.8K
9     6-13         Z.ai          MIT           1442 +10/-10   5,136   $0.38 / $1.98   202.8K
10    6-13         Google        Proprietary   1441 +7/-7     13,639  $0.50 / $3      1M
11    6-13         Google        Proprietary   1440 +7/-7     17,867  $2 / $12        1M
12    6-14         Moonshot      Modified MIT  1437 +9/-9     5,186   $0.60 / $3      N/A
13    6-15         OpenAI        Proprietary   1436 +18/-18   1,271   N/A             N/A
14    12-17        MiniMax       Modified MIT  1419 +10/-10   4,846   $0.25 / $1.20   196.6K
15    13-20        Moonshot      Modified MIT  1414 +11/-11   3,547   $0.45 / $2.20   262.1K
16    14-24        OpenAI        Proprietary   1406 +12/-12   2,817   $1.75 / $14     400K
17    15-24        MiniMax       MIT           1401 +8/-8     9,830   $0.27 / $0.95   196.6K
18    15-24        -             -             1401 +7/-7     10,194  $0.50 / $3      1M
19    14-28        OpenAI        Proprietary   1395 +15/-15   1,644   $1.75 / $14     400K
20    15-27        OpenAI        Proprietary   1393 +12/-12   3,948   $1.25 / $10     400K
21    16-26        Alibaba       Apache 2.0    1392 +10/-10   3,692   $0.39 / $2.34   262.1K
22    16-26        Anthropic     -             1391 +7/-7     15,434  $3 / $15        200K
23    16-28        OpenAI        Proprietary   1389 +9/-9     6,468   $1.25 / $10     400K
24    16-28        Anthropic     Proprietary   1388 +9/-9     9,020   $15 / $75       200K
25    19-28        Anthropic     Proprietary   1386 +7/-7     17,248  $3 / $15        200K
26    19-28        Alibaba       Apache 2.0    1379 +13/-13   2,251   $0.26 / $2.08   262.1K
27    21-29        DeepSeek      MIT           1373 +8/-8     7,014   $0.26 / $0.38   163.8K
28    22-29        Alibaba       Apache 2.0    1367 +14/-14   2,105   $0.20 / $1.56   262.1K
29    27-32        Z.ai          MIT           1357 +9/-9     8,785   $0.39 / $1.90   204.8K
30    29-34        OpenAI        Proprietary   1343 +7/-7     13,323  $1.25 / $10     400K
31    29-34        OpenAI        Proprietary   1342 +8/-8     7,022   $1.75 / $14     400K
32    29-34        -             -             1341 +8/-8     6,946   $0.09 / $0.29   262.1K
33    30-35        Moonshot      Modified MIT  1330 +7/-7     13,909  $1.15 / $8      262.1K
34    30-36        OpenAI        Proprietary   1329 +9/-9     6,553   $1.25 / $10     400K
35    33-37        DeepSeek      MIT           1324 +8/-8     8,387   $0.26 / $0.38   163.8K
36    34-38        MiniMax       Apache 2.0    1312 +9/-9     8,893   $0.26 / $1      196.6K
37    36-38        Anthropic     Proprietary   1309 +7/-7     15,200  $1 / $5         200K
38    35-39        -             -             1306 +13/-13   2,162   $0.09 / $0.29   262.1K
39    38-40        DeepSeek      MIT           1287 +10/-10   5,155   $0.27 / $0.41   163.8K
40    39-40        Alibaba       Apache 2.0    1284 +7/-7     14,903  $0.40 / $1.60   262.1K
41    41-46        KwaiKAT       Proprietary   1260 +15/-15   1,964   $0.21 / $0.83   256K
42    41-47        Alibaba       Apache 2.0    1256 +16/-16   1,730   $0.16 / $1.30   262.1K
43    41-47        Google        Proprietary   1254 +17/-17   1,397   $0.25 / $1.50   1M
44    41-47        OpenAI        Proprietary   1244 +17/-17   1,541   $0.25 / $2      400K
45    41-47        Alibaba       Proprietary   1243 +17/-17   1,538   N/A             N/A
46    41-47        xAI           Proprietary   1236 +9/-9     7,144   $0.20 / $0.50   2M
47    42-50        Mistral       Apache 2.0    1223 +20/-20   1,045   $0.50 / $1.50   N/A
48    47-50        Google        Proprietary   1206 +13/-13   3,477   $1.25 / $10     1M
49    47-50        xAI           Proprietary   1205 +19/-19   1,282   $0.20 / $0.50   N/A
50    47-50        Mistral       Modified MIT  1198 +16/-16   1,693   N/A             N/A
51    51-52        xAI           Proprietary   1151 +23/-23   968     $0.20 / $0.50   2M
52    51-53        xAI           Proprietary   1143 +21/-21   1,021   $0.20 / $1.50   256K
53    52-53        Mistral       Proprietary   1100 +22/-22   1,027   $0.40 / $2      128K
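
The page does not say how the Rank Spread values are derived. One reading that agrees with the top rows above is that a model's best possible rank counts the models whose lower score bound lies above its upper bound, and its worst possible rank counts the models whose upper bound lies above its lower bound. The sketch below is an assumption, not the leaderboard's documented method; the function name `rank_spreads` is hypothetical, and the input is the score and +/- margin of the first eight rows.

```python
from typing import List, Tuple

def rank_spreads(intervals: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (best_rank, worst_rank) for each (lower, upper) score interval.

    Assumed rule (not confirmed by the source): strict comparison of
    interval endpoints decides whether one model is clearly above another.
    """
    spreads = []
    for i, (lo_i, hi_i) in enumerate(intervals):
        # Models whose whole interval sits above this model's interval.
        clearly_better = sum(
            1 for j, (lo_j, _) in enumerate(intervals) if j != i and lo_j > hi_i
        )
        # Models whose interval reaches above this model's lower bound.
        possibly_better = sum(
            1 for j, (_, hi_j) in enumerate(intervals) if j != i and hi_j > lo_i
        )
        spreads.append((1 + clearly_better, 1 + possibly_better))
    return spreads

# (score, margin) for ranks 1-8, copied from the table above.
top_rows = [(1552, 13), (1552, 12), (1524, 11), (1493, 7),
            (1472, 7), (1460, 17), (1457, 11), (1447, 11)]
intervals = [(score - m, score + m) for score, m in top_rows]

# Only the first five results are meaningful here: the spreads of ranks 6-8
# also depend on rows that this truncated input leaves out.
spreads = rank_spreads(intervals)[:5]
print(spreads)
```

Under this assumed rule the first five spreads come out as 1-2, 1-2, 3-3, 4-4 and 5-7, matching the listed values.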

Leaderboard Plots

Confidence Intervals on Model Strength (via Bootstrapping)
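
As a rough illustration of the bootstrapping idea named in this plot title, the sketch below resamples a set of battle outcomes with replacement and reads a percentile interval off the resampled win rates. It is a simplification under stated assumptions: the leaderboard bootstraps a model-strength score whose fitting procedure is not described here, whereas this example bootstraps a plain win rate. The outcome data and the function name `bootstrap_win_rate_ci` are hypothetical.

```python
import random

def bootstrap_win_rate_ci(outcomes, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for the mean of 0/1 outcomes."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(outcomes)
    rates = []
    for _ in range(n_resamples):
        # Draw n outcomes with replacement and record their win rate.
        resample = [outcomes[rng.randrange(n)] for _ in range(n)]
        rates.append(sum(resample) / n)
    rates.sort()
    lo = rates[int(n_resamples * alpha / 2)]
    hi = rates[int(n_resamples * (1 - alpha / 2)) - 1]
    return lo, hi

# Hypothetical data: 60 wins and 40 losses for one model.
outcomes = [1] * 60 + [0] * 40
lo, hi = bootstrap_win_rate_ci(outcomes)
print(f"win rate 0.60, approx. 95% interval [{lo:.2f}, {hi:.2f}]")
```

Fewer battles give a wider interval, which is in line with the table: the rows with the fewest votes carry the largest +/- margins.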

Average Win Rate Against All Other Models (Uniform Sampling and No Ties)

Fraction of Model A Wins for All Non-tied A vs. B Battles

Battle Count for Each Combination of Models (without Ties)
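
The three pairwise plots named above (average win rate, fraction of A wins, and battle counts) can all be derived from a list of battle records. The sketch below shows one way to do so; the record layout `(model_a, model_b, winner)`, the model labels, and the function names are hypothetical, since the page does not describe its data format. Ties are dropped, and the average win rate weights every opponent equally, which is one reading of "uniform sampling".

```python
from collections import defaultdict

# Hypothetical battle records: winner is "a", "b", or "tie".
battles = [
    ("m1", "m2", "a"),
    ("m1", "m2", "b"),
    ("m1", "m2", "a"),
    ("m2", "m3", "tie"),
    ("m2", "m3", "a"),
    ("m1", "m3", "a"),
    ("m3", "m1", "a"),
]

def pairwise_stats(battles):
    """Return (counts, fraction) keyed by ordered model pairs, ignoring ties."""
    counts = defaultdict(int)  # counts[(x, y)]: non-tied battles between x and y
    wins = defaultdict(int)    # wins[(x, y)]: battles in which x beat y
    for a, b, winner in battles:
        if winner == "tie":
            continue
        counts[(a, b)] += 1
        counts[(b, a)] += 1
        if winner == "a":
            wins[(a, b)] += 1
        else:
            wins[(b, a)] += 1
    fraction = {pair: wins[pair] / n for pair, n in counts.items()}
    return dict(counts), fraction

def average_win_rate(fraction):
    """Mean of a model's pairwise win fractions, one term per opponent faced."""
    per_model = defaultdict(list)
    for (model, _opponent), f in fraction.items():
        per_model[model].append(f)
    return {model: sum(fs) / len(fs) for model, fs in per_model.items()}

counts, fraction = pairwise_stats(battles)
avg = average_win_rate(fraction)
print(counts[("m1", "m2")], fraction[("m1", "m2")], avg["m3"])
```

Averaging per opponent rather than per battle keeps a heavily sampled pairing from dominating a model's overall win rate.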