Code Arena

Compare the performance of AI models on agentic coding tasks involving multi-step reasoning and tool use

Last Updated: Jan 29, 2026
Total Votes: 120,051
Total Models: 36

| Rank | Rank Spread | Score | CI | Votes | Organization | License |
|---|---|---|---|---|---|---|
| 1 | 1◄─►1 | 1497 | +10/-10 | 7,953 | Anthropic | Proprietary |
| 2 | 2◄─►4 | 1470 | +16/-16 | 1,689 | OpenAI | Proprietary |
| 3 | 2◄─►4 | 1468 | +9/-9 | 8,193 | Anthropic | Proprietary |
| 4 | 2◄─►6 | 1454 | +8/-8 | 14,107 | Google | Proprietary |
| 5 | 4◄─►6 | 1443 | +9/-9 | 9,548 | Google | Proprietary |
| 6 | 4◄─►6 | 1440 | +10/-10 | 5,110 | Z.ai | MIT |
| 7 | 7◄─►10 | 1408 | +9/-9 | 7,161 | MiniMax | MIT |
| 8 | 7◄─►14 | 1399 | +10/-10 | 5,612 | Google | Proprietary |
| 9 | 7◄─►15 | 1395 | +16/-16 | 1,632 | OpenAI | Proprietary |
| 10 | 7◄─►15 | 1392 | +12/-12 | 3,926 | OpenAI | Proprietary |
| 11 | 8◄─►15 | 1387 | +9/-9 | 8,975 | Anthropic | Proprietary |
| 12 | 8◄─►15 | 1387 | +9/-9 | 6,429 | OpenAI | Proprietary |
| 13 | 8◄─►15 | 1386 | +8/-8 | 12,959 | Anthropic | Proprietary |
| 14 | 8◄─►15 | 1386 | +8/-8 | 11,367 | Anthropic | Proprietary |
| 15 | 9◄─►16 | 1372 | +11/-11 | 3,782 | DeepSeek | MIT |
| 16 | 15◄─►18 | 1355 | +9/-9 | 8,738 | Z.ai | MIT |
| 17 | 16◄─►19 | 1351 | +8/-8 | 10,303 | OpenAI | Proprietary |
| 18 | 16◄─►21 | 1347 | +10/-10 | 4,223 | Xiaomi | MIT |
| 19 | 17◄─►22 | 1331 | +12/-12 | 2,919 | OpenAI | Proprietary |
| 20 | 18◄─►21 | 1329 | +8/-8 | 9,891 | Moonshot | Modified MIT |
| 21 | 18◄─►22 | 1328 | +9/-9 | 6,501 | OpenAI | Proprietary |
| 22 | 20◄─►23 | 1311 | +9/-9 | 8,838 | MiniMax | Apache 2.0 |
| 23 | 22◄─►26 | 1296 | +10/-10 | 4,844 | DeepSeek | MIT |
| 24 | 23◄─►26 | 1292 | +8/-8 | 11,111 | Anthropic | Proprietary |
| 25 | 23◄─►26 | 1285 | +10/-10 | 5,131 | DeepSeek | MIT |
| 26 | 23◄─►26 | 1282 | +8/-8 | 10,859 | Alibaba | Apache 2.0 |
| 27 | 27◄─►29 | 1258 | +15/-15 | 1,956 | KwaiKAT | Proprietary |
| 28 | 27◄─►30 | 1242 | +17/-17 | 1,537 | OpenAI | Proprietary |
| 29 | 27◄─►30 | 1236 | +10/-10 | 5,682 | xAI | Proprietary |
| 30 | 28◄─►33 | 1221 | +20/-20 | 1,039 | Mistral | Apache 2.0 |
| 31 | 30◄─►33 | 1204 | +13/-13 | 3,454 | Google | Proprietary |
| 32 | 30◄─►33 | 1203 | +19/-19 | 1,266 | xAI | Proprietary |
| 33 | 30◄─►33 | 1201 | +16/-16 | 1,659 | Mistral | Modified MIT |
| 34 | 34◄─►35 | 1151 | +22/-22 | 970 | xAI | Proprietary |
| 35 | 34◄─►36 | 1139 | +21/-21 | 1,017 | xAI | Proprietary |
| 36 | 35◄─►36 | 1097 | +22/-22 | 1,020 | Mistral | Proprietary |
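
The page does not define how the Rank Spread column is computed. One plausible reading, offered here only as a sketch, is that a model's best possible rank is one plus the number of models whose lower confidence bound lies above its upper bound, and its worst possible rank is one plus the number of models whose upper bound lies above its lower bound. The function name `rank_spreads` and the tuple layout are illustrative assumptions; the numbers are the score and +/- bounds of the six top-ranked entries.

```python
# Sketch: derive a (best, worst) rank spread from score intervals.
# Assumption: the rule below is a guess at how Rank Spread is defined,
# not a documented formula.

# (score, ci_plus, ci_minus) for the six top-ranked entries.
TOP_ROWS = [
    (1497, 10, 10),
    (1470, 16, 16),
    (1468, 9, 9),
    (1454, 8, 8),
    (1443, 9, 9),
    (1440, 10, 10),
]


def rank_spreads(rows):
    """Return a list of (best_rank, worst_rank), one per row."""
    intervals = [(score - minus, score + plus) for score, plus, minus in rows]
    spreads = []
    for i, (low_i, high_i) in enumerate(intervals):
        # Models that are clearly better: their lower bound exceeds our upper bound.
        clearly_better = sum(
            1 for j, (low_j, _) in enumerate(intervals) if j != i and low_j > high_i
        )
        # Models that might be better: their upper bound exceeds our lower bound.
        possibly_better = sum(
            1 for j, (_, high_j) in enumerate(intervals) if j != i and high_j > low_i
        )
        spreads.append((1 + clearly_better, 1 + possibly_better))
    return spreads


print(rank_spreads(TOP_ROWS))
# [(1, 1), (2, 4), (2, 4), (2, 6), (4, 6), (4, 6)]
```

Under this reading the output matches the spreads listed for ranks 1 through 6. Rounding in the displayed scores could make other rows differ by one position.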

Leaderboard Plots

Fraction of Model A Wins for All Non-tied A vs. B Battles

Battle Count for Each Combination of Models (without Ties)
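
As a rough illustration of what the two plots above tabulate, the following sketch builds a battle-count table and a win-fraction table from a list of battles, skipping ties. The record format `(model_a, model_b, winner)`, the model names, and the choice to pool both seating orders of a pair are assumptions made for the example; the page does not show its raw data format.

```python
from collections import defaultdict

# Hypothetical battle records: (model_a, model_b, winner),
# where winner is "a", "b", or "tie".
battles = [
    ("m1", "m2", "a"),
    ("m1", "m2", "b"),
    ("m2", "m1", "b"),
    ("m1", "m3", "tie"),
    ("m2", "m3", "a"),
]


def pairwise_tables(records):
    """Return (counts, fraction), both keyed by the ordered pair (x, y).

    counts[(x, y)]   = number of non-tied battles between x and y
    fraction[(x, y)] = share of those battles won by x
    """
    wins = defaultdict(int)
    counts = defaultdict(int)
    for a, b, winner in records:
        if winner == "tie":
            continue  # ties are excluded from both tables
        counts[(a, b)] += 1
        counts[(b, a)] += 1
        if winner == "a":
            wins[(a, b)] += 1
        else:
            wins[(b, a)] += 1
    fraction = {pair: wins[pair] / n for pair, n in counts.items()}
    return dict(counts), fraction


counts, fraction = pairwise_tables(battles)
print(counts[("m1", "m2")])    # 3
print(fraction[("m2", "m3")])  # 1.0
```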

Confidence Intervals on Model Strength (via Bootstrapping)
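
The plot title above says that the confidence intervals come from bootstrapping, and gives no further detail. A minimal sketch of a percentile bootstrap over battles follows. It uses a simple win rate as a stand-in for model strength, because the leaderboard's actual rating model is not described on this page. The record format, the helper names, and the defaults (1000 rounds, 95% interval) are all assumptions.

```python
import random

# Hypothetical records: (model_a, model_b, winner), winner in {"a", "b", "tie"}.
battles = [("m1", "m2", "a")] * 60 + [("m1", "m2", "b")] * 40


def win_rate(records, model):
    """Share of non-tied battles involving `model` that it won."""
    played = won = 0
    for a, b, winner in records:
        if winner == "tie" or model not in (a, b):
            continue
        played += 1
        if (winner == "a" and a == model) or (winner == "b" and b == model):
            won += 1
    return won / played if played else float("nan")


def bootstrap_interval(records, model, rounds=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `win_rate`, resampling whole battles."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(rounds):
        sample = rng.choices(records, k=len(records))  # sample with replacement
        value = win_rate(sample, model)
        if value == value:  # drop NaN from resamples where the model never played
            estimates.append(value)
    estimates.sort()
    low = estimates[int((alpha / 2) * len(estimates))]
    high = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return low, high


low, high = bootstrap_interval(battles, "m1")
print(f"m1 win rate: {win_rate(battles, 'm1'):.2f}, interval: [{low:.2f}, {high:.2f}]")
```

Swapping `win_rate` for a fitted rating would give intervals on the same scale as the leaderboard scores. The resampling loop would stay the same.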

Average Win Rate Against All Other Models (Uniform Sampling and No Ties)
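
"Uniform sampling" in the title above can be read as giving every opponent equal weight, regardless of how many battles were actually played against it. The sketch below averages pairwise win fractions that way. The input layout, the function name, and the sample values are hypothetical.

```python
# Hypothetical pairwise win fractions with ties already excluded:
# fraction[(model, opponent)] = share of their battles won by model.
fraction = {
    ("m1", "m2"): 0.75, ("m2", "m1"): 0.25,
    ("m1", "m3"): 0.50, ("m3", "m1"): 0.50,
    ("m2", "m3"): 0.25, ("m3", "m2"): 0.75,
}


def average_win_rate(pairwise):
    """Mean win fraction per model, weighting each opponent equally."""
    per_model = {}
    for (model, _opponent), value in pairwise.items():
        per_model.setdefault(model, []).append(value)
    return {model: sum(values) / len(values) for model, values in per_model.items()}


print(average_win_rate(fraction))
# {'m1': 0.625, 'm2': 0.25, 'm3': 0.625}
```

Weighting opponents equally keeps a model's average from being inflated or deflated by facing one opponent far more often than the others.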