Code Arena

Compare the performance of AI models on agentic coding tasks involving multi-step reasoning and tool use

Last Updated: Feb 1, 2026

Total Votes: 125,171

Total Models: 36

Rank  Rank Spread  Score  CI       Votes   Organization  License
1     1◄─►1        1500   +9/-9    8,297   Anthropic     Proprietary
2     2◄─►5        1472   +16/-16  1,689   OpenAI        Proprietary
3     2◄─►4        1470   +9/-9    8,488   Anthropic     Proprietary
4     3◄─►7        1453   +8/-8    14,466  Google        Proprietary
5     2◄─►7        1447   +16/-16  1,488   Moonshot      Modified MIT
6     4◄─►7        1443   +8/-8    9,926   Google        Proprietary
7     4◄─►7        1441   +10/-10  5,117   Z.ai          MIT
8     8◄─►11       1409   +9/-9    7,470   MiniMax       MIT
9     8◄─►15       1401   +9/-9    5,941   Google        Proprietary
10    8◄─►16       1397   +15/-15  1,633   OpenAI        Proprietary
11    8◄─►16       1394   +12/-12  3,926   OpenAI        Proprietary
12    9◄─►16       1389   +8/-8    8,979   Anthropic     Proprietary
13    9◄─►16       1389   +9/-9    6,433   OpenAI        Proprietary
14    9◄─►16       1387   +8/-8    11,664  Anthropic     Proprietary
15    9◄─►16       1386   +7/-7    13,262  Anthropic     Proprietary
16    10◄─►16      1377   +11/-11  3,995   DeepSeek      MIT
17    17◄─►19      1356   +8/-8    8,736   Z.ai          MIT
18    17◄─►20      1350   +8/-8    10,624  OpenAI        Proprietary
19    17◄─►22      1347   +10/-10  4,518   Xiaomi        MIT
20    18◄─►22      1333   +12/-12  3,235   OpenAI        Proprietary
21    19◄─►22      1330   +8/-8    10,173  Moonshot      Modified MIT
22    19◄─►23      1329   +9/-9    6,500   OpenAI        Proprietary
23    22◄─►25      1312   +9/-9    8,841   MiniMax       Apache 2.0
24    23◄─►26      1301   +10/-10  5,081   DeepSeek      MIT
25    23◄─►27      1297   +8/-8    11,416  Anthropic     Proprietary
26    24◄─►27      1287   +10/-10  5,133   DeepSeek      MIT
27    25◄─►27      1283   +7/-7    11,168  Alibaba       Apache 2.0
28    28◄─►30      1259   +15/-15  1,956   KwaiKAT       Proprietary
29    28◄─►31      1243   +17/-17  1,536   OpenAI        Proprietary
30    28◄─►31      1238   +10/-10  5,824   xAI           Proprietary
31    29◄─►33      1223   +20/-20  1,039   Mistral       Apache 2.0
32    31◄─►33      1206   +13/-13  3,454   Google        Proprietary
33    31◄─►33      1205   +19/-19  1,266   xAI           Proprietary
34    34◄─►35      1153   +22/-22  969     xAI           Proprietary
35    34◄─►36      1141   +21/-21  1,016   xAI           Proprietary
36    35◄─►36      1098   +22/-22  1,020   Mistral       Proprietary
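
To give a feel for what a gap between two scores means, the sketch below converts a score difference into an expected head-to-head win probability. It assumes the scores sit on a classic Elo-style scale (base-10 logistic, 400-point scale); the page does not state the scale it uses, so both the formula and the function name `expected_win_probability` are illustrative assumptions, not the leaderboard's documented method.

```python
def expected_win_probability(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Expected probability that A beats B in a non-tied battle.

    Assumption: Elo-style base-10 logistic curve with a 400-point scale.
    The leaderboard's actual scale is not stated on the page.
    """
    return 1.0 / (1.0 + 10.0 ** ((score_b - score_a) / scale))


# Scores taken from the listing above: rank 1 (1500) versus rank 2 (1472).
top_vs_second = expected_win_probability(1500, 1472)
# Rank 1 (1500) versus rank 36 (1098).
top_vs_last = expected_win_probability(1500, 1098)

print(f"rank 1 vs rank 2:  {top_vs_second:.3f}")
print(f"rank 1 vs rank 36: {top_vs_last:.3f}")
```

Under that assumption, a 28-point gap corresponds to only a slight edge (roughly 54%), which fits the overlapping rank spreads of the top entries.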

Leaderboard Plots

Fraction of Model A Wins for All Non-tied A vs. B Battles

Battle Count for Each Combination of Models (without Ties)

Confidence Intervals on Model Strength (via Bootstrapping)

Average Win Rate Against All Other Models (Uniform Sampling and No Ties)
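
The quantities named in these plot titles can be sketched on toy data. The example below uses hypothetical battle records (the model names `m1` to `m3` and the record layout are made up for illustration) to compute the fraction of non-tied battles each model wins against each opponent, the average win rate with every opponent weighted equally, and a percentile bootstrap interval. Note that it bootstraps the average win rate, which is a simpler stand-in for bootstrapping fitted model-strength scores.

```python
import random
from collections import defaultdict

# Hypothetical battle records: (model_a, model_b, winner), winner in {"a", "b", "tie"}.
battles = [
    ("m1", "m2", "a"), ("m1", "m2", "a"), ("m1", "m2", "b"), ("m1", "m2", "tie"),
    ("m2", "m1", "a"), ("m1", "m3", "a"), ("m3", "m1", "b"), ("m2", "m3", "a"),
    ("m2", "m3", "b"), ("m3", "m2", "a"),
]


def win_fractions(records):
    """Map each ordered pair (x, y) to the fraction of non-tied x-vs-y battles won by x."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for a, b, winner in records:
        if winner == "tie":
            continue  # ties are excluded, as in the plot titles
        victor, loser = (a, b) if winner == "a" else (b, a)
        wins[(victor, loser)] += 1
        totals[(a, b)] += 1
        totals[(b, a)] += 1
    return {pair: wins[pair] / n for pair, n in totals.items()}


def average_win_rate(fractions, model):
    """Mean of a model's pairwise win fractions, weighting each opponent equally."""
    rates = [f for (x, _), f in fractions.items() if x == model]
    return sum(rates) / len(rates)


def bootstrap_interval(records, model, rounds=1000, seed=0, alpha=0.05):
    """Percentile bootstrap interval for a model's average win rate."""
    rng = random.Random(seed)
    samples = []
    for _ in range(rounds):
        # Resample battles with replacement, keeping the original sample size.
        resampled = [rng.choice(records) for _ in records]
        fractions = win_fractions(resampled)
        # Skip the rare resample in which the model has no non-tied battles.
        if any(x == model for x, _ in fractions):
            samples.append(average_win_rate(fractions, model))
    samples.sort()
    lo = samples[int((alpha / 2) * len(samples))]
    hi = samples[int((1 - alpha / 2) * len(samples)) - 1]
    return lo, hi


fractions = win_fractions(battles)
print(fractions[("m1", "m2")])            # m1's share of non-tied m1-vs-m2 battles
print(average_win_rate(fractions, "m1"))  # equal weight per opponent
print(bootstrap_interval(battles, "m1"))
```

Weighting each opponent equally, rather than by battle count, keeps heavily sampled pairings from dominating the average, which is what "uniform sampling" suggests here.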