Code Arena | React

View overall rankings across AI models on agentic coding tasks involving multi-step reasoning, tool use, and production-style workflows.

Feb 24, 2026
47,939 votes
29 models
| Rank | Rank Spread | Organization | License | Score | Votes |
|---|---|---|---|---|---|
| 1 | 1-3 | Anthropic | Proprietary | 1547 +14/-14 | 2,424 |
| 2 | 1-3 | Anthropic | Proprietary | 1539 +16/-16 | 1,815 |
| 3 | 1-3 | Anthropic | Proprietary | 1521 +17/-17 | 1,517 |
| 4 | 4-6 | Anthropic | | 1485 +12/-12 | 3,057 |
| 5 | 4-6 | Anthropic | Proprietary | 1470 +12/-12 | 2,968 |
| 6 | 6-11 | Z.ai | MIT | 1438 +14/-14 | 2,271 |
| 7 | 6-11 | Google | Proprietary | 1437 +16/-16 | 1,604 |
| 8 | 6-11 | Google | Proprietary | 1432 +11/-11 | 3,361 |
| 9 | 4-19 | Z.ai | MIT | 1426 +57/-57 | 128 |
| 10 | 6-12 | Moonshot | Modified MIT | 1425 +11/-11 | 3,272 |
| 11 | 6-13 | MiniMax | Modified MIT | 1420 +12/-12 | 3,122 |
| 12 | 9-17 | Moonshot | Modified MIT | 1404 +13/-13 | 2,455 |
| 13 | 10-18 | Google | Proprietary | 1400 +12/-12 | 2,745 |
| 14 | 11-18 | Anthropic | Proprietary | 1386 +12/-12 | 2,828 |
| 15 | 11-19 | Alibaba | Apache 2.0 | 1384 +13/-13 | 2,142 |
| 16 | 11-19 | Anthropic | | 1383 +12/-12 | 2,671 |
| 17 | 11-19 | | | 1380 +11/-11 | 3,108 |
| 18 | 12-19 | MiniMax | MIT | 1377 +13/-13 | 2,496 |
| 19 | 14-20 | DeepSeek | MIT | 1359 +13/-13 | 2,188 |
| 20 | 19-21 | DeepSeek | MIT | 1342 +13/-13 | 2,354 |
| 21 | 20-25 | OpenAI | Proprietary | 1323 +12/-12 | 2,821 |
| 22 | 21-25 | | | 1314 +12/-12 | 2,599 |
| 23 | 21-25 | Anthropic | Proprietary | 1311 +12/-12 | 2,707 |
| 24 | 21-25 | OpenAI | Proprietary | 1310 +12/-12 | 2,735 |
| 25 | 21-25 | Moonshot | Modified MIT | 1309 +12/-12 | 2,631 |
| 26 | 26-27 | Alibaba | Apache 2.0 | 1256 +13/-13 | 2,671 |
| 27 | 26-28 | | | 1237 +20/-20 | 928 |
| 28 | 27-28 | xAI | Proprietary | 1225 +17/-17 | 1,468 |
| 29 | 29-29 | Mistral | Modified MIT | 1159 +42/-42 | 231 |

Leaderboard Plots

Confidence Intervals on Model Strength (via Bootstrapping)

Battle Count for Each Combination of Models (without Ties)

Fraction of Model A Wins for All Non-tied A vs. B Battles

Average Win Rate Against All Other Models (Uniform Sampling and No Ties)
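
The last two plot titles name quantities that can be computed directly from a list of battles. The following is a minimal sketch of one plausible reading, not the leaderboard's actual code: the record format, the model names `m1` to `m3`, and the function names are all hypothetical. It also assumes that "uniform sampling" means every opponent a model has faced counts equally, regardless of how many battles were played against it.

```python
from collections import defaultdict

# Hypothetical battle records: (model_a, model_b, winner), where winner is
# "a", "b", or "tie". Names and outcomes are invented for illustration.
battles = [
    ("m1", "m2", "a"),
    ("m1", "m2", "b"),
    ("m1", "m2", "a"),
    ("m1", "m3", "a"),
    ("m2", "m3", "tie"),
    ("m2", "m3", "b"),
    ("m3", "m1", "a"),
]


def pairwise_win_fractions(records):
    """Return {(x, y): fraction of non-tied x-vs-y battles won by x}."""
    wins = defaultdict(int)  # wins[(x, y)] = number of times x beat y
    for a, b, winner in records:
        if winner == "tie":
            continue  # ties are excluded, as the plot title states
        if winner == "a":
            wins[(a, b)] += 1
        else:
            wins[(b, a)] += 1

    fractions = {}
    for (x, y), x_wins in list(wins.items()):
        y_wins = wins.get((y, x), 0)
        total = x_wins + y_wins
        fractions[(x, y)] = x_wins / total
        fractions[(y, x)] = y_wins / total
    return fractions


def average_win_rates(fractions):
    """Average each model's pairwise win fraction over its opponents,
    weighting every opponent equally (assumed meaning of uniform sampling)."""
    per_model = defaultdict(list)
    for (x, _opponent), frac in fractions.items():
        per_model[x].append(frac)
    return {model: sum(v) / len(v) for model, v in per_model.items()}


fractions = pairwise_win_fractions(battles)
averages = average_win_rates(fractions)

for model in sorted(averages, key=averages.get, reverse=True):
    print(f"{model}: {averages[model]:.3f}")
```

Weighting opponents equally keeps a model's average from being dominated by whichever pairing happened to be sampled most often. Pairs that never met in a non-tied battle are simply absent from the result.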