Code Arena | WebDev

Compare the performance of AI models on agentic coding tasks involving multi-step reasoning and tool use

Feb 19, 2026
163,881 votes
45 models
| Rank | Rank Spread | Organization | License | Score | Votes |
|---|---|---|---|---|---|
| 1 | 1-2 | Anthropic | Proprietary | 1561 +14/-14 | 2,524 |
| 2 | 1-3 | Anthropic | Proprietary | 1551 +16/-16 | 1,919 |
| 3 | 2-4 | Anthropic | Proprietary | 1524 +20/-20 | 1,021 |
| 4 | 3-4 | Anthropic | | 1501 +8/-8 | 10,620 |
| 5 | 5-9 | OpenAI | Proprietary | 1471 +16/-16 | 1,695 |
| 6 | 5-8 | Anthropic | Proprietary | 1469 +8/-8 | 10,730 |
| 7 | 5-13 | Google | Proprietary | 1461 +15/-15 | 1,831 |
| 8 | 5-13 | Z.ai | MIT | 1456 +14/-14 | 2,243 |
| 9 | 7-14 | Google | Proprietary | 1444 +7/-7 | 16,673 |
| 10 | 6-14 | MiniMax | Modified MIT | 1443 +12/-12 | 3,260 |
| 11 | 7-14 | Z.ai | MIT | 1440 +10/-10 | 5,128 |
| 12 | 7-14 | Google | Proprietary | 1440 +8/-8 | 12,350 |
| 13 | 7-14 | Moonshot | Modified MIT | 1439 +11/-11 | 3,564 |
| 14 | 9-14 | Moonshot | Modified MIT | 1424 +13/-13 | 2,482 |
| 15 | 15-21 | MiniMax | MIT | 1402 +8/-8 | 9,526 |
| 16 | 15-21 | | | 1402 +8/-8 | 8,383 |
| 17 | 15-22 | OpenAI | Proprietary | 1395 +16/-15 | 1,634 |
| 18 | 15-22 | OpenAI | Proprietary | 1393 +12/-12 | 3,928 |
| 19 | 15-22 | Anthropic | | 1390 +7/-7 | 13,816 |
| 20 | 15-22 | Anthropic | Proprietary | 1389 +8/-8 | 8,985 |
| 21 | 15-23 | OpenAI | Proprietary | 1388 +9/-9 | 6,438 |
| 22 | 17-23 | Anthropic | Proprietary | 1386 +7/-7 | 15,469 |
| 23 | 21-24 | DeepSeek | MIT | 1371 +9/-9 | 5,714 |
| 24 | 23-26 | Z.ai | MIT | 1356 +8/-8 | 8,746 |
| 25 | 24-29 | OpenAI | Proprietary | 1343 +7/-7 | 12,764 |
| 26 | 24-29 | | | 1342 +8/-8 | 6,664 |
| 27 | 25-30 | OpenAI | Proprietary | 1336 +9/-9 | 5,380 |
| 28 | 25-30 | Moonshot | Modified MIT | 1331 +7/-7 | 12,256 |
| 29 | 25-32 | OpenAI | Proprietary | 1329 +9/-9 | 6,506 |
| 30 | 27-33 | DeepSeek | MIT | 1319 +9/-9 | 7,006 |
| 31 | 29-33 | MiniMax | Apache 2.0 | 1312 +9/-9 | 8,834 |
| 32 | 30-33 | Anthropic | Proprietary | 1307 +7/-7 | 13,542 |
| 33 | 29-34 | | | 1306 +13/-13 | 2,143 |
| 34 | 33-35 | DeepSeek | MIT | 1286 +10/-10 | 5,131 |
| 35 | 34-35 | Alibaba | Apache 2.0 | 1282 +7/-7 | 13,266 |
| 36 | 36-38 | KwaiKAT | Proprietary | 1259 +15/-15 | 1,954 |
| 37 | 36-39 | OpenAI | Proprietary | 1243 +17/-17 | 1,537 |
| 38 | 36-39 | xAI | Proprietary | 1236 +9/-9 | 7,129 |
| 39 | 37-42 | Mistral | Apache 2.0 | 1223 +20/-20 | 1,039 |
| 40 | 39-42 | Google | Proprietary | 1205 +13/-13 | 3,454 |
| 41 | 39-42 | xAI | Proprietary | 1204 +19/-19 | 1,267 |
| 42 | 39-42 | Mistral | Modified MIT | 1199 +16/-16 | 1,683 |
| 43 | 43-44 | xAI | Proprietary | 1153 +22/-22 | 968 |
| 44 | 43-45 | xAI | Proprietary | 1141 +21/-21 | 1,017 |
| 45 | 44-45 | Mistral | Proprietary | 1099 +22/-22 | 1,020 |
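
To help read the score and interval columns, here is a minimal sketch in Python. It assumes the scores sit on a conventional Elo-style logistic scale (base 10, 400 points); the page itself does not state this. Under that assumption it converts a score gap into an expected head-to-head win probability, and it checks whether two reported intervals overlap, which is one plausible reading of why adjacent entries share a rank spread. The function names are hypothetical, and the numbers in the usage lines are taken from the leaderboard entries above.

```python
def expected_win_probability(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Probability that A beats B under a logistic (Elo-style) model.

    The 400-point, base-10 scale is an assumption, not something the page states.
    """
    return 1.0 / (1.0 + 10.0 ** ((score_b - score_a) / scale))


def intervals_overlap(score_a: float, plus_a: float, minus_a: float,
                      score_b: float, plus_b: float, minus_b: float) -> bool:
    """True if the intervals [score - minus, score + plus] of A and B intersect."""
    low_a, high_a = score_a - minus_a, score_a + plus_a
    low_b, high_b = score_b - minus_b, score_b + plus_b
    return low_a <= high_b and low_b <= high_a


# Rank 1 (1561 +14/-14) vs. rank 2 (1551 +16/-16): a 10-point gap.
p_top_two = expected_win_probability(1561, 1551)
top_two_overlap = intervals_overlap(1561, 14, 14, 1551, 16, 16)

# Rank 4 (1501 +8/-8) vs. rank 5 (1471 +16/-16): a 30-point gap.
gap_overlap = intervals_overlap(1501, 8, 8, 1471, 16, 16)

print(f"P(rank 1 beats rank 2) = {p_top_two:.3f}")
print(f"Rank 1 and 2 intervals overlap: {top_two_overlap}")
print(f"Rank 4 and 5 intervals overlap: {gap_overlap}")
```

With these inputs the top two intervals overlap while those of ranks 4 and 5 do not, which matches the rank spreads listed for those entries (1-2 and 1-3 versus 3-4 and 5-9).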

Remove Style Control Leaderboard Plots

Fraction of Model A Wins for All Non-tied A vs. B Battles

Confidence Intervals on Model Strength (via Bootstrapping)

Battle Count for Each Combination of Models (without Ties)

Average Win Rate Against All Other Models (Uniform Sampling and No Ties)
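
The plot titles above name three quantities: the fraction of non-tied battles that model A wins against model B, an average win rate against all other models with uniform weighting, and confidence intervals obtained by bootstrapping. The following Python sketch shows one way such quantities could be computed. The battle record format, the model names, and the function names are all hypothetical, and the bootstrap is applied here to the average win rate rather than to a fitted rating model, so this is an illustration of the general technique, not the leaderboard's actual pipeline.

```python
import random
from collections import defaultdict

# Hypothetical battle records: (model_a, model_b, winner), winner in {"a", "b", "tie"}.
battles = (
    [("model-x", "model-y", "a")] * 30
    + [("model-x", "model-y", "b")] * 10
    + [("model-x", "model-z", "a")] * 5
    + [("model-x", "model-z", "b")] * 15
    + [("model-y", "model-z", "tie")] * 4
)


def pairwise_win_fraction(records):
    """Fraction of non-tied battles won by the first model of each ordered pair."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for model_a, model_b, winner in records:
        if winner == "tie":
            continue  # ties are excluded, as in the plot titles
        totals[(model_a, model_b)] += 1
        totals[(model_b, model_a)] += 1
        if winner == "a":
            wins[(model_a, model_b)] += 1
        else:
            wins[(model_b, model_a)] += 1
    return {pair: wins[pair] / totals[pair] for pair in totals}


def average_win_rate(records, model):
    """Unweighted mean of the model's win fractions over its opponents, or None."""
    fractions = pairwise_win_fraction(records)
    rates = [f for (first, _second), f in fractions.items() if first == model]
    return sum(rates) / len(rates) if rates else None


def bootstrap_interval(records, statistic, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for statistic(records)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_resamples):
        resample = rng.choices(records, k=len(records))  # sample with replacement
        value = statistic(resample)
        if value is not None:
            samples.append(value)
    if not samples:
        raise ValueError("statistic was undefined on every resample")
    samples.sort()
    low = samples[int((alpha / 2) * len(samples))]
    high = samples[min(len(samples) - 1, int((1 - alpha / 2) * len(samples)))]
    return low, high


fractions = pairwise_win_fraction(battles)
rate_x = average_win_rate(battles, "model-x")
low, high = bootstrap_interval(battles, lambda r: average_win_rate(r, "model-x"))
print(f"model-x vs model-y win fraction: {fractions[('model-x', 'model-y')]:.2f}")
print(f"model-x average win rate: {rate_x:.2f} (bootstrap interval {low:.2f} to {high:.2f})")
```

Averaging the per-opponent fractions, rather than pooling all battles, is what makes the weighting uniform: an opponent that was faced 40 times counts the same as one faced 20 times.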