Text Arena 🏆 Overall

View overall rankings of AI models on text-to-text tasks spanning math, coding, creative writing, and other open-ended domains.

Apr 9, 2026 · 5,774,763 votes · 57 labs
| Lab Rank | Lab | Model | Score | Rank | Rank Spread |
| --- | --- | --- | --- | --- | --- |
| 1 | Anthropic | claude-opus-4-6-thinking · Proprietary | 1504±5 | 1 | 1–3 |
| 2 | Google | gemini-3.1-pro-preview · Proprietary | 1492±5 | 3 | 2–7 |
| 3 | Meta | muse-spark · Proprietary | 1487±14 | 4 | 1–16 |
| 4 | xAI | grok-4.20-beta1 · Proprietary | 1486±7 | 5 | 2–10 |
| 5 | OpenAI | gpt-5.4-high · Proprietary | 1484±7 | 7 | 2–12 |
| 6 | Z.ai | glm-5.1 | 1471±8 | 13 | 6–22 |
| 7 | Alibaba | qwen3.5-max-preview · Proprietary | 1466±7 | 17 | 7–24 |
| 8 | Bytedance | dola-seed-2.0-pro · Proprietary | 1462±5 | 20 | 14–27 |
| 9 | Moonshot | kimi-k2.5-thinking | 1452±5 | 26 | 19–39 |
| 10 | Baidu | ernie-5.0-0110 · Proprietary | 1450±5 | 30 | 22–39 |
| 11 | Xiaomi | mimo-v2-pro · Proprietary | 1447±7 | 36 | 22–45 |
| 12 | Meituan | longcat-flash-chat-2602-exp · Proprietary | 1441±8 | 41 | 25–52 |
| 13 | Amazon | amazon-nova-experimental-chat-26-02-10 · Proprietary | 1427±10 | 52 | 39–74 |
| 14 | DeepSeek | deepseek-v3.2-exp-thinking | 1425±7 | 55 | 44–74 |
| 15 | Mistral | mistral-large-3 | 1415±4 | 74 | 58–82 |
| 16 | MiniMax | minimax-m2.7 · Proprietary | 1404±7 | 85 | 68–102 |
| 17 | Tencent | hunyuan-vision-1.5-thinking · Proprietary | 1397±12 | 97 | 73–118 |
| 18 | Microsoft AI | mai-1-preview · Proprietary | 1393±5 | 102 | 87–117 |
| 19 | StepFun | step-3.5-flash | 1391±5 | 104 | 91–118 |
| 20 | Arcee AI | trinity-large | 1375±6 | 120 | 114–129 |
| 21 | Nvidia | nvidia-nemotron-3-super-120b-a12b | 1361±7 | 134 | 123–152 |
| 22 | Prime Intellect | intellect-3 | 1356±8 | 139 | 127–161 |
| 23 | Cohere | command-a-03-2025 | 1353±3 | 141 | 134–159 |
| 24 | Inception AI | mercury-2 · Proprietary | 1347±11 | 153 | 134–174 |
| 25 | Ant Group | ling-flash-2.0 | 1346±7 | 154 | 136–169 |
| 26 | Zhipu | glm-4-plus-0111 · Proprietary | 1343±8 | 158 | 139–175 |
| 27 | Ai2 | olmo-3.1-32b-instruct | 1331±6 | 172 | 154–186 |
| 28 | 01.AI | yi-lightning · Proprietary | 1328±5 | 173 | 159–189 |
| 29 | Zhipu AI | glm-4-plus · Proprietary | 1319±5 | 188 | 173–201 |
| 30 | NexusFlow | athene-v2-chat | 1314±5 | 196 | 179–207 |
| 31 | AI21 Labs | jamba-1.5-large | 1288±7 | 216 | 210–228 |
| 32 | Reka AI | reka-core-20240904 · Proprietary | 1287±7 | 218 | 210–228 |
| 33 | IBM | ibm-granite-h-small | 1287±8 | 219 | 210–231 |
| 34 | Princeton | gemma-2-9b-it-simpo | 1279±7 | 226 | 214–234 |
| 35 | Microsoft | phi-4 | 1255±5 | 243 | 237–245 |
| 36 | HuggingFace | zephyr-orpo-141b-A35b-v0.1 | 1212±11 | 265 | 255–271 |
| 37 | Databricks | dbrx-instruct-preview | 1194±6 | 273 | 267–282 |
| 38 | InternLM | internlm2_5-20b-chat | 1190±7 | 274 | 267–286 |
| 39 | OpenChat | openchat-3.5-0106 | 1181±8 | 279 | 273–293 |
| 40 | Snowflake | snowflake-arctic-instruct | 1178±6 | 283 | 274–293 |
| 41 | AllenAI/UW | tulu-2-dpo-70b | 1177±10 | 285 | 274–294 |
| 42 | NousResearch | openhermes-2.5-mistral-7b | 1174±10 | 286 | 274–297 |
| 43 | LMSYS | vicuna-33b | 1172±6 | 287 | 276–296 |
| 44 | Nexusflow | starling-lm-7b-beta | 1170±7 | 288 | 276–298 |
| 45 | UC Berkeley | starling-lm-7b-alpha | 1166±8 | 291 | 277–300 |
| 46 | Upstage AI | solar-10.7b-instruct-v1.0 | 1151±13 | 297 | 287–313 |
| 47 | Cognitive Computations | dolphin-2.2.1-mistral-7b | 1151±15 | 298 | 286–315 |
| 48 | MosaicML | mpt-30b-chat | 1149±12 | 299 | 291–313 |
| 49 | TII | falcon-180b-chat | 1146±17 | 302 | 291–317 |
| 50 | UW | guanaco-33b | 1126±12 | 314 | 299–323 |
| 51 | Together AI | stripedhyena-nous-7b | 1120±11 | 316 | 306–323 |
| 52 | Stanford | alpaca-13b | 1066±12 | 328 | 326–331 |
| 53 | Nomic AI | gpt4all-13b-snoozy | 1065±15 | 329 | 324–332 |
| 54 | Tsinghua | chatglm3-6b | 1055±12 | 331 | 326–332 |
| 55 | RWKV | RWKV-4-Raven-14B | 1040±11 | 332 | 329–334 |
| 56 | OpenAssistant | oasst-pythia-12b | 1021±11 | 334 | 332–334 |
| 57 | Stability AI | stablelm-tuned-alpha-7b | 951±13 | 339 | 338–339 |

Default Leaderboard Plots

(Plot images are not reproduced here; their titles were:)

- Fraction of Model A Wins for All Non-tied A vs. B Battles
- Confidence Intervals on Model Strength (via Bootstrapping)
- Battle Count for Each Combination of Models (without Ties)
- Average Win Rate Against All Other Models (Uniform Sampling and No Ties)
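The "Confidence Intervals on Model Strength (via Bootstrapping)" plot hints at how the ± figures in the Score column are obtained: battles are resampled with replacement and the ratings refit on each resample. A minimal sketch of that idea, using a simple online Elo update rather than the arena's actual fitting procedure, and an entirely hypothetical battle log:

```python
import random
import statistics

def elo_ratings(battles, k=4.0, base=1000.0):
    """One online-Elo pass over a list of (winner, loser) battles."""
    ratings = {}
    for winner, loser in battles:
        rw = ratings.setdefault(winner, base)
        rl = ratings.setdefault(loser, base)
        # Expected win probability for the winner under the Elo model.
        expected_w = 1.0 / (1.0 + 10 ** ((rl - rw) / 400.0))
        ratings[winner] = rw + k * (1.0 - expected_w)
        ratings[loser] = rl - k * (1.0 - expected_w)
    return ratings

def bootstrap_ci(battles, model, rounds=200, seed=0):
    """Resample battles with replacement, refit each time,
    and report (median score, 2.5th pct, 97.5th pct)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(rounds):
        sample = [rng.choice(battles) for _ in battles]
        scores.append(elo_ratings(sample)[model])
    scores.sort()
    return (statistics.median(scores),
            scores[int(0.025 * rounds)],
            scores[int(0.975 * rounds)])

# Hypothetical log: model "a" beats model "b" 80% of the time.
battles = [("a", "b")] * 80 + [("b", "a")] * 20
center, lo, hi = bootstrap_ci(battles, "a")
```

The same bootstrap samples also yield the Rank Spread column: rank every model within each resample, then take the 2.5th and 97.5th percentiles of each model's rank across resamples.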