Text Arena 🏆 Overall

Overall rankings of AI models on text-to-text tasks spanning math, coding, creative writing, and other open-ended domains.

Jun 5, 2026
6,703,075 votes
54 labs
| Lab Rank | Lab | Model | Score | Rank | Rank Spread |
|---|---|---|---|---|---|
| 1 | Anthropic | claude-opus-4-6-thinking · Proprietary | 1504±4 | 1 | 1-3 |
| 2 | Meta | muse-spark · Proprietary | 1489±6 | 5 | 3-13 |
| 3 | Google | gemini-3.1-pro-preview · Proprietary | 1488±4 | 6 | 4-10 |
| 4 | OpenAI | gpt-5.5-high · Proprietary | 1482±5 | 9 | 5-22 |
| 5 | Z.ai | glm-5.1 | 1475±6 | 14 | 8-29 |
| 6 | xAI | grok-4.20-beta1 · Proprietary | 1474±5 | 15 | 8-28 |
| 7 | Alibaba | qwen3.7-max-preview · Proprietary | 1474±10 | 17 | 6-31 |
| 8 | Baidu | ernie-5.1 · Proprietary | 1470±5 | 24 | 9-31 |
| 9 | Xiaomi | mimo-v2.5-pro | 1465±5 | 29 | 13-36 |
| 10 | Moonshot | kimi-k2.6 | 1461±5 | 30 | 22-42 |
| 11 | DeepSeek | deepseek-v4-pro-thinking | 1458±5 | 34 | 26-46 |
| 12 | Bytedance | dola-seed-2.0-pro · Proprietary | 1456±4 | 37 | 30-48 |
| 13 | MiniMax | minimax-m3 · Proprietary | 1450±9 | 43 | 30-61 |
| 14 | Meituan | longcat-flash-chat-2602-exp · Proprietary | 1435±5 | 64 | 55-78 |
| 15 | Amazon | amazon-nova-experimental-chat-26-02-10 · Proprietary | 1427±10 | 73 | 57-100 |
| 16 | Mistral | mistral-medium-3.5 | 1425±8 | 77 | 60-100 |
| 17 | Nvidia | nvidia-nemotron-3-ultra-550b-a55b-nvfp4 | 1419±11 | 88 | 65-113 |
| 18 | Tencent | hunyuan-hy3-preview | 1415±8 | 100 | 73-114 |
| 19 | StepFun | step-3.5-flash | 1395±4 | 125 | 111-136 |
| 20 | Arcee AI | trinity-large-preview | 1378±4 | 143 | 134-152 |
| 21 | Prime Intellect | intellect-3 | 1356±8 | 164 | 152-186 |
| 22 | Cohere | command-a-03-2025 | 1354±3 | 165 | 158-184 |
| 23 | Inception AI | mercury-2 · Proprietary | 1346±11 | 177 | 158-199 |
| 24 | Ant Group | ling-flash-2.0 | 1346±7 | 178 | 161-195 |
| 25 | Ai2 | olmo-3.1-32b-instruct | 1330±6 | 197 | 181-211 |
| 26 | 01.AI | yi-lightning · Proprietary | 1328±5 | 199 | 185-216 |
| 27 | NexusFlow | athene-v2-chat | 1314±5 | 221 | 204-234 |
| 28 | IBM | granite-4.1-8b | 1309±10 | 227 | 203-239 |
| 29 | AI21 Labs | jamba-1.5-large | 1289±7 | 242 | 236-254 |
| 30 | Reka AI | reka-core-20240904 · Proprietary | 1288±7 | 244 | 236-254 |
| 31 | Princeton | gemma-2-9b-it-simpo | 1280±7 | 252 | 240-260 |
| 32 | Microsoft | phi-4 | 1256±5 | 269 | 263-271 |
| 33 | HuggingFace | zephyr-orpo-141b-A35b-v0.1 | 1212±11 | 291 | 281-297 |
| 34 | Databricks | dbrx-instruct-preview | 1194±6 | 299 | 293-308 |
| 35 | InternLM | internlm2_5-20b-chat | 1191±7 | 300 | 293-312 |
| 36 | OpenChat | openchat-3.5 | 1182±10 | 306 | 299-319 |
| 37 | Snowflake | snowflake-arctic-instruct | 1179±6 | 309 | 300-319 |
| 38 | AllenAI/UW | tulu-2-dpo-70b | 1177±10 | 311 | 300-319 |
| 39 | NousResearch | openhermes-2.5-mistral-7b | 1174±10 | 312 | 300-323 |
| 40 | LMSYS | vicuna-33b | 1172±6 | 313 | 302-322 |
| 41 | Nexusflow | starling-lm-7b-beta | 1171±7 | 314 | 302-325 |
| 42 | UC Berkeley | starling-lm-7b-alpha | 1167±8 | 317 | 304-326 |
| 43 | Upstage AI | solar-10.7b-instruct-v1.0 | 1151±13 | 323 | 313-339 |
| 44 | Cognitive Computations | dolphin-2.2.1-mistral-7b | 1151±15 | 324 | 312-341 |
| 45 | MosaicML | mpt-30b-chat | 1149±12 | 325 | 317-339 |
| 46 | TII | falcon-180b-chat | 1146±17 | 328 | 316-343 |
| 47 | UW | guanaco-33b | 1126±12 | 340 | 325-348 |
| 48 | Together AI | stripedhyena-nous-7b | 1120±11 | 342 | 332-349 |
| 49 | Stanford | alpaca-13b | 1067±11 | 354 | 352-357 |
| 50 | Nomic AI | gpt4all-13b-snoozy | 1066±15 | 355 | 350-358 |
| 51 | Tsinghua | chatglm3-6b | 1055±12 | 357 | 352-358 |
| 52 | RWKV | RWKV-4-Raven-14B | 1041±11 | 358 | 355-360 |
| 53 | OpenAssistant | oasst-pythia-12b | 1022±11 | 360 | 358-360 |
| 54 | Stability AI | stablelm-tuned-alpha-7b | 952±13 | 365 | 364-365 |
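
To help read the score column, here is a minimal sketch that turns a score gap into an expected head-to-head win probability and checks whether two reported intervals overlap. It rests on two assumptions the page does not state: that scores sit on an Elo-style logistic scale (base 10, 400 points), and that the value after ± is a symmetric half-width. The function names are hypothetical.

```python
def expected_win_probability(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Expected probability that A beats B, ASSUMING an Elo-style logistic scale."""
    return 1.0 / (1.0 + 10.0 ** ((score_b - score_a) / scale))


def intervals_overlap(score_a: float, half_a: float, score_b: float, half_b: float) -> bool:
    """True if [score - half, score + half] of A and B share at least one point."""
    return (score_a - half_a) <= (score_b + half_b) and (score_b - half_b) <= (score_a + half_a)


# (score, half-width) pairs copied from the first three rows of the leaderboard.
first = (1504, 4)
second = (1489, 6)
third = (1488, 4)

p = expected_win_probability(first[0], second[0])
print(f"Expected win probability for a 15-point gap: {p:.3f}")
print("1st vs 2nd intervals overlap:", intervals_overlap(*first, *second))
print("2nd vs 3rd intervals overlap:", intervals_overlap(*second, *third))
```

Under these assumptions a 15-point lead corresponds to only a slight head-to-head edge, and overlapping intervals (as for the second and third rows) suggest the ordering between those two entries is not firmly resolved.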

Default Leaderboard Plots

Confidence Intervals on Model Strength (via Bootstrapping)
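
The plot title names bootstrapping as the source of the intervals. As a rough illustration of the idea only, the sketch below resamples a list of battle outcomes with replacement and takes percentiles of the recomputed statistic. It uses a plain win rate as the "strength" statistic, which is a simplification and not necessarily what the leaderboard fits; the data and function names are hypothetical.

```python
import random


def bootstrap_interval(outcomes, statistic, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for statistic(outcomes)."""
    rng = random.Random(seed)
    n = len(outcomes)
    estimates = []
    for _ in range(n_resamples):
        # Draw n outcomes with replacement and recompute the statistic.
        resample = [outcomes[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    estimates.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


def win_rate(outcomes):
    """Fraction of battles won; each outcome is 1 (win) or 0 (loss)."""
    return sum(outcomes) / len(outcomes)


# Hypothetical data: 60 wins out of 100 non-tied battles.
battles = [1] * 60 + [0] * 40
low, high = bootstrap_interval(battles, win_rate)
print(f"win rate {win_rate(battles):.2f}, 95% interval [{low:.2f}, {high:.2f}]")
```

Swapping `win_rate` for a function that refits ratings on each resample would give intervals on the ratings themselves, at a higher compute cost.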

Battle Count for Each Combination of Models (without Ties)

Fraction of Model A Wins for All Non-tied A vs. B Battles

Average Win Rate Against All Other Models (Uniform Sampling and No Ties)
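
The last three plot titles describe quantities that can be derived from a raw battle log. The sketch below shows one way to compute them, assuming a hypothetical log format of `(model_a, model_b, winner)` tuples with `winner` in `{"a", "b", "tie"}`. "Uniform sampling" is interpreted here as an unweighted mean of the pairwise win fractions over the opponents a model actually faced, which is an assumption about the plot's definition.

```python
from collections import defaultdict


def pairwise_stats(battles):
    """Return (counts, fractions, average) computed from non-tied battles.

    counts[(x, y)]    -- number of non-tied battles between x and y
    fractions[(x, y)] -- fraction of those battles won by x
    average[x]        -- unweighted mean of fractions[(x, y)] over opponents y
    """
    wins = defaultdict(int)  # wins[(x, y)] = number of times x beat y
    for model_a, model_b, winner in battles:
        if winner == "a":
            wins[(model_a, model_b)] += 1
        elif winner == "b":
            wins[(model_b, model_a)] += 1
        # Ties are dropped, matching the "without ties" wording of the plots.

    models = sorted({m for pair in wins for m in pair})
    counts, fractions = {}, {}
    for x in models:
        for y in models:
            if x == y:
                continue
            total = wins.get((x, y), 0) + wins.get((y, x), 0)
            if total:
                counts[(x, y)] = total
                fractions[(x, y)] = wins.get((x, y), 0) / total

    average = {}
    for x in models:
        row = [fractions[(x, y)] for y in models if (x, y) in fractions]
        average[x] = sum(row) / len(row)
    return counts, fractions, average


# Hypothetical battle log with placeholder model names.
log = [
    ("m1", "m2", "a"),
    ("m1", "m2", "b"),
    ("m2", "m1", "b"),
    ("m1", "m3", "a"),
    ("m3", "m2", "tie"),
    ("m2", "m3", "a"),
]
counts, fractions, average = pairwise_stats(log)
for model, rate in sorted(average.items(), key=lambda kv: -kv[1]):
    print(f"{model}: average win rate {rate:.3f}")
```

Averaging the pairwise fractions, rather than pooling all battles, keeps a model from looking strong merely because it was matched more often against weak opponents.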