LLM Leaderboard - Best Text & Chat AI Models Compared

Lab Rank	Lab	Model	Score	Rank	Rank Spread
1	Anthropic	claude-opus-4-6-thinking · Proprietary	1551±13	1	1-15
2	OpenAI	gpt-5.5 · Proprietary	1537±17	5	1-26
3	Moonshot	kimi-k2.6	1533±18	6	1-32
4	Google	gemini-3-pro · Proprietary	1529±14	7	1-32
5	Z.ai	glm-5	1528±18	8	1-41
6	Alibaba	qwen3.7-max-preview · Proprietary	1524±38	12	1-71
7	Meta	muse-spark · Proprietary	1523±22	14	1-56
8	Xiaomi	mimo-v2.5-pro	1513±17	18	3-65
9	xAI	grok-4.20-beta1 · Proprietary	1513±17	19	3-65
10	DeepSeek	deepseek-v4-pro	1510±16	20	4-67
11	Bytedance	dola-seed-2.0-pro · Proprietary	1504±13	26	6-70
12	Baidu	ernie-5.1 · Proprietary	1500±18	33	6-75
13	Meituan	longcat-flash-chat-2602-exp · Proprietary	1489±16	48	13-92
14	MiniMax	minimax-m3 · Proprietary	1475±27	67	13-118
15	Tencent	hunyuan-hy3-preview	1468±30	74	16-127
16	Mistral	mistral-medium-3.5	1457±28	86	33-134
17	Nvidia	nvidia-nemotron-3-ultra-550b-a55b-nvfp4	1449±32	94	40-147
18	StepFun	step-3.5-flash	1434±14	112	79-141
19	Amazon	amazon-nova-experimental-chat-26-02-10 · Proprietary	1432±39	116	49-169
20	Arcee AI	trinity-large-thinking	1402±16	141	112-173
21	Ant Group	ling-flash-2.0	1386±33	155	112-196
22	Prime Intellect	intellect-3	1382±34	160	113-201
23	01 AI	yi-lightning · Proprietary	1362±12	176	148-200
24	IBM	granite-4.1-8b	1361±48	178	119-229
25	Cohere	command-a-03-2025	1357±11	181	153-202
26	NexusFlow	athene-v2-chat	1333±14	197	175-226
27	Ai2	olmo-3.1-32b-instruct	1327±27	202	167-242
28	Reka AI	reka-core-20240904 · Proprietary	1304±23	221	190-255
29	Princeton	gemma-2-9b-it-simpo	1289±21	237	204-264
30	AI21 Labs	jamba-1.5-large	1276±22	247	215-273
31	Microsoft	phi-4	1267±16	254	224-275
32	InternLM	internlm2_5-20b-chat	1246±19	265	243-281
33	HuggingFace	zephyr-orpo-141b-A35b-v0.1	1185±23	290	272-307
34	OpenChat	openchat-3.5-0106	1178±18	292	277-307
35	Databricks	dbrx-instruct-preview	1177±13	293	282-305
36	Snowflake	snowflake-arctic-instruct	1176±14	294	282-307
37	Nexusflow	starling-lm-7b-beta	1174±15	297	282-308
38	Tsinghua	chatglm3-6b	1168±43	298	271-320
39	LMSYS	vicuna-13b	1146±20	307	289-319
40	UC Berkeley	starling-lm-7b-alpha	1136±22	310	296-323
41	NousResearch	openhermes-2.5-mistral-7b	1117±42	316	295-329
42	AllenAI/UW	tulu-2-dpo-70b	1102±36	318	301-332
43	MosaicML	mpt-7b-chat	1087±43	322	303-333
44	RWKV	RWKV-4-Raven-14B	1031±41	332	321-335
45	OpenAssistant	oasst-pythia-12b	939±35	336	333-337
46	Stanford	alpaca-13b	911±42	337	334-337
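
To read the score gaps in the table, it can help to convert a score difference into an expected head-to-head win probability. The page does not state how scores are scaled, so the following is only a sketch under the assumption that they follow the common Elo convention (base 10, 400-point scale). The helper name `expected_win_prob` and the `scale` parameter are hypothetical.

```python
def expected_win_prob(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Expected probability that A beats B under an Elo-style logistic model.

    Assumption (not stated on the page): scores use the base-10,
    400-point Elo convention.
    """
    return 1.0 / (1.0 + 10.0 ** ((score_b - score_a) / scale))


# Scores taken from the table: lab rank 1 (1551) vs lab rank 2 (1537).
p = expected_win_prob(1551, 1537)
print(f"{p:.3f}")
```

Under that assumption, the 14-point gap between the top two entries corresponds to an expected win rate only slightly above one half, which is consistent with their overlapping ± intervals.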

Text Arena 🇨🇳 Chinese

Default Leaderboard Plots

Battle Count for Each Combination of Models (without Ties)

Fraction of Model A Wins for All Non-tied A vs. B Battles

Confidence Intervals on Model Strength (via Bootstrapping)

Average Win Rate Against All Other Models (Uniform Sampling and No Ties)
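
The quantities named in the plot titles above can be sketched from raw battle records. The page does not publish its computation, so everything here is an assumption: the record format `(model_a, model_b, winner)`, the toy `BATTLES` data, the function names, and the reading of "uniform sampling" as equal weight per opponent. The bootstrap below resamples battles to get an interval on the average win rate, which is a simpler statistic than the model strength scores on the page.

```python
import random
from collections import defaultdict

# Hypothetical battle records: (model_a, model_b, winner),
# where winner is one of "a", "b", or "tie".
BATTLES = [
    ("m1", "m2", "a"),
    ("m1", "m2", "a"),
    ("m2", "m1", "a"),
    ("m1", "m3", "a"),
    ("m3", "m1", "tie"),
    ("m2", "m3", "b"),
    ("m2", "m3", "a"),
    ("m3", "m2", "b"),
]


def pairwise_stats(battles):
    """Return (counts, wins) over non-tied battles.

    counts[(x, y)] = number of non-tied battles between x and y (symmetric).
    wins[(x, y)]   = number of those battles that x won.
    """
    counts = defaultdict(int)
    wins = defaultdict(int)
    for a, b, winner in battles:
        if winner == "tie":
            continue  # ties are excluded, as in the plot titles
        counts[(a, b)] += 1
        counts[(b, a)] += 1
        if winner == "a":
            wins[(a, b)] += 1
        else:
            wins[(b, a)] += 1
    return counts, wins


def average_win_rate(battles):
    """Mean of each model's per-opponent win fractions (equal weight per opponent)."""
    counts, wins = pairwise_stats(battles)
    per_model = defaultdict(list)
    for (x, y), n in counts.items():
        per_model[x].append(wins[(x, y)] / n)
    return {m: sum(fracs) / len(fracs) for m, fracs in per_model.items()}


def bootstrap_interval(battles, model, rounds=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for one model's average win rate."""
    rng = random.Random(seed)
    samples = []
    for _ in range(rounds):
        resampled = rng.choices(battles, k=len(battles))
        rate = average_win_rate(resampled).get(model)
        if rate is not None:  # the model may be absent from a small resample
            samples.append(rate)
    samples.sort()
    lo = samples[int((alpha / 2) * len(samples))]
    hi = samples[int((1 - alpha / 2) * len(samples)) - 1]
    return lo, hi
```

Calling `average_win_rate(BATTLES)` gives one value per model, and `bootstrap_interval(BATTLES, "m1")` returns a `(low, high)` pair. With only a handful of battles the interval is very wide, which mirrors how entries with larger ± values in the table also have wider rank spreads.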