Text Arena 🏆 Overall

View overall rankings of AI models on text-to-text tasks spanning math, coding, creative writing, and other open-ended domains.

Apr 9, 2026 · 5,774,763 votes · 57 labs
| Lab Rank | Lab | Model | Score | Rank | Rank Spread |
| --- | --- | --- | --- | --- | --- |
| 1 | Anthropic | claude-opus-4-6-thinking · Proprietary | 1504±5 | 1 | 1–3 |
| 2 | Google | gemini-3.1-pro-preview · Proprietary | 1492±5 | 3 | 2–7 |
| 3 | Meta | muse-spark · Proprietary | 1487±14 | 4 | 1–16 |
| 4 | xAI | grok-4.20-beta1 · Proprietary | 1486±7 | 5 | 2–10 |
| 5 | OpenAI | gpt-5.4-high · Proprietary | 1484±7 | 7 | 2–12 |
| 6 | Z.ai | glm-5.1 | 1471±8 | 13 | 6–22 |
| 7 | Alibaba | qwen3.5-max-preview · Proprietary | 1466±7 | 17 | 7–24 |
| 8 | Bytedance | dola-seed-2.0-pro · Proprietary | 1462±5 | 20 | 14–27 |
| 9 | Moonshot | kimi-k2.5-thinking | 1452±5 | 26 | 19–39 |
| 10 | Baidu | ernie-5.0-0110 · Proprietary | 1450±5 | 30 | 22–39 |
| 11 | Xiaomi | mimo-v2-pro · Proprietary | 1447±7 | 36 | 22–45 |
| 12 | Meituan | longcat-flash-chat-2602-exp · Proprietary | 1441±8 | 41 | 25–52 |
| 13 | Amazon | amazon-nova-experimental-chat-26-02-10 · Proprietary | 1427±10 | 52 | 39–74 |
| 14 | DeepSeek | deepseek-v3.2-exp-thinking | 1425±7 | 55 | 44–74 |
| 15 | Mistral | mistral-large-3 | 1415±4 | 74 | 58–82 |
| 16 | MiniMax | minimax-m2.7 · Proprietary | 1404±7 | 85 | 68–102 |
| 17 | Tencent | hunyuan-vision-1.5-thinking · Proprietary | 1397±12 | 97 | 73–118 |
| 18 | Microsoft AI | mai-1-preview · Proprietary | 1393±5 | 102 | 87–117 |
| 19 | StepFun | step-3.5-flash | 1391±5 | 104 | 91–118 |
| 20 | Arcee AI | trinity-large | 1375±6 | 120 | 114–129 |
| 21 | Nvidia | nvidia-nemotron-3-super-120b-a12b | 1361±7 | 134 | 123–152 |
| 22 | Prime Intellect | intellect-3 | 1356±8 | 139 | 127–161 |
| 23 | Cohere | command-a-03-2025 | 1353±3 | 141 | 134–159 |
| 24 | Inception AI | mercury-2 · Proprietary | 1347±11 | 153 | 134–174 |
| 25 | Ant Group | ling-flash-2.0 | 1346±7 | 154 | 136–169 |
| 26 | Zhipu | glm-4-plus-0111 · Proprietary | 1343±8 | 158 | 139–175 |
| 27 | Ai2 | olmo-3.1-32b-instruct | 1331±6 | 172 | 154–186 |
| 28 | 01.AI | yi-lightning · Proprietary | 1328±5 | 173 | 159–189 |
| 29 | Zhipu AI | glm-4-plus · Proprietary | 1319±5 | 188 | 173–201 |
| 30 | NexusFlow | athene-v2-chat | 1314±5 | 196 | 179–207 |
| 31 | AI21 Labs | jamba-1.5-large | 1288±7 | 216 | 210–228 |
| 32 | Reka AI | reka-core-20240904 · Proprietary | 1287±7 | 218 | 210–228 |
| 33 | IBM | ibm-granite-h-small | 1287±8 | 219 | 210–231 |
| 34 | Princeton | gemma-2-9b-it-simpo | 1279±7 | 226 | 214–234 |
| 35 | Microsoft | phi-4 | 1255±5 | 243 | 237–245 |
| 36 | HuggingFace | zephyr-orpo-141b-A35b-v0.1 | 1212±11 | 265 | 255–271 |
| 37 | Databricks | dbrx-instruct-preview | 1194±6 | 273 | 267–282 |
| 38 | InternLM | internlm2_5-20b-chat | 1190±7 | 274 | 267–286 |
| 39 | OpenChat | openchat-3.5-0106 | 1181±8 | 279 | 273–293 |
| 40 | Snowflake | snowflake-arctic-instruct | 1178±6 | 283 | 274–293 |
| 41 | AllenAI/UW | tulu-2-dpo-70b | 1177±10 | 285 | 274–294 |
| 42 | NousResearch | openhermes-2.5-mistral-7b | 1174±10 | 286 | 274–297 |
| 43 | LMSYS | vicuna-33b | 1172±6 | 287 | 276–296 |
| 44 | Nexusflow | starling-lm-7b-beta | 1170±7 | 288 | 276–298 |
| 45 | UC Berkeley | starling-lm-7b-alpha | 1166±8 | 291 | 277–300 |
| 46 | Upstage AI | solar-10.7b-instruct-v1.0 | 1151±13 | 297 | 287–313 |
| 47 | Cognitive Computations | dolphin-2.2.1-mistral-7b | 1151±15 | 298 | 286–315 |
| 48 | MosaicML | mpt-30b-chat | 1149±12 | 299 | 291–313 |
| 49 | TII | falcon-180b-chat | 1146±17 | 302 | 291–317 |
| 50 | UW | guanaco-33b | 1126±12 | 314 | 299–323 |
| 51 | Together AI | stripedhyena-nous-7b | 1120±11 | 316 | 306–323 |
| 52 | Stanford | alpaca-13b | 1066±12 | 328 | 326–331 |
| 53 | Nomic AI | gpt4all-13b-snoozy | 1065±15 | 329 | 324–332 |
| 54 | Tsinghua | chatglm3-6b | 1055±12 | 331 | 326–332 |
| 55 | RWKV | RWKV-4-Raven-14B | 1040±11 | 332 | 329–334 |
| 56 | OpenAssistant | oasst-pythia-12b | 1021±11 | 334 | 332–334 |
| 57 | Stability AI | stablelm-tuned-alpha-7b | 951±13 | 339 | 338–339 |

Default Leaderboard Plots

(Plot images are not reproduced here; their titles were:)

- Fraction of Model A Wins for All Non-tied A vs. B Battles
- Confidence Intervals on Model Strength (via Bootstrapping)
- Battle Count for Each Combination of Models (without Ties)
- Average Win Rate Against All Other Models (Uniform Sampling and No Ties)
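The "Confidence Intervals on Model Strength (via Bootstrapping)" plot hints at how the ± figures in the Score column are obtained: battles are resampled with replacement and the ratings refit on each resample. A minimal sketch of that idea, using a simple online Elo update rather than the arena's actual fitting procedure, and an entirely hypothetical battle log:

```python
import random
import statistics

def elo_ratings(battles, k=4.0, base=1000.0):
    """One online-Elo pass over a list of (winner, loser) battles."""
    ratings = {}
    for winner, loser in battles:
        rw = ratings.setdefault(winner, base)
        rl = ratings.setdefault(loser, base)
        # Expected win probability for the winner under the Elo model.
        expected_w = 1.0 / (1.0 + 10 ** ((rl - rw) / 400.0))
        ratings[winner] = rw + k * (1.0 - expected_w)
        ratings[loser] = rl - k * (1.0 - expected_w)
    return ratings

def bootstrap_ci(battles, model, rounds=200, seed=0):
    """Resample battles with replacement, refit each time,
    and report (median score, 2.5th pct, 97.5th pct)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(rounds):
        sample = [rng.choice(battles) for _ in battles]
        scores.append(elo_ratings(sample)[model])
    scores.sort()
    return (statistics.median(scores),
            scores[int(0.025 * rounds)],
            scores[int(0.975 * rounds)])

# Hypothetical log: model "a" beats model "b" 80% of the time.
battles = [("a", "b")] * 80 + [("b", "a")] * 20
center, lo, hi = bootstrap_ci(battles, "a")
```

The same bootstrap samples also yield the Rank Spread column: rank every model within each resample, then take the 2.5th and 97.5th percentiles of each model's rank across resamples.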