View overall rankings of AI models on text-to-text tasks spanning math, coding, creative writing, and other open-ended domains.
| Lab Rank | Model | Score | Overall Rank | Rank Spread |
|---|---|---|---|---|
| 1 | Anthropic claude-opus-4-6-thinking · Proprietary | 1551±13 | 1 | 1–15 |
| 2 | OpenAI gpt-5.5 · Proprietary | 1537±17 | 5 | 1–26 |
| 3 | Moonshot kimi-k2.6 | 1533±18 | 6 | 1–32 |
| 4 | Google gemini-3-pro · Proprietary | 1529±14 | 7 | 1–32 |
| 5 | Z.ai glm-5 | 1528±18 | 8 | 1–41 |
| 6 | Alibaba qwen3.7-max-preview · Proprietary | 1524±38 | 12 | 1–71 |
| 7 | Meta muse-spark · Proprietary | 1523±22 | 14 | 1–56 |
| 8 | Xiaomi mimo-v2.5-pro | 1513±17 | 18 | 3–65 |
| 9 | xAI grok-4.20-beta1 · Proprietary | 1513±17 | 19 | 3–65 |
| 10 | DeepSeek deepseek-v4-pro | 1510±16 | 20 | 4–67 |
| 11 | Bytedance dola-seed-2.0-pro · Proprietary | 1504±13 | 26 | 6–70 |
| 12 | Baidu ernie-5.1 · Proprietary | 1500±18 | 33 | 6–75 |
| 13 | Meituan longcat-flash-chat-2602-exp · Proprietary | 1489±16 | 48 | 13–92 |
| 14 | MiniMax minimax-m3 · Proprietary | 1475±27 | 67 | 13–118 |
| 15 | Tencent hunyuan-hy3-preview | 1468±30 | 74 | 16–127 |
| 16 | Mistral mistral-medium-3.5 | 1457±28 | 86 | 33–134 |
| 17 | Nvidia nvidia-nemotron-3-ultra-550b-a55b-nvfp4 | 1449±32 | 94 | 40–147 |
| 18 | StepFun step-3.5-flash | 1434±14 | 112 | 79–141 |
| 19 | Amazon amazon-nova-experimental-chat-26-02-10 · Proprietary | 1432±39 | 116 | 49–169 |
| 20 | Arcee AI trinity-large-thinking | 1402±16 | 141 | 112–173 |
| 21 | Ant Group ling-flash-2.0 | 1386±33 | 155 | 112–196 |
| 22 | Prime Intellect intellect-3 | 1382±34 | 160 | 113–201 |
| 23 | 01 AI yi-lightning · Proprietary | 1362±12 | 176 | 148–200 |
| 24 | IBM granite-4.1-8b | 1361±48 | 178 | 119–229 |
| 25 | Cohere command-a-03-2025 | 1357±11 | 181 | 153–202 |
| 26 | NexusFlow athene-v2-chat | 1333±14 | 197 | 175–226 |
| 27 | Ai2 olmo-3.1-32b-instruct | 1327±27 | 202 | 167–242 |
| 28 | Reka AI reka-core-20240904 · Proprietary | 1304±23 | 221 | 190–255 |
| 29 | Princeton gemma-2-9b-it-simpo | 1289±21 | 237 | 204–264 |
| 30 | AI21 Labs jamba-1.5-large | 1276±22 | 247 | 215–273 |
| 31 | Microsoft phi-4 | 1267±16 | 254 | 224–275 |
| 32 | InternLM internlm2_5-20b-chat | 1246±19 | 265 | 243–281 |
| 33 | HuggingFace zephyr-orpo-141b-A35b-v0.1 | 1185±23 | 290 | 272–307 |
| 34 | OpenChat openchat-3.5-0106 | 1178±18 | 292 | 277–307 |
| 35 | Databricks dbrx-instruct-preview | 1177±13 | 293 | 282–305 |
| 36 | Snowflake snowflake-arctic-instruct | 1176±14 | 294 | 282–307 |
| 37 | Nexusflow starling-lm-7b-beta | 1174±15 | 297 | 282–308 |
| 38 | Tsinghua chatglm3-6b | 1168±43 | 298 | 271–320 |
| 39 | LMSYS vicuna-13b | 1146±20 | 307 | 289–319 |
| 40 | UC Berkeley starling-lm-7b-alpha | 1136±22 | 310 | 296–323 |
| 41 | NousResearch openhermes-2.5-mistral-7b | 1117±42 | 316 | 295–329 |
| 42 | AllenAI/UW tulu-2-dpo-70b | 1102±36 | 318 | 301–332 |
| 43 | MosaicML mpt-7b-chat | 1087±43 | 322 | 303–333 |
| 44 | RWKV RWKV-4-Raven-14B | 1031±41 | 332 | 321–335 |
| 45 | OpenAssistant oasst-pythia-12b | 939±35 | 336 | 333–337 |
| 46 | Stanford alpaca-13b | 911±42 | 337 | 334–337 |
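
When reading the score column, it can help to turn each `score±margin` cell into an explicit interval and check whether two models' intervals overlap. The sketch below is illustrative only: it assumes the value after `±` is a symmetric margin around the score, which the page itself does not state, and the names `Interval`, `parse_score`, and `overlaps` are hypothetical helpers rather than part of any leaderboard tooling.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    low: int
    high: int


def parse_score(cell: str) -> Interval:
    """Turn a 'score±margin' cell such as '1551±13' into (low, high).

    Assumption: the margin is symmetric around the score.
    """
    score_text, margin_text = cell.split("±")
    score, margin = int(score_text), int(margin_text)
    return Interval(score - margin, score + margin)


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True when the two closed intervals share at least one point."""
    return a.low <= b.high and b.low <= a.high


top = parse_score("1551±13")     # claude-opus-4-6-thinking
second = parse_score("1537±17")  # gpt-5.5
last = parse_score("911±42")     # alpaca-13b

print(overlaps(top, second))  # True
print(overlaps(top, last))    # False
```

Under that assumption, the top two entries have overlapping intervals, so their ordering is less clear-cut than the point scores alone suggest, while the first and last entries are separated by far more than their margins.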