View overall rankings of AI models on text-to-text tasks spanning math, coding, creative writing, and other open-ended domains.
| Lab Rank | Model | Score | Overall Rank | Rank Spread |
|---|---|---|---|---|
| 1 | Anthropic claude-fable-5 · Proprietary | 1564±18 | 1 | 1-7 |
| 2 | Alibaba qwen3.7-max-preview · Proprietary | 1526±18 | 10 | 3-45 |
| 3 | Meta muse-spark · Proprietary | 1525±10 | 11 | 6-37 |
| 4 | Z.ai glm-5.1 | 1525±9 | 12 | 6-36 |
| 5 | Google gemini-3.1-pro-preview · Proprietary | 1523±6 | 15 | 6-34 |
| 6 | OpenAI gpt-5.4-high · Proprietary | 1521±7 | 16 | 8-37 |
| 7 | Xiaomi mimo-v2.5-pro | 1519±8 | 18 | 8-41 |
| 8 | xAI grok-4.20-beta-0309-reasoning · Proprietary | 1513±6 | 24 | 10-46 |
| 9 | Baidu ernie-5.1 · Proprietary | 1512±8 | 27 | 10-48 |
| 10 | Moonshot kimi-k2.6 | 1512±8 | 28 | 10-48 |
| 11 | Bytedance dola-seed-2.0-pro · Proprietary | 1510±6 | 31 | 12-48 |
| 12 | MiniMax minimax-m3 · Proprietary | 1503±10 | 42 | 16-64 |
| 13 | Meituan longcat-flash-chat-2602-exp · Proprietary | 1502±8 | 43 | 20-64 |
| 14 | DeepSeek deepseek-v4-pro | 1501±7 | 44 | 23-64 |
| 15 | Amazon amazon-nova-experimental-chat-26-02-10 · Proprietary | 1487±20 | 61 | 24-101 |
| 16 | Mistral mistral-medium-3.5 | 1477±11 | 72 | 50-105 |
| 17 | Nvidia nvidia-nemotron-3-ultra-550b-a55b-nvfp4 | 1468±14 | 84 | 59-121 |
| 18 | Tencent hunyuan-hy3-preview | 1461±14 | 96 | 64-129 |
| 19 | StepFun step-3.5-flash | 1450±6 | 113 | 89-132 |
| 20 | Arcee AI trinity-large-preview | 1441±8 | 123 | 103-148 |
| 21 | Ant Group ling-flash-2.0 | 1412±15 | 157 | 129-182 |
| 22 | Prime Intellect intellect-3 | 1409±19 | 160 | 127-197 |
| 23 | Inception AI mercury-2 · Proprietary | 1397±21 | 170 | 145-204 |
| 24 | Cohere command-a-03-2025 | 1390±6 | 177 | 159-197 |
| 25 | Ai2 olmo-3.1-32b-instruct | 1384±12 | 187 | 159-209 |
| 26 | NexusFlow athene-v2-chat | 1369±9 | 198 | 176-224 |
| 27 | 01 AI yi-lightning · Proprietary | 1369±10 | 199 | 176-225 |
| 28 | IBM granite-4.1-8b | 1352±20 | 223 | 184-246 |
| 29 | Reka AI reka-core-20240904 · Proprietary | 1315±15 | 248 | 233-269 |
| 30 | AI21 Labs jamba-1.5-large | 1312±15 | 251 | 236-271 |
| 31 | Microsoft phi-4 | 1307±10 | 258 | 241-271 |
| 32 | Princeton gemma-2-9b-it-simpo | 1272±15 | 282 | 263-299 |
| 33 | Databricks dbrx-instruct-preview | 1250±11 | 293 | 280-307 |
| 34 | InternLM internlm2_5-20b-chat | 1248±14 | 296 | 279-309 |
| 35 | HuggingFace zephyr-orpo-141b-A35b-v0.1 | 1244±21 | 298 | 274-312 |
| 36 | Nexusflow starling-lm-7b-beta | 1235±13 | 304 | 289-313 |
| 37 | OpenChat openchat-3.5-0106 | 1229±14 | 306 | 291-317 |
| 38 | Snowflake snowflake-arctic-instruct | 1224±11 | 307 | 294-317 |
| 39 | AllenAI/UW tulu-2-dpo-70b | 1214±21 | 310 | 294-329 |
| 40 | UC Berkeley starling-lm-7b-alpha | 1206±16 | 313 | 303-331 |
| 41 | LMSYS vicuna-33b | 1192±13 | 320 | 309-334 |
| 42 | NousResearch openhermes-2.5-mistral-7b | 1186±23 | 323 | 308-344 |
| 43 | Upstage AI solar-10.7b-instruct-v1.0 | 1183±27 | 325 | 308-345 |
| 44 | MosaicML mpt-30b-chat | 1167±35 | 332 | 309-349 |
| 45 | Together AI stripedhyena-nous-7b | 1126±22 | 348 | 332-352 |
| 46 | UW guanaco-33b | 1113±36 | 349 | 334-355 |
| 47 | Tsinghua chatglm3-6b | 1089±26 | 352 | 345-357 |
| 48 | RWKV RWKV-4-Raven-14B | 1059±27 | 355 | 350-359 |
| 49 | OpenAssistant oasst-pythia-12b | 1049±25 | 356 | 352-360 |
| 50 | Stability AI stablelm-tuned-alpha-7b | 1004±33 | 359 | 354-361 |
| 51 | Stanford alpaca-13b | 999±27 | 360 | 356-361 |
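
Scores in the table are written as a value followed by a ± margin, and several neighboring entries differ by less than their margins. The sketch below is one hedged way to compare two score cells. It assumes the margin describes a symmetric interval around the value, which the table itself does not state, and the helper names `parse_score` and `intervals_overlap` are hypothetical, introduced only for illustration.

```python
import re

# Matches a score cell such as "1564±18": point estimate, then margin.
SCORE_RE = re.compile(r"^\s*(\d+)\s*±\s*(\d+)\s*$")


def parse_score(cell: str) -> tuple[int, int]:
    """Split a 'value±margin' cell into (value, margin)."""
    match = SCORE_RE.match(cell)
    if match is None:
        raise ValueError(f"unrecognized score cell: {cell!r}")
    return int(match.group(1)), int(match.group(2))


def intervals_overlap(cell_a: str, cell_b: str) -> bool:
    """True when [value - margin, value + margin] of both cells intersect.

    Assumption: the margin is symmetric around the value.
    """
    a, margin_a = parse_score(cell_a)
    b, margin_b = parse_score(cell_b)
    return a - margin_a <= b + margin_b and b - margin_b <= a + margin_a


# claude-fable-5 vs qwen3.7-max-preview: 1546..1582 and 1508..1544
print(intervals_overlap("1564±18", "1526±18"))  # False
# muse-spark vs glm-5.1: 1515..1535 and 1516..1534
print(intervals_overlap("1525±10", "1525±9"))   # True
```

Under that assumption, the top entry's interval is disjoint from the second entry's, while entries such as muse-spark and glm-5.1 overlap almost entirely, so their relative order is not well separated by score alone.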