Introducing Max
Today we are releasing Max, Arena's model router powered by our community’s 5+ million real-world votes. Max acts as an intelligent orchestrator: it routes each user prompt to the most capable model for that specific prompt. As a result, Max ranks first on the Arena leaderboard overall and in every major category.
In today’s rapidly advancing AI landscape, models and providers are evolving to fill different niches: some models are great at coding, others are strong in math; some answer quickly, while others think longer. Max leverages these differing strength profiles to deliver a unified experience that is reliable across the full spectrum of usage.
We recently deployed a version of Max in Battle mode, codenamed theta-hat, which reached #1 on the Arena Overall leaderboard with a score of 1500. This base version of Max is also #1 across all major categories, including Coding, Math, and Expert.
We also built a latency-aware version of Max that delivers top-level performance while keeping response latency low. Our latest latency-aware Max, codenamed arcstride, achieved an Arena score of 1495 while cutting time to first token (TTFT) by more than 16 seconds relative to the next-best model, gemini-3-pro (3.44 s versus 19.72 s).
Going forward, latency-aware Max will be our default experience in our Direct Chat mode. We plan on continually updating and improving Max over time. Max is now available at arena.ai/max.
Base Router Performance
Table 1: Max Arena Scores
Model Arena scores overall and across performance categories
| Category | Max (base) | gemini-3-pro | grok-4.1-thinking | gemini-3-flash | claude-opus-4-5-thinking-32k |
|---|---|---|---|---|---|
| Overall | 1500 | 1488 | 1476 | 1471 | 1468 |
| Expert | 1528 | 1501 | 1490 | 1494 | 1508 |
| Hard Prompts | 1527 | 1503 | 1487 | 1488 | 1501 |
| Coding | 1567 | 1519 | 1508 | 1505 | 1539 |
| Math | 1489 | 1485 | 1454 | 1470 | 1468 |
| Creative Writing | 1493 | 1491 | 1437 | 1462 | 1456 |
| Instruction Following | 1484 | 1472 | 1437 | 1453 | 1478 |
| Longer Query | 1503 | 1492 | 1449 | 1465 | 1494 |
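To see why routing can outperform any single model, consider a deliberately simplified sketch: given the per-category scores of the four individual models in Table 1, pick the highest-scoring model for a prompt's category. This is not how Max is implemented; Max routes per prompt, and its internals are not described in this post. The `route_by_category` helper and the assumption that each prompt arrives with a known category label are ours, for illustration only.

```python
# Illustrative only: a category-level lookup built from Table 1.
# Max routes per prompt; this is NOT its actual routing method.

MODELS = (
    "gemini-3-pro",
    "grok-4.1-thinking",
    "gemini-3-flash",
    "claude-opus-4-5-thinking-32k",
)

# Arena scores copied from Table 1, in the same order as MODELS
# (the Max column is omitted on purpose).
TABLE_1 = {
    "Overall": (1488, 1476, 1471, 1468),
    "Expert": (1501, 1490, 1494, 1508),
    "Hard Prompts": (1503, 1487, 1488, 1501),
    "Coding": (1519, 1508, 1505, 1539),
    "Math": (1485, 1454, 1470, 1468),
    "Creative Writing": (1491, 1437, 1462, 1456),
    "Instruction Following": (1472, 1437, 1453, 1478),
    "Longer Query": (1492, 1449, 1465, 1494),
}


def route_by_category(category: str) -> str:
    """Return the individual model with the highest Table 1 score (hypothetical helper)."""
    scores = TABLE_1[category]
    best_index = max(range(len(MODELS)), key=lambda i: scores[i])
    return MODELS[best_index]


if __name__ == "__main__":
    for category in TABLE_1:
        print(f"{category:22s} -> {route_by_category(category)}")
```

With these numbers, the sketch picks claude-opus-4-5-thinking-32k for Expert, Coding, Instruction Following, and Longer Query, and gemini-3-pro for the remaining rows. No single model wins every category, which is the kind of strength-profile difference a router can exploit.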
Figure 1: Overall Routing Distribution
Top 10 models selected across all prompts for Max (base)
Figure 2: Category-wise Routing Distribution Comparison
Top 5 routed models by category shown as stacked percentage bars for Max (base)
Latency-Aware Router Performance
Table 2: Latency-Aware Router Arena Score vs. Latency
Performance and latency metrics in Battle mode
| Model | Score | TTFT (s) |
|---|---|---|
| Max (latency-aware) | 1495 | 3.44 |
| gemini-3-pro | 1488 | 19.72 |
| grok-4.1-thinking | 1476 | 7.19 |
| gemini-3-flash | 1471 | 5.83 |
| claude-opus-4-5-thinking-32k | 1468 | 11.58 |
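The trade-off in Table 2 can be recomputed directly. The short sketch below reports, for each individual model, how many Arena points it trails Max (latency-aware) by and how much later its first token arrives. The function and variable names are ours; the numbers are copied from the table.

```python
# (Arena score, TTFT in seconds), copied from Table 2.
TABLE_2 = {
    "Max (latency-aware)": (1495, 3.44),
    "gemini-3-pro": (1488, 19.72),
    "grok-4.1-thinking": (1476, 7.19),
    "gemini-3-flash": (1471, 5.83),
    "claude-opus-4-5-thinking-32k": (1468, 11.58),
}


def gaps_vs_baseline(table, baseline="Max (latency-aware)"):
    """Map each other model to (score deficit, extra TTFT seconds) relative to the baseline."""
    base_score, base_ttft = table[baseline]
    return {
        model: (base_score - score, round(ttft - base_ttft, 2))
        for model, (score, ttft) in table.items()
        if model != baseline
    }


if __name__ == "__main__":
    for model, (score_gap, ttft_gap) in gaps_vs_baseline(TABLE_2).items():
        print(f"{model:30s} -{score_gap} pts, +{ttft_gap:.2f} s TTFT")
```

For gemini-3-pro, the next-best model by score, this gives a 7-point score gap and a 16.28-second TTFT gap, consistent with the "more than 16 seconds" figure quoted in the introduction.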
Figure 3: Latency-Aware Router Provider Distribution
Model selection grouped by provider for Max (latency-aware)
In December 2025, we launched six experimental latency-aware versions of Max, with various tradeoffs between speed and performance. The majority of these models are on the Pareto frontier between latency and Arena score on the current leaderboard.
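For readers unfamiliar with the term, a model is on the Pareto frontier here if no other model is both faster and higher-scoring. The sketch below is a minimal way to compute that frontier from (latency, score) pairs. The scores of the six experimental versions are not listed in this post, so the example uses the Table 2 rows instead; `pareto_frontier` is a hypothetical helper, not part of any Arena tooling.

```python
def pareto_frontier(points):
    """Return names of non-dominated models.

    points maps name -> (latency_seconds, arena_score).
    Lower latency is better; higher score is better.
    """
    frontier = []
    best_score = float("-inf")
    # Sweep from fastest to slowest; on equal latency, visit the higher score first.
    ordered = sorted(points.items(), key=lambda kv: (kv[1][0], -kv[1][1]))
    for name, (_latency, score) in ordered:
        # A model stays only if it beats every faster model on score.
        if score > best_score:
            frontier.append(name)
            best_score = score
    return frontier


# (TTFT in seconds, Arena score), copied from Table 2.
TABLE_2_POINTS = {
    "Max (latency-aware)": (3.44, 1495),
    "gemini-3-pro": (19.72, 1488),
    "grok-4.1-thinking": (7.19, 1476),
    "gemini-3-flash": (5.83, 1471),
    "claude-opus-4-5-thinking-32k": (11.58, 1468),
}

if __name__ == "__main__":
    print(pareto_frontier(TABLE_2_POINTS))
    without_max = {k: v for k, v in TABLE_2_POINTS.items() if not k.startswith("Max")}
    print(pareto_frontier(without_max))
```

On the Table 2 rows alone, Max (latency-aware) is the only frontier point, since it has both the lowest TTFT and the highest score. Without it, the frontier consists of gemini-3-flash, grok-4.1-thinking, and gemini-3-pro.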
Figure 4: Arena Score vs. Time to First Token
Figure 5: Arena Score vs. End to End Generation Time
Say Hello to Max
Max gives our users a single entry point for leveraging the diverse skill sets of the latest cutting-edge LLMs as effectively as possible. The latency-aware version in our initial release balances speed and performance, offering a smooth, reliable, and helpful chat experience.

Appendix
Benchmarking Performance
We also ran Max on a variety of relevant static benchmarks. Max was not explicitly optimized for these benchmarks, but it still achieved results on par with the current top models. Our latency-aware versions of Max also handle latency tradeoffs effectively on these benchmarks.
Table 3: Benchmark Performance
Accuracy scores across major benchmarks
| Benchmark | Max (base) | gemini-3-pro | claude-opus-4-5-thinking-32k | gpt-5.2-high |
|---|---|---|---|---|
| HLE | 38.1% | 38.7% | 26.6% | 31.1% |
| GPQA Diamond | 91.0% | 90.5% | 84.9% | 87.7% |
| SimpleQA Verified | 70.4% | 72.0% | 40.8% | 35.4% |
| MMLU-Pro | 89.7% | 89.8% | 89.5% | 87.4% |
| MMMLU | 91.8% | 91.8% | 90.8% | 89.6% |
| AIME 2025 | 95.3% | 95.7% | 91.3% | 99.0% |
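One way to make "on par" concrete is to compare Max (base) with the best individual model on each benchmark in Table 3. The sketch below computes that gap in percentage points. The helper name and the choice of "best listed model" as the reference are our assumptions; the numbers are copied from the table.

```python
# Accuracy (%) copied from Table 3:
# benchmark -> (Max (base), (gemini-3-pro, claude-opus-4-5-thinking-32k, gpt-5.2-high))
TABLE_3 = {
    "HLE": (38.1, (38.7, 26.6, 31.1)),
    "GPQA Diamond": (91.0, (90.5, 84.9, 87.7)),
    "SimpleQA Verified": (70.4, (72.0, 40.8, 35.4)),
    "MMLU-Pro": (89.7, (89.8, 89.5, 87.4)),
    "MMMLU": (91.8, (91.8, 90.8, 89.6)),
    "AIME 2025": (95.3, (95.7, 91.3, 99.0)),
}


def gap_to_best(table):
    """Percentage-point gap between Max (base) and the best listed individual model."""
    return {
        benchmark: round(max_score - max(others), 1)
        for benchmark, (max_score, others) in table.items()
    }


if __name__ == "__main__":
    for benchmark, gap in gap_to_best(TABLE_3).items():
        print(f"{benchmark:18s} {gap:+.1f} pp")
```

By this measure, Max (base) is within 1.6 points of the best listed model on five of the six benchmarks and ahead on GPQA Diamond. The largest gap is 3.7 points on AIME 2025, where gpt-5.2-high scores 99.0%.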