Multimodal Max

Max, Arena's model router powered by 5M+ community votes, is now multimodal. Starting today, Max will be available as the default option in direct chat for all modalities, with expanded capabilities including search, vision, image generation, image editing, and front-end coding. Similar to our original Max for text, the multimodal variants are latency-controlled to provide a fast and performant experience. Try it now at arena.ai/max!

The benchmarks below show Max's performance across the Arena leaderboards most relevant to its capabilities. Because Max is a router, the models we compare Max to reflect a point-in-time snapshot of which models were publicly available and routable when Max was last trained and evaluated. Max is updated periodically to incorporate the latest frontier models.

Max sits on the Pareto frontier of score versus latency relative to its routing set (the models it can route to) in every modality it covers. It outranks all other models in this set for every supported arena except Single-Image Edit and Multi-Image Edit, where it places second. In these two arenas, Max offers a large latency benefit over the top model.
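To make the Pareto-frontier claim concrete: a model is on the frontier if no other model is at least as strong and at least as fast while being strictly better on one of the two axes. The sketch below is a minimal illustration of that definition. The model names, scores, and latencies are hypothetical placeholders, not Arena leaderboard values.

```python
from typing import List, NamedTuple


class Point(NamedTuple):
    name: str
    score: float      # arena score, higher is better
    latency_s: float  # latency in seconds, lower is better


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.score >= b.score and a.latency_s <= b.latency_s
    strictly_better = a.score > b.score or a.latency_s < b.latency_s
    return no_worse and strictly_better


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the points not dominated by any other point, fastest first."""
    frontier = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(frontier, key=lambda p: p.latency_s)


# Hypothetical values for illustration only.
models = [
    Point("model-a", 1510.0, 24.0),  # strongest, but slow
    Point("router", 1500.0, 2.0),    # second on score, much faster
    Point("model-b", 1470.0, 1.5),   # fastest, weaker
    Point("model-c", 1460.0, 4.0),   # dominated by the router
]

frontier_names = [p.name for p in pareto_frontier(models)]
print(frontier_names)
```

In this toy data the router places second on score yet stays on the frontier because the only stronger model is far slower, which mirrors the situation described for the two image-editing arenas.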

Text Arena

Text Arena Score vs. Time to First Token

Text Max Routing Distribution

Top 5 models selected across all text modality prompts for Max

claude-opus-4-6 37.53%

claude-sonnet-4-6 19.21%

grok-4.1 12.16%

gemini-3-flash (thinking-minimal) 9.36%

gemini-3-flash 7.33%

Other 14.41%

Max demonstrates strong performance on text, reducing time to first token by more than 9 seconds compared to the next-best model. The routing distribution is diverse, suggesting that Max leverages the different strengths of different models.

Search Arena

Search Arena Score vs. Time to First Token

Search Max Routing Distribution

Top 5 models selected across all search modality prompts for Max

claude-opus-4-6-search 48.47%

gemini-3-flash-grounding 24.86%

claude-sonnet-4-6-search 21.76%

grok-4-fast-search 2.50%

gemini-3-pro-grounding 0.83%

Other 1.58%

Search shows similar results, with Max achieving top performance on the leaderboard. The routing distribution is concentrated on fewer models, each of which is strong in both performance and latency.
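One way to put a number on "diverse" versus "concentrated" is the Shannon entropy of each routing distribution. The sketch below uses the Text and Search percentages listed above. Treating the "Other" bucket as a single category is a simplifying assumption, so the values understate the true entropy somewhat. Entropy is an illustrative measure chosen here, not a metric that Arena reports.

```python
import math
from typing import Dict

# Routing shares (percent) as listed in the Text and Search sections.
text_routing = {
    "claude-opus-4-6": 37.53,
    "claude-sonnet-4-6": 19.21,
    "grok-4.1": 12.16,
    "gemini-3-flash (thinking-minimal)": 9.36,
    "gemini-3-flash": 7.33,
    "Other": 14.41,  # assumption: treated as one category
}
search_routing = {
    "claude-opus-4-6-search": 48.47,
    "gemini-3-flash-grounding": 24.86,
    "claude-sonnet-4-6-search": 21.76,
    "grok-4-fast-search": 2.50,
    "gemini-3-pro-grounding": 0.83,
    "Other": 1.58,  # assumption: treated as one category
}


def entropy_bits(shares_pct: Dict[str, float]) -> float:
    """Shannon entropy in bits of a distribution given as percentage shares."""
    total = sum(shares_pct.values())
    probs = [v / total for v in shares_pct.values() if v > 0]
    return -sum(p * math.log2(p) for p in probs)


text_entropy = entropy_bits(text_routing)
search_entropy = entropy_bits(search_routing)
print(f"Text: {text_entropy:.2f} bits, Search: {search_entropy:.2f} bits")
```

With six categories the maximum possible entropy is log2(6), about 2.58 bits. Text comes out at about 2.4 bits and Search at about 1.8 bits, consistent with the observation that Search routing is the more concentrated of the two.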

Vision Arena

Vision Arena Score vs. Time to First Token

Vision Max Routing Distribution

Top 5 models selected across all vision modality prompts for Max

gpt-5.2-chat-latest 62.43%

gemini-3-flash (thinking-minimal) 12.12%

gpt-5.2-high 10.36%

gemini-3.1-pro-preview 6.15%

gemini-3-flash 4.62%

Other 4.32%

In Vision Arena, Max outperforms the best model available at the time by 3 points while delivering a speedup of more than 20 seconds. The routing distribution shows Max relying heavily on gpt-5.2-chat-latest, which receives 62% of battle prompts, but the 38% of prompts routed elsewhere netted Max a 12-point gain in strength.

Code Arena: Frontend

Code Arena Score vs. End to End Generation Time

Code Max Routing Distribution

Top 5 models selected across all code modality prompts for Max

claude-opus-4-6 40.80%

claude-opus-4-5-20251101-thinking-32k 26.13%

claude-opus-4-6-thinking 26.12%

gemini-3-flash 2.82%

gemini-3-pro 2.12%

Other 2.01%

In Code Arena, Max once again outperforms the models it routes to. Here we focused on making Max faster as measured by end-to-end latency, since the output is presented to the user only upon full completion. Max leans more on claude-opus-4-5 variants than might be expected, largely because of their gains in end-to-end latency.
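The two latency metrics used in this post differ in when the clock stops: time to first token ends at the first streamed chunk, while end-to-end generation time ends when the stream is exhausted. The sketch below measures both for a generic iterator of text chunks. `fake_stream` is a hypothetical stand-in for a model's streaming response and is not an Arena API.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def measure_latency(stream: Iterable[str]) -> Tuple[Optional[float], float, str]:
    """Consume a stream; return (time to first chunk, end-to-end time, full text)."""
    start = time.perf_counter()
    ttft: Optional[float] = None
    chunks = []
    for chunk in stream:
        if ttft is None:
            # Clock for time to first token stops at the first chunk.
            ttft = time.perf_counter() - start
        chunks.append(chunk)
    # Clock for end-to-end time stops once the stream is exhausted.
    e2e = time.perf_counter() - start
    return ttft, e2e, "".join(chunks)


def fake_stream() -> Iterator[str]:
    """Hypothetical streaming response: yields chunks with a small delay."""
    for chunk in ["<html>", "<body>", "hi", "</body>", "</html>"]:
        time.sleep(0.01)
        yield chunk


ttft, e2e, output = measure_latency(fake_stream())
print(f"time to first token: {ttft:.3f}s, end-to-end: {e2e:.3f}s")
```

For a front-end coding task the rendered page is only useful once `output` is complete, which is why end-to-end time is the more relevant of the two measurements there.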

Text-to-Image Arena

Text-to-Image Arena Score vs. End to End Generation Time

Text-to-Image Max Routing Distribution

Top 5 models selected across all text-to-image modality prompts for Max

gemini-3.1-flash-image-preview (nano-banana-2) 55.72%

gpt-image-1.5-high-fidelity 19.71%

flux-2-flex 9.88%

mai-image-2 7.99%

gemini-3-pro-image-preview-2k 2.27%

Other 4.43%

In the Text-to-Image modality, Max performed extremely well, outperforming the top models in its routing set on both model strength and latency. The routing distribution leaned towards gemini-3.1-flash-image-preview, but the remaining routing choices were spread across multiple models.

Single-Image Edit Arena

Image Edit Arena Score vs. End to End Generation Time

Image-Edit Max Routing Distribution

Top 5 models selected across all image-edit modality prompts for Max

gpt-image-2 (medium) 85.39%

gemini-3.1-flash-image-preview (nano-banana-2) [web-search] 11.09%

chatgpt-image-latest-high-fidelity (20251216) 1.59%

seedream-4.5 0.57%

wan2.7-image 0.50%

Other 0.86%

In the Single-Image Edit Arena, Max performed well, providing a faster but still strong alternative to gpt-image-2 (medium). Because the strength-latency tradeoff was configured to emphasize strength, Max still relied heavily on the top model. Interestingly, the routing distribution for Max in this arena consisted largely of other models on the Pareto frontier, showing that Max can identify the models with the strongest latency/performance tradeoff.
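The post does not describe how the strength-latency tradeoff is configured. As a purely illustrative sketch, one common way to express such a tradeoff is to pick, for each prompt, the candidate that maximizes predicted strength minus a weighted latency penalty. The scoring rule, the `latency_weight` parameter, and all candidate values below are assumptions made for illustration, not Max's actual method.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str
    predicted_score: float     # hypothetical per-prompt strength estimate
    expected_latency_s: float  # hypothetical expected generation time


def route(candidates: List[Candidate], latency_weight: float) -> Candidate:
    """Pick the candidate maximizing score - latency_weight * latency.

    latency_weight is in score points per second (an assumed unit):
    small values favor strength, large values favor speed.
    """
    return max(
        candidates,
        key=lambda c: c.predicted_score - latency_weight * c.expected_latency_s,
    )


# Hypothetical candidates for a single prompt.
candidates = [
    Candidate("strong-slow", 1300.0, 40.0),
    Candidate("balanced", 1280.0, 18.0),
    Candidate("fast", 1240.0, 8.0),
]

strength_heavy = route(candidates, latency_weight=0.5)
speed_heavy = route(candidates, latency_weight=3.0)
print(strength_heavy.name, speed_heavy.name)
```

Under this assumed rule, a low weight keeps most traffic on the strongest model, while a higher weight shifts choices toward faster models at some cost in strength.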

Multi-Image Edit Arena

Multi-Image Edit Arena Score vs. End to End Generation Time

Multi-Image-Edit Max Routing Distribution

Top 5 models selected across all multi-image-edit modality prompts for Max

gpt-image-2 (medium) 47.75%

gemini-3.1-flash-image-preview (nano-banana-2) 40.01%

wan2.7-image 8.07%

seedream-4.5 2.48%

gemini-3-pro-image-preview-2k (nano-banana-pro) 1.21%

Other 0.47%

Finally, in the Multi-Image Edit Arena, Max again landed as a faster but still strong alternative to gpt-image-2 (medium). In this case, the strength-latency tradeoff was weighted more heavily towards speed, giving Max a 22-second speedup over gpt-image-2 (medium).

Test Max's New Abilities

With these expanded capabilities, Max can be used for more diverse tasks. Whether you want to generate a graphic, interpret a chart, or make a live website, Max can handle it. Go give the new and improved Max a try!