Vision Arena 🏆 Overall
View overall rankings across multimodal AI models capable of reasoning over visual inputs.
Apr 10, 2026
756,611 votes
36 open source models
| Open Source Rank | Overall Rank | Organization · License | Score | Rank Spread | Votes | Price | Context |
|---|---|---|---|---|---|---|---|
| 1 | 13 | Moonshot · Modified MIT | 1247±9 | 9-23 | 8,513 | $0.60 / $3 | N/A |
| 2 | 17 | Alibaba · Apache 2.0 | 1242±9 | 9-27 | 7,319 | $0.39 / $2.34 | 262.1K |
| 3 | 19 | Moonshot · Modified MIT | 1238±11 | 10-29 | 3,947 | $0.38 / $1.72 | 262.1K |
| 4 | 24 | Alibaba · Apache 2.0 | 1227±10 | 15-36 | 4,916 | $0.20 / $1.56 | 262.1K |
| 5 | 28 | Alibaba · Apache 2.0 | 1224±10 | 17-36 | 5,335 | $0.26 / $2.08 | 262.1K |
| 6 | 31 | Alibaba · Apache 2.0 | 1214±7 | 23-39 | 12,382 | $0.20 / $0.88 | 262.1K |
| 7 | 42 | Alibaba · Apache 2.0 | 1190±12 | 35-52 | 2,401 | $0.26 / $2.60 | 131.1K |
| 8 | 54 | Z.ai · MIT | 1163±14 | 44-69 | 2,362 | $0.30 / $0.90 | 131.1K |
| 9 | 59 | Google · Gemma | 1158±8 | 52-69 | 17,777 | $0.08 / $0.16 | 131.1K |
| 10 | 62 | Z.ai · MIT | 1155±12 | 51-70 | 3,390 | $0.60 / $1.80 | 65.5K |
| 11 | 64 | Meta · Llama 4 | 1147±9 | 54-72 | 7,147 | $0.63 / $1.80 | 131.1K |
| 12 | 67 | StepFun · Apache 2.0 | 1145±12 | 54-75 | 3,361 | $0.57 / $1.42 | 65.5K |
| 13 | 68 | Mistral · Apache 2.0 | 1140±9 | 57-76 | 11,363 | $0.10 / $0.30 | 32K |
| 14 | 71 | Mistral · Apache 2.0 | 1127±9 | 64-80 | 29,989 | $0.10 / $0.30 | 32K |
| 15 | 72 | Meta · Llama | 1127±10 | 64-80 | 6,632 | $0.40 / $0.70 | 8.2K |
| 16 | 75 | Alibaba · Qwen | 1122±11 | 67-80 | 3,768 | $0.80 / $0.80 | 32.8K |
| 17 | 76 | Alibaba · Apache 2.0 | 1120±15 | 65-82 | 1,490 | $0.20 / $0.60 | 128K |
| 18 | 80 | Ai2 · Apache 2.0 | 1106±20 | 70-85 | 1,187 | $0.20 / $0.20 | 36.9K |
| 19 | 82 | Mistral · MRL | 1095±9 | 79-85 | 5,423 | $2 / $6 | 131.1K |
| 20 | 85 | Alibaba · Qwen | 1085±10 | 80-87 | 5,937 | $0.90 / $0.90 | 32.8K |
| 21 | 90 | Ai2 · Apache 2.0 | 1046±13 | 87-96 | 3,048 | N/A | N/A |
| 22 | 92 | Meta · Llama 3.2 | 1032±9 | 90-98 | 8,682 | N/A | N/A |
| 23 | 93 | Alibaba · Apache 2.0 | 1031±10 | 90-99 | 5,766 | $0.20 / $0.20 | 32.8K |
| 24 | 94 | Mistral · Apache 2.0 | 1026±9 | 90-100 | 7,511 | $0.15 / $0.15 | 128K |
| 25 | 95 | OpenGVLab · MIT | 1025±13 | 90-101 | 5,148 | N/A | N/A |
| 26 | 101 | Cohere · CC-BY-NC-4.0 | 1000±22 | 92-107 | 847 | N/A | N/A |
| 27 | 102 | Ai2 · Apache 2.0 | 995±14 | 96-105 | 2,815 | N/A | N/A |
| 28 | 103 | Meta · Llama 3.2 | 992±11 | 99-105 | 4,817 | N/A | N/A |
| 29 | 104 | Nvidia · - | 987±20 | 97-109 | 1,077 | N/A | N/A |
| 30 | 105 | LLaVA · Apache 2.0 | 979±18 | 99-109 | 1,321 | N/A | N/A |
| 31 | 106 | LLaVA · Apache 2.0 | 965±12 | 104-109 | 4,531 | N/A | N/A |
| 32 | 107 | Zhipu AI · CogVLM2 | 964±15 | 103-109 | 1,991 | N/A | N/A |
| 33 | 108 | OpenBMB · Apache 2.0 | 964±16 | 103-109 | 1,987 | N/A | N/A |
| 34 | 109 | OpenGVLab · MIT | 958±12 | 104-109 | 3,703 | N/A | N/A |
| 35 | 110 | Microsoft · MIT | 921±16 | 110-110 | 2,592 | $0.13 / $0.52 | 128K |
| 36 | 111 | Microsoft · MIT | 882±19 | 111-111 | 1,401 | $0.20 / $0.20 | 32.1K |
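
Adjacent rows often differ by fewer points than their ± margins, so it can help to check whether two score ranges overlap before reading much into a rank difference. The sketch below is illustrative only: it assumes the ± figure is a symmetric margin around the score (the page does not state the confidence level), and the names `Score`, `parse_score`, and `intervals_overlap` are hypothetical helpers, not part of any leaderboard API.

```python
from typing import NamedTuple


class Score(NamedTuple):
    """A score cell split into its central value and ± margin."""
    value: float
    margin: float

    @property
    def low(self) -> float:
        return self.value - self.margin

    @property
    def high(self) -> float:
        return self.value + self.margin


def parse_score(cell: str) -> Score:
    """Parse a table cell such as '1247±9' into a Score."""
    value, margin = cell.strip().split("±")
    return Score(float(value), float(margin))


def intervals_overlap(a: Score, b: Score) -> bool:
    """Return True when the two ± ranges share at least one point."""
    return a.low <= b.high and b.low <= a.high


# Score cells copied from the table above.
top = parse_score("1247±9")      # open source rank 1
second = parse_score("1242±9")   # open source rank 2
last = parse_score("882±19")     # open source rank 36

print(intervals_overlap(top, second))  # ranges 1238..1256 and 1233..1251
print(intervals_overlap(top, last))    # ranges 1238..1256 and 863..901
```

Under this reading, the top two rows have overlapping ranges while the first and last rows do not, which matches the wide rank spreads listed for the leading models.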