Latest

From Live Data to High-Quality Benchmarks - The Arena-Hard Pipeline

Authors: Tianle Li*, Wei-Lin Chiang*, Evan Frick, Lisa Dunlap, Banghua Zhu, Joseph E. Gonzalez, Ion Stoica

Building an affordable and reliable benchmark for LLM chatbots has become a critical challenge. A high-quality benchmark should 1) robustly separate model capability, 2) reflect human preference in real-world use cases, and 3) frequently

Arena Leaderboard Policy

Last Updated: March 6, 2026

Live and Community-Driven LLM Evaluation

Transparency. The model evaluation and ranking pipelines have been open sourced in the arena-rank repository. We also release a fraction of the data collected from the platform. Together, this means that anyone can audit our leaderboard using publicly released

Chatbot Arena - New models & Elo system update

Authors: Wei-Lin Chiang, Tianle Li, Joseph E. Gonzalez, Ion Stoica

Welcome to our latest update on the Chatbot Arena, our open evaluation platform to test the most advanced LLMs. We're excited to share that over 130,000 votes have now been collected to rank the 40+ most capable models!

Chatbot Arena Conversation Dataset Release

Since its launch three months ago, Chatbot Arena has become a widely cited LLM evaluation platform that emphasizes large-scale, community-based, and interactive human evaluation. In that short time span, we collected around 53K votes from 19K unique IP addresses for 22 models. In this blog post, we are releasing an

Chatbot Arena Leaderboard Updates (Week 8)

Introducing MT-Bench and Vicuna-33B

Chatbot Arena Leaderboard Updates (Week 4)

In this update, we are excited to welcome the following models joining the Chatbot Arena:

1. Google PaLM 2, chat-tuned with the code name chat-bison@001 on Google Cloud Vertex AI
2. Anthropic Claude-instant-v1
3. MosaicML MPT-7B-chat
4. Vicuna-7B

A new Elo rating leaderboard based on the 27K anonymous voting

Chatbot Arena Leaderboard Updates (Week 2)

We release an updated leaderboard with more models and the new data we collected last week, after the announcement of the anonymous Chatbot Arena. We are actively iterating on the design of the arena and leaderboard scores. In this update, we have added 4 strong new players to the Arena,