LMArena is now Arena
What began as a PhD research experiment to compare AI language models has grown over time into something broader, shaped by the people who use it.
Video Arena Is Live on Web
Video Arena is now available at: lmarena.ai/video! What started last summer as a small Discord bot experiment has grown into something much more substantial. It quickly became clear that this wasn’t just a novelty for generating fun videos—it was a rigorous way to measure and understand…
Fueling the World’s Most Trusted AI Evaluation Platform
We’re excited to share a major milestone in LMArena’s journey. We’ve raised $150M of Series A funding led by Felicis and UC Investments (University of California), with participation from Andreessen Horowitz, The House Fund, LDVP, Kleiner Perkins, Lightspeed Venture Partners and Laude Ventures.
Arena-Rank: Open Sourcing the Leaderboard Methodology
Building community trust with open science is critical for the development of AI and its alignment with the needs and preferences of all users. With that in focus, we’re delighted to publish Arena-Rank, an open-source Python package for ranking that powers the LMArena leaderboard!
Studying the Frontier: Arena Expert
Arena Expert is a great way to differentiate between frontier models. In this analysis, we compare how models perform on 'general' vs 'expert' prompts, focusing on 'thinking' vs 'non-thinking' models.
LMArena’s Ranking Method
Since launching the platform, developing a rigorous and scientifically grounded evaluation methodology has been central to our mission. A key component of this effort is providing proper statistical uncertainty quantification for model scores and rankings. To that end, we have always reported confidence intervals alongside Arena scores and surfaced any…
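As a rough illustration of the kind of uncertainty quantification described in that post, here is a minimal sketch that fits a simple Bradley-Terry model to pairwise battle outcomes and attaches percentile-bootstrap confidence intervals to the resulting Elo-like scores. This is not the Arena-Rank implementation; the battle data, function names, and scaling constants are illustrative assumptions.

```python
# Minimal sketch (not Arena-Rank): Bradley-Terry scores with bootstrap
# confidence intervals from pairwise battle outcomes. Requires sklearn >= 1.2
# for penalty=None. All data and constants below are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

def bt_scores(battles, models, scale=400.0, base=1000.0):
    """Fit a Bradley-Terry model via logistic regression on battle outcomes.

    battles: list of (model_a, model_b, winner) tuples, winner in {"a", "b"}.
    Returns an Elo-like score per model (scale and base are arbitrary choices).
    """
    idx = {m: i for i, m in enumerate(models)}
    X = np.zeros((len(battles), len(models)))
    y = np.zeros(len(battles))
    for r, (a, b, winner) in enumerate(battles):
        X[r, idx[a]], X[r, idx[b]] = 1.0, -1.0   # +1 for model_a, -1 for model_b
        y[r] = 1.0 if winner == "a" else 0.0
    lr = LogisticRegression(fit_intercept=False, penalty=None, tol=1e-6).fit(X, y)
    # Convert log-strengths to an Elo-like scale.
    return {m: base + scale / np.log(10) * lr.coef_[0][idx[m]] for m in models}

def bootstrap_ci(battles, models, n_rounds=200, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence intervals from resampled battles."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_rounds):
        rows = rng.integers(0, len(battles), size=len(battles))
        samples.append(bt_scores([battles[i] for i in rows], models))
    return {m: (np.quantile([s[m] for s in samples], alpha / 2),
                np.quantile([s[m] for s in samples], 1 - alpha / 2))
            for m in models}

# Toy data with mixed outcomes so the fit is well behaved.
battles = ([("model-x", "model-y", "a")] * 60 + [("model-x", "model-y", "b")] * 40 +
           [("model-y", "model-z", "a")] * 55 + [("model-y", "model-z", "b")] * 45)
models = ["model-x", "model-y", "model-z"]
print(bt_scores(battles, models))
print(bootstrap_ci(battles, models))
```

Wider intervals here simply reflect fewer or noisier battles for a model, which is why intervals are reported alongside the point scores rather than the scores alone.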
Arena Expert and Occupational Categories
The next frontier of large language model (LLM) evaluation lies in understanding how models perform when challenged by expert-level problems drawn from real work across diverse disciplines.