TLDR
- Google’s Gemini 1.5 Pro experimental version scored 1300 on the LMSYS Chatbot Arena benchmark
- This score surpasses previous top performers OpenAI’s GPT-4o (1286) and Anthropic’s Claude 3.5 Sonnet (1271)
- The new Gemini version was quietly released on August 1st as an experimental model
- Early user feedback on social media has been very positive
- It’s unclear if this experimental version will become the standard Gemini 1.5 Pro model
Google has quietly released an experimental version of its Gemini 1.5 Pro artificial intelligence model that has outperformed other leading AI systems on a key benchmark test.
The new version, released on August 1st, scored 1300 points on the LMSYS Chatbot Arena benchmark. This score puts it ahead of the previous top performers, OpenAI’s GPT-4o at 1286 points and Anthropic’s Claude 3.5 Sonnet at 1271 points.
The LMSYS Chatbot Arena is a popular crowdsourced benchmark: users chat with two anonymous models side by side and vote for the better response, and those votes are aggregated into an Elo-style rating for each model. Higher scores suggest that a model is more capable across a broad range of tasks.
While benchmark tests don’t tell the full story of what an AI can do, they provide a way to compare different models.
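To make the scoring concrete, here is a minimal Python sketch of how an Elo-style rating can be derived from pairwise preference votes. It is illustrative only: the model names, starting ratings, K-factor, and votes below are hypothetical, and the Arena’s actual pipeline, which aggregates thousands of community votes, differs in detail.

```python
# Illustrative sketch: turning pairwise "A beat B" votes into
# Elo-style ratings, roughly the idea behind arena leaderboards.
# All names and numbers here are made up for demonstration.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 4.0):
    """Return updated (r_a, r_b) after one vote; a_won=True means A was preferred."""
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta

# Hypothetical votes: each tuple is (winner, loser).
votes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-c")]
ratings = {"model-a": 1000.0, "model-b": 1000.0, "model-c": 1000.0}

for winner, loser in votes:
    ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser], True)

print(ratings)  # model-a ends highest, having won both of its matchups
```

The key property this illustrates is that a rating reflects relative preference across many head-to-head comparisons, not performance on any fixed task set.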
This marks the first time that a Google AI model has taken the top spot on this particular leaderboard. The previous version of Gemini 1.5 Pro had scored 1261 points, so the new experimental version shows a notable improvement.
Google launched this update without much fanfare, simply making it available as an experimental release. However, it quickly gained attention in the AI community as people began to notice its strong performance.
Exciting News from Chatbot Arena! @GoogleDeepMind’s new Gemini 1.5 Pro (Experimental 0801) has been tested in Arena for the past week, gathering over 12K community votes.
For the first time, Google Gemini has claimed the #1 spot, surpassing GPT-4o/Claude-3.5 with an impressive… https://t.co/SvjBegXbQ9
— lmsys.org (@lmsysorg) August 1, 2024
Some users on social media have described the new Gemini version as “insanely good,” with one person on Reddit claiming it “blows [GPT-4o] out of the water.”
It’s important to note that this version of Gemini 1.5 Pro is still considered experimental. This means it may change before becoming widely available.
Google could adjust or even withdraw this version for various reasons, including safety considerations or to better align it with the company’s goals.
The strong performance of this new Gemini version highlights the ongoing competition in the field of generative AI.
For much of the past year, OpenAI’s GPT models and Anthropic’s Claude have been seen as the leaders in this space. Google’s latest achievement shows that the race to create more capable AI systems is still very active.
For users, this competition means there are increasingly capable AI options available. However, benchmarks don’t always reflect how well an AI will perform on specific real-world tasks.
It’s up to individual users to determine which AI works best for their particular needs.
As of now, it’s unclear whether Google plans to make this experimental version the standard release of Gemini 1.5 Pro. The company has not made any official announcement about its plans for this model.