A New Generation of AI with Both Speed and Intelligence: Google Releases Gemini 3.5 Flash

Artificial Analysis @ArtificialAnlys

2026-05-20 01:52

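AI Summary

Google has released a new model, Gemini 3.5 Flash. It gains 9 points on the Intelligence Index to reach 55, surpassing Grok 4.3 and Claude Sonnet 4.6, with especially notable progress on agentic tasks and knowledge reliability (substantially fewer hallucinations). Its output speed exceeds 280 tokens/s, placing it at the leading edge of the speed-intelligence frontier. However, the model costs 5.5x as much to run as its predecessor, mainly because of higher input token usage and higher pricing. It also achieves the highest score on the multimodal evaluation MMMU-Pro and supports multimodal input, showing Google's all-around strength.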

Google's new Gemini 3.5 Flash is the clear leader on the Intelligence vs. Speed Pareto frontier and makes large gains on GDPval-AA (real-world agentic tasks), but it costs over 5x as much to run as Gemini 3 Flash.

@GoogleDeepMind gave us pre-release access to Gemini 3.5 Flash, the latest model in its Flash family, which has traditionally offered faster, lower-cost alternatives to Gemini Pro models. Gemini 3.5 Flash scores 55 on the Artificial Analysis Intelligence Index, up 9 points from Gemini 3 Flash, driven primarily by agentic performance gains and reduced hallucination. It achieves speeds of over 280 output tokens/s. However, higher token usage and higher token prices make the Intelligence Index over 5x as expensive to run as with Gemini 3 Flash, and 75% more expensive than with Gemini 3.1 Pro. Gemini 3.5 Flash is priced at $1.50/$9 per 1M input/output tokens, versus $0.50/$3 for Gemini 3 Flash, a 3x increase. The rest of the cost increase came from higher token usage when running our benchmarks.
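Both the input and output rates tripled. As a result, the per-request price ratio between the two models is exactly 3x for any mix of input and output tokens, and any multiple beyond 3x has to come from token usage. The sketch below shows that arithmetic in Python, using the list prices quoted in the post. The token counts are arbitrary illustrative values, not Artificial Analysis data, and cached-input discounts are ignored.

```python
# List prices quoted in the post, in USD per 1M tokens.
GEMINI_3_FLASH = {"input": 0.50, "output": 3.00}
GEMINI_3_5_FLASH = {"input": 1.50, "output": 9.00}


def request_cost(prices: dict[str, float], input_tokens: int, output_tokens: int) -> float:
    """List-price cost in USD of one request, ignoring cached-input discounts."""
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000


# Illustrative token mixes: input-only, output-only, and an input-heavy request.
for inp, out in [(1_000_000, 0), (0, 1_000_000), (200_000, 5_000)]:
    old = request_cost(GEMINI_3_FLASH, inp, out)
    new = request_cost(GEMINI_3_5_FLASH, inp, out)
    # Both rates tripled, so the ratio prints as 3.0x for every mix.
    print(f"{inp:>9} in / {out:>9} out: ${old:.4f} -> ${new:.4f} ({new / old:.1f}x)")
```

Holding the token mix fixed isolates the price change. The difference between this 3x and the observed 5.5x is therefore attributable to the model consuming more tokens, not to pricing.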

Key results for Gemini 3.5 Flash with the 'high' thinking level:

➤ 9-point Intelligence Index improvement: Gemini 3.5 Flash scores 55 on the Artificial Analysis Intelligence Index, up 9 points from Gemini 3 Flash. This places it ahead of Grok 4.3 (high, 53) and Claude Sonnet 4.6 (max, 52). The model improves on nearly all evaluations, with the largest gains in the agentic evaluations and AA-Omniscience (knowledge and hallucination). On AA-Omniscience, Gemini 3.5 Flash improves by 11 points, driven primarily by fewer hallucinations: its hallucination rate falls to 61%, 31 points lower than Gemini 3 Flash's.

➤ Agentic capability improvements: Gemini 3.5 Flash improves substantially over Gemini 3 Flash across our agentic evaluations, both GDPval-AA (real-world agentic tasks) and Tau2-Bench Telecom (agentic tool use). Its GDPval-AA result is especially notable: an Elo of 1656, well ahead of Gemini 3 Flash (1204) and Gemini 3.1 Pro (1314), and just behind GPT-5.4 (xhigh, 1674). This is a meaningful step forward for Google in agentic performance, which has historically been a relative weakness for Gemini models.

➤ Speed-intelligence frontier: Gemini 3.5 Flash achieves over 280 output tokens per second, ~70% faster than Gemini 3 Flash and models such as gpt-oss-120b and GPT-5.4 mini (xhigh). Combined with its Intelligence Index score of 55, this places Gemini 3.5 Flash on the speed-intelligence Pareto frontier alongside Gemini 3.1 Pro and Gemini 3.1 Flash-Lite, reinforcing Google's strength in models that balance speed and intelligence.

➤ 5.5x increase in cost to run: Gemini 3.5 Flash costs $1,552 to run the Artificial Analysis Intelligence Index, 5.5x as much as Gemini 3 Flash and 75% more than Gemini 3.1 Pro. Both token usage and token prices increased. Output token usage is broadly unchanged from Gemini 3 Flash (73M vs. 72M), but input token usage rises significantly, driven primarily by a larger number of turns in agentic evaluations. Gemini 3.5 Flash is priced 3x higher than Gemini 3 Flash, at $1.50/$9.00 per 1M input/output tokens, with a 90% discount for cached input tokens.

➤ Google continues to lead in multimodal performance: Gemini 3.5 Flash is multimodal, supporting image, video, and speech input alongside text. This differs from many proprietary models, including Claude Opus 4.7, Grok 4.3, and GPT-5.5, which accept only images beyond text. On our multimodal evaluation, MMMU-Pro, Gemini 3.5 Flash scores 84%, the highest score we have recorded. This puts Google models in the top two spots, with Gemini 3.1 Pro at 82%.
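Several of the multiples in these results can be reproduced with a few lines of arithmetic. The Python sketch below rests on three assumptions:

- The Elo-to-expected-score conversion assumes GDPval-AA ratings follow the standard logistic Elo model with a 400-point scale. The post does not specify this.
- The Gemini 3 Flash speed is inferred from "~70% faster", not quoted directly.
- The token-usage factor assumes every rate, including cached input, rose by the same 3x.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A vs. B under the standard logistic Elo model (ASSUMED 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream output_tokens at a constant rate (ignores time to first token)."""
    return output_tokens / tokens_per_second


# GDPval-AA Elo ratings quoted in the post.
flash_35, flash_3, pro_31, gpt_54 = 1656, 1204, 1314, 1674
print(f"vs Gemini 3 Flash:  {elo_expected_score(flash_35, flash_3):.2f}")
print(f"vs Gemini 3.1 Pro:  {elo_expected_score(flash_35, pro_31):.2f}")
print(f"vs GPT-5.4 (xhigh): {elo_expected_score(flash_35, gpt_54):.2f}")

# ~280 tokens/s; "~70% faster" implies roughly 280 / 1.7 tokens/s for Gemini 3 Flash (inferred).
fast, slow = 280.0, 280.0 / 1.7
print(f"10k output tokens: {generation_seconds(10_000, fast):.1f}s vs {generation_seconds(10_000, slow):.1f}s")

# Cost to run the Index rose 5.5x while list prices rose 3x. If every rate scaled
# by the same 3x (ASSUMED), the remaining factor is token usage valued at old prices.
print(f"usage-driven factor: {5.5 / 3:.2f}x")
```

Under these assumptions, Gemini 3.5 Flash's expected score is roughly 0.93 against Gemini 3 Flash and 0.88 against Gemini 3.1 Pro, counting ties as half a win. Against GPT-5.4 (xhigh) it is about 0.47, so the 18-point gap is close to a coin flip. A 10k-token response streams in about 36 s versus about 61 s. The roughly 1.83x usage factor is consistent with input tokens, rather than output tokens (73M vs. 72M), driving the growth.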

Key model details:

➤ Context window: Retains the same 1M-token context window as Gemini 3 Flash

➤ Multimodality: Text, image, video, and speech input, with text output only

➤ Pricing: $1.50/$9.00 per million input/output tokens, with a 90% discount for cached input tokens
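The 90% cached-input discount matters most for agentic workloads that resend a long, mostly unchanged context on every turn. The Python sketch below estimates the cost of one request. It assumes the discount is a flat 90% off the $1.50 input rate, as stated above, and ignores anything the post does not mention, such as cache storage charges or other billing rules. The function name and the example token counts are hypothetical.

```python
# Gemini 3.5 Flash list prices from the post, in USD per 1M tokens.
INPUT_PER_M = 1.50
OUTPUT_PER_M = 9.00
CACHE_DISCOUNT = 0.90  # ASSUMED flat: cached input billed at 10% of the input rate


def request_cost_with_cache(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request; cached_tokens is the cached share of input_tokens."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    # Uncached input at full rate, cached input at the discounted rate.
    input_cost = (uncached + cached_tokens * (1 - CACHE_DISCOUNT)) * INPUT_PER_M
    return (input_cost + output_tokens * OUTPUT_PER_M) / 1_000_000


# Hypothetical agentic turn: 500k-token context (within the 1M window), 2k tokens generated.
print(f"${request_cost_with_cache(500_000, 0, 2_000):.4f} with no caching")
print(f"${request_cost_with_cache(500_000, 400_000, 2_000):.4f} with 400k tokens cached")
```

In this example, caching 400k of the 500k input tokens cuts the estimated cost from about $0.77 to about $0.23. This is why the growth in input tokens from more agentic turns weighs less on cost when most of the context is cached.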

Congratulations @GoogleDeepMind, @sundarpichai and @demishassabis on the great release!

