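OpenAI has released GPT-Realtime-2, its new flagship speech model. It scores 96.6% on the Big Bench Audio speech reasoning benchmark, level with Gemini 3.1 Flash Live Preview - High and about 13% above the previous best result. The model also keeps its lead on the Conversational Dynamics benchmark, where the minimal reasoning effort variant scores 96.1%, with particularly strong results in the pause handling and turn taking tests. The new model supports adjustable reasoning effort levels from minimal to xHigh, its context window grows from 32K to 128K, it accepts text, audio and image input, and audio pricing is unchanged.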
OpenAI has released GPT-Realtime-2, which scores 96.6% on Big Bench Audio, our Speech Reasoning benchmark, and ranks #1 on our Conversational Dynamics benchmark
Released today, GPT-Realtime-2 is OpenAI's new flagship native Speech to Speech model, introducing adjustable reasoning effort levels from minimal through to xHigh. The high variant achieves a Big Bench Audio result of 96.6%, equal to Gemini 3.1 Flash Live Preview - High. GPT-Realtime-2 continues to lead our Conversational Dynamics benchmark, with the minimal variant achieving a score of 96.1% and showing particular strengths in our Pause Handling and Turn Taking tests.
The model can speak short phrases before its main response, like "let me check that", and provides audible transparency while performing tool calls, like "checking your calendar". Additionally, the model's context window has increased from 32K to 128K tokens, enabling longer, more coherent sessions across complex task flows.
Key takeaways:
➤ The model scores 96.6% on our Big Bench Audio Speech to Speech reasoning benchmark, an increase of ~13% from the previous highest result
➤ GPT-Realtime-2 is the leading model on our Conversational Dynamics (Full Duplex Bench subset) benchmark with a score of 96.1%
➤ GPT-Realtime-2's average Time to First Audio on the Big Bench Audio benchmark is 2.33 seconds on high reasoning and 1.12 seconds on minimal reasoning
➤ Audio pricing remains unchanged, with a larger context window (128k tokens), higher max output tokens (32k), and support for text, audio and image input
➤ The model introduces adjustable reasoning effort levels: minimal, low, medium, high, and xhigh, with low as the current default
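To put the latency and context figures in the takeaways side by side, here is a minimal Python sketch of the arithmetic. It uses only the numbers quoted above; the function and variable names are illustrative, and treating "32K" and "128K" as 32,000 and 128,000 tokens is an assumption (the ratio is the same if they mean 32,768 and 131,072).

```python
# Average Time to First Audio on Big Bench Audio, in seconds, as quoted above.
TTFA_SECONDS = {"high": 2.33, "minimal": 1.12}

# Context window sizes in tokens. Assumption: "K" is read as 1,000 tokens.
CONTEXT_TOKENS = {"previous": 32_000, "gpt-realtime-2": 128_000}


def latency_tradeoff(ttfa):
    """Return (extra seconds, slowdown ratio) of high vs minimal reasoning."""
    extra = ttfa["high"] - ttfa["minimal"]
    ratio = ttfa["high"] / ttfa["minimal"]
    return round(extra, 2), round(ratio, 2)


def context_growth(ctx):
    """Return how many times larger the new context window is."""
    return ctx["gpt-realtime-2"] / ctx["previous"]


if __name__ == "__main__":
    extra, ratio = latency_tradeoff(TTFA_SECONDS)
    print(f"High reasoning adds {extra} s to first audio ({ratio}x minimal)")
    print(f"Context window is {context_growth(CONTEXT_TOKENS):.0f}x larger")
```

On these figures, high reasoning takes roughly twice as long to produce first audio as minimal reasoning, which is the cost to weigh against the higher reasoning score when choosing an effort level for a live conversation.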
See below for more detail ⬇️