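Google DeepMind has released Gemma 4 12B, an open weights model that supports speech transcription. It scores 8.8% on the AA-WER benchmark (#58), well behind the transcription-focused open weights models Voxtral Mini Transcribe 2 (4B parameters, 3.6% WER) and Voxtral Small (12B parameters, 2.8% WER). It is the largest model in the Gemma 4 family to support transcription (alongside E4B and E2B); the 31B and 26B A4B models support only text, image and video input. Google launched its local dictation app, Eloquent (MacOS/iOS), at the same time. The model is available on Hugging Face, Ollama and LMStudio.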
Google's newly released open weights model, Gemma 4 12B, supports transcription but is far from the frontier, scoring 8.8% on AA-WER (#58)
Gemma 4 12B is the latest release from @GoogleDeepMind in the Gemma 4 family. With a score of 8.8% on AA-WER (lower is better), it captures a reasonable amount of conversation context but trails transcription-focused open weights models like Voxtral Mini Transcribe 2 (3.6% WER, with 4B parameters) and slightly larger open weights language models like Voxtral Small (2.8% WER, with 12B parameters). The new model launched alongside Google's local dictation app, Eloquent, available on MacOS and iOS.
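For readers unfamiliar with the metric, here is a minimal sketch of how a word error rate is conventionally computed: word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference transcript. This is the textbook definition only. The function name, the lowercase-and-split normalization and the sample sentences are illustrative assumptions, and the actual AA-WER pipeline (datasets, text normalization, aggregation) may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Textbook WER: word-level edit distance / number of reference words.

    Assumption: text is normalized by lowercasing and whitespace splitting.
    Real benchmarks usually apply more careful normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical sample sentences: one substituted word out of ten.
    reference = "the quick brown fox jumps over the lazy dog today"
    hypothesis = "the quick brown fox jumped over the lazy dog today"
    print(f"WER: {word_error_rate(reference, hypothesis):.1%}")  # WER: 10.0%
```

Under this definition, a score of 8.8% corresponds to roughly 9 word-level errors per 100 reference words, compared with about 3 to 4 per 100 for the Voxtral models cited above.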
Gemma 4 12B is the largest model in the Gemma 4 family to support transcription, alongside Gemma 4 E4B and Gemma 4 E2B. Gemma 4 31B and Gemma 4 26B A4B support text, image and video input only. These models are available on a variety of platforms including Hugging Face, Ollama and LMStudio.
We are currently running Gemma 4 12B through the full Artificial Analysis Intelligence Index and will share results soon.