Google releases Gemma 4 QAT checkpoints for local use on consumer GPUs and mobile devices

Google AI Developers (@googleaidevs)


2026-06-06 00:57

Why it was featured

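The quantized version of Gemma 4 squeezes the model to under 1GB, which lowers the bar for running large models locally on a phone considerably. Rather than using traditional post-training quantization (PTQ), Google built the compression directly into training, and the result is noticeably better than PTQ. Anyone working on on-device deployment can start trying the checkpoints.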

AI summary

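Google has released Quantization-Aware Training (QAT) checkpoints for Gemma 4 that run locally on consumer GPUs and mobile devices with minimal quality loss. The new checkpoints are available in GGUF (Q4_0) format for all model sizes and the drafter models, targeting maximum local performance. A custom mobile schema uses mixed precision to shrink Gemma 4 to under 1GB, with targeted 2-bit decoding layers, optimized KV caches, and static activations. Simulating compression during training, instead of quantizing after training, sharply reduces the memory footprint and speeds up decoding while preserving reasoning quality.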

New @GoogleGemma 4 QAT (Quantization-Aware Training) checkpoints are here, so you can run models locally on consumer GPUs and mobile devices with minimal quality loss.

What's new:

🔹 GGUF (Q4_0) Checkpoints: Max local performance across all sizes and drafter models
🔹 Custom Mobile Schema: We shrunk Gemma 4 down to less than 1GB for mobile devices by using a custom mixed precision schema designed for edge hardware (featuring targeted 2-bit decoding layers, optimized KV caches, and static activations)
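For readers unfamiliar with Q4_0, the sketch below illustrates block-wise 4-bit quantization modeled on the Q4_0 layout as commonly described for ggml/llama.cpp: 32 weights share one half-precision scale, and each weight is stored as a 4-bit code. It is an illustrative approximation in plain Python, not the reference implementation, and the post does not say how Google produced the GGUF files. The function names are hypothetical.

```python
import struct

BLOCK_SIZE = 32                        # weights per block
BYTES_PER_BLOCK = 2 + BLOCK_SIZE // 2  # fp16 scale + 32 packed 4-bit codes


def quantize_block(weights):
    """Map 32 floats to one scale and 32 integer codes in 0..15."""
    assert len(weights) == BLOCK_SIZE
    # Take the value with the largest magnitude, keeping its sign,
    # so that it lands exactly on code 0 (which decodes to -8 * scale).
    extreme = max(weights, key=abs)
    scale = extreme / -8.0
    # The stored scale is half precision, so round-trip it through fp16.
    scale = struct.unpack("<e", struct.pack("<e", scale))[0]
    inv = 1.0 / scale if scale != 0 else 0.0
    codes = [min(15, max(0, int(w * inv + 8.5))) for w in weights]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate floats: w = scale * (code - 8)."""
    return [scale * (c - 8) for c in codes]


def bits_per_weight():
    """Effective storage cost, including the per-block scale."""
    return BYTES_PER_BLOCK * 8 / BLOCK_SIZE
```

Under this layout each block costs 18 bytes for 32 weights, which is 4.5 bits per weight instead of the 16 bits of an fp16 weight.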

By simulating compression during training rather than after (Post-Training Quantization), we've drastically reduced the memory footprint and accelerated decode speeds while preserving reasoning quality. https://blog.google/innovation-and-ai/technology/developers-tools/quantization-aware-training-gemma-4/
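To make "simulating compression during training" concrete, here is a toy sketch of the generic QAT idea: the forward pass only ever sees a fake-quantized weight, while the optimizer updates a full-precision latent weight using the straight-through estimator. This is a one-parameter illustration in plain Python, assuming a symmetric uniform grid. The post does not describe Google's actual training recipe, and every name here is hypothetical.

```python
def fake_quantize(w, bits=4, max_abs=1.0):
    """Snap w to the nearest level of a symmetric signed grid, as a float."""
    levels = 2 ** (bits - 1) - 1          # e.g. 7 positive levels for 4 bits
    step = max_abs / levels
    q = round(w / step)
    q = max(-levels, min(levels, q))      # clamp to the representable range
    return q * step


def train_qat(xs, ys, bits=4, lr=0.05, epochs=200):
    """Fit y ~ w * x while the loss is always computed with the quantized w."""
    w = 0.0                               # latent full-precision weight
    for _ in range(epochs):
        w_q = fake_quantize(w, bits)      # forward pass uses the quantized value
        # Gradient of the mean squared error with respect to w_q.
        grad = sum(2 * (w_q * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # Straight-through estimator: treat d(w_q)/d(w) as 1, so the
        # gradient is applied directly to the latent weight.
        w -= lr * grad
    return w, fake_quantize(w, bits)


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0]
    ys = [0.6 * x for x in xs]
    latent, deployed = train_qat(xs, ys)
    print("latent weight:", latent, "deployed 4-bit weight:", deployed)
```

Post-training quantization would instead train `w` in full precision and round it once at the end. With QAT the loss already reflects the rounding error at every step, so the weights can adapt to the grid they will be deployed on.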

