Google releases Gemma 4 QAT checkpoints, shrinking the smallest model from 11.4GB to 1.1GB

Rohan Paul@rohanpaul_ai

2026-06-06 07:34 · 9 days ago

AI Summary

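Google has released QAT (Quantization-Aware Training) checkpoints for Gemma 4, shrinking the smallest model from 11.4GB to 1.1GB (0.84GB for the text-only version) so that it is easier to run on phones and laptops. Conventional PTQ (Post-Training Quantization) can hurt quality because the model never learned to cope with rounding; QAT simulates compression during training so the model learns while its weights are being squeezed, which makes the compressed version less likely to lose reasoning ability. Google also built a mobile-optimized format with static activations, channel-wise quantization, targeted 2-bit quantization, and KV cache optimization, which reduces scaling computation on the phone and keeps long conversations from consuming memory too quickly.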

Google just made Gemma 4 much easier to run on phones and laptops by releasing QAT (Quantization-Aware Training) checkpoints that shrink the smallest model from 11.4GB to 1.1GB, or 0.84GB for text-only use.
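To put those sizes in perspective, here is a quick back-of-the-envelope calculation using only the three figures quoted above. The helper names are made up for illustration; nothing here says how the checkpoints are actually stored.

```python
# Sizes in GB exactly as quoted in the post.
original_gb = 11.4
qat_gb = 1.1
qat_text_only_gb = 0.84

def compression_ratio(before_gb, after_gb):
    """How many times smaller the checkpoint is."""
    return before_gb / after_gb

def size_reduction_pct(before_gb, after_gb):
    """Percentage of the original size that was removed."""
    return 100.0 * (1.0 - after_gb / before_gb)

for label, size in [("QAT", qat_gb), ("QAT text-only", qat_text_only_gb)]:
    print(f"{label}: {compression_ratio(original_gb, size):.1f}x smaller, "
          f"{size_reduction_pct(original_gb, size):.1f}% reduction")
```

That works out to roughly a 10x reduction for the full checkpoint and closer to 14x for the text-only one.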

Normal PTQ (Post-Training Quantization) compresses after training and can damage quality because the model never learned to survive that rounding.

QAT fixes this by simulating compression during training, so Gemma 4 learns while its weights are being squeezed, making the final compressed model less likely to lose reasoning quality.
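The PTQ/QAT difference can be sketched with a toy one-weight model. This is a generic illustration of "fake quantization" with a straight-through gradient, not Google's actual Gemma 4 recipe, which the post does not describe. The function names, the 4-bit width, and the fixed scale of 0.25 are all assumptions chosen for readability.

```python
def quantize_dequantize(w, scale, bits=4):
    """Snap w to a symmetric integer grid, then map it back to a float."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4-bit
    q = max(-qmax, min(qmax, round(w / scale)))     # round, then clamp
    return q * scale

def mse_grad(w_used, xs, ys):
    """Gradient of mean squared error for the model y = w_used * x."""
    return sum(2 * (w_used * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, scale, bits=4, quant_aware=True, steps=100, lr=0.1):
    w = 0.0  # full-precision weight that the optimizer updates
    for _ in range(steps):
        # QAT: the forward pass sees the rounded weight.
        # Plain training: the forward pass sees the exact weight.
        w_used = quantize_dequantize(w, scale, bits) if quant_aware else w
        # Straight-through estimator: the gradient computed at w_used is
        # applied to w as if the rounding step had derivative 1.
        w -= lr * mse_grad(w_used, xs, ys)
    return w

xs = [1.0, 2.0, 3.0]
ys = [1.1 * x for x in xs]   # the ideal weight, 1.1, is not on the 0.25 grid

# PTQ: train normally, round once at the end.
w_ptq = quantize_dequantize(train(xs, ys, 0.25, quant_aware=False), 0.25)
# QAT: rounding is part of every training step.
w_qat = quantize_dequantize(train(xs, ys, 0.25, quant_aware=True), 0.25)
print(w_ptq, w_qat)
```

With a single weight this only shows the mechanics: in the QAT loop the loss is always measured on the rounded weight, so training reacts to the rounding error instead of discovering it after the fact. In a real network with many interacting weights, that feedback is what lets other parameters compensate for rounding.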

Google also built a mobile-focused format with static activations, channel-wise quantization, targeted 2-bit quantization, and KV cache optimization, which means the phone does less scaling work, compresses some token-generation parts more aggressively, and keeps long chats from eating memory too fast.
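To see why channel-wise quantization matters, compare one shared scale for a whole weight matrix against one scale per row. This is a minimal sketch of generic symmetric quantization, assuming each row is one output channel and using 4 bits; it is not the layout of Google's mobile format, which the post does not detail, and all names are illustrative.

```python
def quantize_rows(matrix, bits=4, per_channel=True):
    """Quantize and dequantize a matrix with one scale per row or one overall."""
    qmax = 2 ** (bits - 1) - 1
    # Fall back to 1.0 so an all-zero tensor or row never divides by zero.
    tensor_scale = (max(abs(v) for row in matrix for v in row) / qmax) or 1.0
    out = []
    for row in matrix:
        row_scale = (max(abs(v) for v in row) / qmax) or 1.0
        scale = row_scale if per_channel else tensor_scale
        out.append([round(v / scale) * scale for v in row])
    return out

def max_abs_error(a, b):
    """Largest element-wise difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Two channels with very different magnitudes.
weights = [[0.02, -0.05, 0.07],
           [3.0, -6.0, 7.0]]

err_tensor = max_abs_error(weights, quantize_rows(weights, per_channel=False))
err_channel = max_abs_error(weights, quantize_rows(weights, per_channel=True))
print(err_tensor, err_channel)
```

With a single shared scale, the large second row sets the step size and the small first row rounds entirely to zero. Giving each row its own scale keeps both rows representable, at the cost of storing one extra scale value per channel.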

Google · Open source/Repos · Model release · On-device
