BalCapRL: A Balanced Framework for Reinforcement-Learning-Based MLLM Image Captioning

2026-05-11 08:00

AI Summary

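The research team proposes BalCapRL, a balanced framework for image captioning with multimodal large language models. By designing a multi-dimensional reward function, the framework systematically addresses the hallucination, noise, and verbosity that existing reinforcement learning methods introduce when they optimize for caption utility. Experiments show that BalCapRL maintains caption accuracy while significantly improving information density and readability, achieves more balanced performance across multiple benchmarks, and effectively overcomes the trade-offs among core dimensions that limit conventional methods.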

Original text (untranslated)

Image captioning is one of the most fundamental tasks in computer vision. Owing to its open-ended nature, it has received significant attention in the era of multimodal large language models (MLLMs). In pursuit of ever more detailed and accurate captions, recent work has increasingly turned to reinforcement learning (RL). However, existing captioning-RL methods and evaluation metrics often emphasize a narrow notion of caption quality, inducing trade-offs across core dimensions of captioning. For example, utility-oriented objectives can encourage noisy, hallucinated, or overlong captions that…

Multimodal · Papers/Research

Apple Machine Learning Research (RSS)

Read the original: machinelearning.apple.com
