RoboMemArena: A Comprehensive and Challenging Robotic Memory Benchmark
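The researchers introduce RoboMemArena, a robotic memory benchmark comprising 26 long-horizon tasks with average trajectory lengths exceeding 1,000 steps, in which 68.9% of subtasks are memory-dependent. The benchmark uses a vision-language model (VLM) to generate subtasks and trajectories, provides memory-related annotations, and includes real-world tasks to support physical evaluation. The team further proposes PrediMem, a dual-system architecture in which a high-level VLM planner manages a memory bank with recent and keyframe buffers and uses a predictive coding head to improve sensitivity to task dynamics. Experiments show that PrediMem outperforms all baselines on the benchmark and offers new insights into the design of complex memory systems.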
Memory is a critical component of robotic intelligence, as robots must rely on past observations and actions to accomplish long-horizon tasks in partially observable environments. However, existing robotic memory benchmarks still lack multimodal annotations for memory formation, provide limited task coverage and structural complexity, and remain restricted to simulation without real-world evaluation. We address this gap with RoboMemArena, a large-scale benchmark of 26 tasks, with average trajectory lengths exceeding 1,000 steps per task and 68.9% of subtasks being memory-dependent. The generation pipeline leverages a vision-language model (VLM) to design and compose subtasks, generates full trajectories through atomic functions, and provides memory-related annotations, including subtask instructions and native keyframe annotations, while paired real-world memory tasks support physical evaluation. We further design PrediMem, a dual-system vision-language-action (VLA) model in which a high-level VLM planner manages a memory bank with recent and keyframe buffers and uses a predictive coding head to improve sensitivity to task dynamics. Extensive experiments on RoboMemArena show that PrediMem outperforms all baselines and provides insights into memory management, model architecture, and scaling laws for complex memory systems.
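To make the idea of a memory bank with recent and keyframe buffers concrete, the following is a minimal sketch and not PrediMem's implementation. The abstract does not specify buffer sizes, how keyframes are selected, or how the planner's context is assembled, so the class name `TwoBufferMemory`, its parameters, the caller-supplied `is_keyframe` flag, and the merge order are all assumptions made for illustration.

```python
from collections import deque


class TwoBufferMemory:
    """Hypothetical memory bank: a sliding window of recent observations
    plus a bounded store of observations flagged as keyframes."""

    def __init__(self, recent_size=4, max_keyframes=8):
        # Both capacities are assumed values, not taken from the paper.
        self.recent = deque(maxlen=recent_size)  # oldest entries fall out automatically
        self.keyframes = []
        self.max_keyframes = max_keyframes

    def add(self, step, observation, is_keyframe=False):
        """Store one observation; the keyframe decision is supplied by the caller."""
        entry = (step, observation)
        self.recent.append(entry)
        if is_keyframe:
            self.keyframes.append(entry)
            if len(self.keyframes) > self.max_keyframes:
                self.keyframes.pop(0)  # evict the oldest keyframe

    def context(self):
        """Return keyframes and recent entries merged in time order, without duplicates."""
        key_steps = {step for step, _ in self.keyframes}
        recent_only = [e for e in self.recent if e[0] not in key_steps]
        return sorted(self.keyframes + recent_only, key=lambda e: e[0])


bank = TwoBufferMemory(recent_size=2, max_keyframes=2)
for step in range(6):
    bank.add(step, f"obs{step}", is_keyframe=(step in (1, 4)))
print([step for step, _ in bank.context()])
```

In this sketch the recent buffer keeps short-term context cheaply, while the keyframe buffer lets an early, task-relevant observation survive long after it has left the sliding window, which is the situation a memory-dependent subtask would require.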