Tencent Hunyuan and Renmin University of China Open-Source the PlanningBench Evaluation Framework

Tencent Hy @TencentHunyuan


2026-06-05 15:46

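Why it's featured

PlanningBench, open-sourced by Tencent Hunyuan and Renmin University of China, fills a gap in evaluating the planning ability LLMs need to move from "saying" to "doing." Anyone building agents can use it directly for evaluation and training; it works out of the box.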

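AI summary

Tencent Hunyuan, in collaboration with the Gaoling School of Artificial Intelligence at Renmin University of China, has open-sourced PlanningBench, a scalable, verifiable framework for evaluating and training LLM planning capabilities. The framework includes 30+ real-world planning tasks and supports automated verification as well as training. PlanningBench aims to push LLM planning from "saying" to "doing." Resources are available on arXiv, GitHub, and HuggingFace.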

Planning is where LLMs move from "saying" to "doing."

Tencent Hy, in collaboration with the Gaoling School of Artificial Intelligence at Renmin University of China, is excited to open-source PlanningBench: a scalable, verifiable framework for evaluating and training LLM planning capabilities.

With PlanningBench, you get:

✅ 30+ real-world planning tasks
✅ Automated verification
✅ Evaluation and training support

See how top-tier LLMs perform on PlanningBench 👇

Resources:
arXiv: https://arxiv.org/abs/2605.20873
GitHub: https://github.com/Tencent-Hunyuan/PlanningBench
HuggingFace: https://huggingface.co/datasets/tencent/PlanningBench

#PlanningBench #TencentHunyuan #OpenSource


