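Microsoft proposes SkillOpt, a method for improving how AI agent skills are optimized. Its core idea is to treat a standalone skill document as the object being optimized, rather than modifying the underlying large language model directly. The agent attempts tasks, its successes and failures are analyzed, and a stronger optimizer model then makes small edits to the skill document. An edit is accepted only if it improves performance on the validation set, which keeps the skill improving stably. Across 52 test cases spanning 6 benchmarks, 7 target models, and 3 agent settings (direct chat, Codex, and Claude Code), SkillOpt was best or tied for best in every case. On GPT-5.5, it raised average direct-chat accuracy by 23.5 points. The resulting skill file is readable, portable, and reusable, and deployment requires no model retraining.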
The problem is that agent skills are usually hand-written, made once by an LLM, or revised in loose ways that can easily make them worse.
SkillOpt, from Microsoft, argues that agent skills should be trained like small external programs: it teaches AI agents better task habits by editing a reusable skill document, not the model itself.
The paper's core idea is to treat the skill document like the thing being trained, while the main AI model stays frozen and unchanged.
SkillOpt watches the agent try tasks, studies what worked and failed, then asks a stronger optimizer model to suggest small edits to the skill.
It only accepts an edit when the new skill improves on a held-out check set, so the skill does not drift just because an edit sounds good.
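The loop described above (try tasks, sort successes from failures, ask for a small edit, keep it only if the held-out score goes up) can be sketched in a few lines of Python. This is a minimal illustration and not the paper's implementation: `run_agent`, `propose_edit`, `evaluate`, and `optimize_skill` are hypothetical names, exact-match scoring is an assumption, and the toy agent and toy optimizer stand in for the frozen target model and the stronger optimizer model.

```python
from typing import Callable, List, Tuple

Task = Tuple[str, str]  # (question, expected answer); exact-match scoring is an assumption


def evaluate(skill: str, tasks: List[Task], run_agent: Callable[[str, str], str]) -> float:
    """Fraction of tasks the frozen agent answers correctly when given this skill text."""
    if not tasks:
        return 0.0
    correct = sum(run_agent(skill, question) == answer for question, answer in tasks)
    return correct / len(tasks)


def optimize_skill(initial_skill, train_tasks, val_tasks, run_agent, propose_edit, rounds=10):
    """Hypothetical sketch: edit the skill text only; the agent model is never changed."""
    skill = initial_skill
    best = evaluate(skill, val_tasks, run_agent)
    for _ in range(rounds):
        # 1. Let the agent attempt the training tasks with the current skill.
        results = [(q, a, run_agent(skill, q)) for q, a in train_tasks]
        successes = [r for r in results if r[2] == r[1]]
        failures = [r for r in results if r[2] != r[1]]
        # 2. Ask the optimizer for a small edit, given what worked and what failed.
        candidate = propose_edit(skill, successes, failures)
        # 3. Accept the edit only if the held-out score strictly improves.
        score = evaluate(candidate, val_tasks, run_agent)
        if score > best:
            skill, best = candidate, score
    return skill, best


# Toy stand-ins (assumptions for illustration only).
def toy_agent(skill: str, question: str) -> str:
    # The "agent" uppercases its answer only if the skill tells it to.
    return question.upper() if "UPPERCASE" in skill else question


def toy_optimizer(skill: str, successes, failures) -> str:
    # Suggest a useful edit when there are failures, a useless one otherwise.
    if failures:
        return skill + "\nAnswer in UPPERCASE."
    return skill + "\nBe verbose."


train = [("abc", "ABC"), ("hello", "HELLO")]
val = [("xyz", "XYZ"), ("skill", "SKILL")]
final_skill, final_score = optimize_skill("Be concise.", train, val, toy_agent, toy_optimizer, rounds=3)
print(final_skill)
print(final_score)
```

In this toy run the first edit is kept because it raises the held-out score, while the later "Be verbose." edits are rejected because they do not improve it. That rejection is the gating behavior the post describes: an edit that merely sounds reasonable never makes it into the skill.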
The authors tested this across 6 benchmarks, 7 target models, and 3 agent settings, including direct chat, Codex, and Claude Code.
SkillOpt was best or tied on all 52 tested cases, and on GPT-5.5 it raised average accuracy by 23.5 points in direct chat.
The final result is a small readable skill file that can improve agents across tasks and settings without retraining the model.
The best part is that the optimizer is used during training, but deployment only needs the final skill file.
That makes the artifact inspectable, portable, and cheap to reuse, which is exactly what most prompt-engineering systems lack.
----
Link - arxiv.org/abs/2605.23904
Title: "SkillOpt: Executive Strategy for Self-Evolving Agent Skills"