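This paper proposes the SIA framework, which lets an AI improve itself in an automated loop: an observer AI monitors the task agent's performance, then either modifies its outer setup (prompts, tools, retry rules, output parsing) or trains the model itself through LoRA weight updates, where the base model stays unchanged and only the adapter learns from task feedback. It was tested on three tasks: Chinese legal charge classification (70.1% on LawBench), GPU kernel speed tuning (generated code outperformed the prior best), and single-cell RNA denoising (score of 0.289). The combined version beat the setup-only approach on all tasks, suggesting that weight updates help the model learn patterns that prompts and tools cannot find.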
This paper shows that an AI improves itself more effectively when it both rewrites its setup and updates its model weights.
The problem is that most AI progress still depends on people changing prompts, tools, code, training data, and model weights by hand.
The paper's idea is SIA, a loop where one AI watches how a task agent performs, then either changes the agent's outer setup or trains the model itself.
The outer setup means things like prompts, tools, retry rules, and output parsing, while weight updates mean changing the model's learned behavior through task feedback.
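To make "outer setup" concrete, here is a minimal sketch of what a setup-only edit could look like. The `Harness` class, its field names, the `statute_lookup` tool name, and the `strip_answer` parser are all hypothetical illustrations, not the paper's code; the point is only that the edit touches prompts, retry rules, and parsing while the model weights are not involved at all.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Harness:
    """Hypothetical container for the agent's outer setup (not from the paper)."""
    system_prompt: str
    tools: tuple[str, ...]
    max_retries: int
    parse_output: Callable[[str], str]


def strip_answer(raw: str) -> str:
    # Illustrative parser: keep the text after the last "Answer:" marker.
    return raw.rsplit("Answer:", 1)[-1].strip()


base = Harness(
    system_prompt="Classify the criminal charge.",
    tools=("statute_lookup",),  # made-up tool name
    max_retries=1,
    parse_output=strip_answer,
)

# A setup-only edit: new prompt and retry rule; tools and parser are kept,
# and no model weights appear anywhere in this object.
revised = replace(
    base,
    system_prompt=base.system_prompt + " Cite the statute.",
    max_retries=3,
)
```

Keeping the setup in an immutable object means every proposed edit yields a new version that can be scored against the old one before it is adopted.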
The loop works like this: the task agent tries many answers or programs, the verifier scores them, and those scores become training feedback.
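The sample-and-score step can be sketched in a few lines. This is an illustrative toy, assuming a hypothetical `propose` function standing in for the task agent and a hypothetical `verify` function standing in for the verifier; the post does not say how the paper turns scores into a training signal, so this sketch stops at collecting (candidate, reward) records.

```python
import random


def collect_feedback(propose, verify, task, n_samples=8, seed=0):
    """Sample several candidates for one task and score each with the verifier."""
    rng = random.Random(seed)
    records = []
    for _ in range(n_samples):
        candidate = propose(task, rng)     # task agent tries an answer
        reward = verify(task, candidate)   # verifier scores it
        records.append({"task": task, "candidate": candidate, "reward": reward})
    return records


# Toy stand-ins so the sketch runs; a real agent and verifier would replace these.
def toy_propose(task, rng):
    return rng.choice(["theft", "fraud", "robbery"])


def toy_verify(task, candidate):
    return 1.0 if candidate == task["label"] else 0.0


records = collect_feedback(toy_propose, toy_verify, {"text": "...", "label": "fraud"})
```

Each record pairs an attempt with its score, which is the raw material a later weight update would learn from.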
Then the system updates a small add-on set of weights called LoRA weights, which changes the model's behavior without retraining the whole model.
So the base model stays mostly the same, but the LoRA adapter learns, "outputs like this got high reward, outputs like that failed."
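A small sketch of why a LoRA adapter can change behavior while the base weights stay frozen. It follows the standard LoRA formulation, output = W0·x + (alpha/r)·B·A·x, with B starting at zero; the tiny matrices and the hand-edited "training step" are made up for illustration and say nothing about the paper's actual training recipe.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_forward(W0, A, B, x, alpha=1.0):
    """Frozen base output plus a scaled low-rank correction B @ (A @ x)."""
    r = len(A)  # rank = number of rows in A
    base = matvec(W0, x)
    low_rank = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, low_rank)]


W0 = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weights (2x2), never modified
A = [[0.5, 0.5]]               # adapter down-projection (1x2, rank 1)
B = [[0.0], [0.0]]             # adapter up-projection (2x1), zero at init
x = [2.0, 4.0]

y_init = lora_forward(W0, A, B, x)   # adapter contributes nothing yet

B = [[1.0], [0.0]]                   # pretend a training step changed only B
y_after = lora_forward(W0, A, B, x)  # behavior shifts, W0 is untouched
```

Because only A and B are trained, the adapter holds far fewer parameters than the full weight matrix when the rank is small relative to the matrix size.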
The authors tested this on 3 very different tasks: Chinese legal charge classification, GPU kernel speed tuning, and single-cell RNA denoising.
The combined version beat setup-only improvement on all 3 tasks, reaching 70.1% on LawBench, faster GPU code than the prior best, and 0.289 on denoising.
The main lesson is that better scaffolding helps the agent act better, but weight updates help it learn task patterns that prompts and tools alone did not find.
----
Link - arxiv.org/abs/2605.27276
Title: "SIA: Self Improving AI with Harness & Weight Updates"