Tencent Hunyuan releases UniRL and two new RL algorithms

Tencent Hy (@TencentHunyuan)

2026-06-09 20:03 · 6 days ago

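AI summary

🚀 Tencent Hunyuan has launched UniRL, an RL infrastructure for unified multimodal models, together with two new RL algorithms: DRPO and Flow-DPPO. A single RL loop covers diffusion/flow-matching models, LLMs/VLMs, and unified multimodal models 👇 Code: http://github.com/Tencent-Hunyuan/UniRL (yes: U(you)-ni-(need) RL 😉)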

🚀 Introducing UniRL, an RL infra for unified multimodal models. Together with two new RL algorithms: DRPO and Flow-DPPO.

One RL loop across diffusion/flow matching models, LLMs/VLMs, and unified multimodal models 👇

Code: http://github.com/Tencent-Hunyuan/UniRL

(yes - U(you)-ni-(need) RL 😉)

