🚀 Introducing UniRL, an RL infra for unified multimodal models. It ships with two new RL algorithms: DRPO and Flow-DPPO.
One RL loop across diffusion/flow-matching models, LLMs/VLMs, and unified multimodal models 👇
Code: http://github.com/Tencent-Hunyuan/UniRL
(yes - U(you)-ni-(need) RL 😉)