Tri-Modal Training from Scratch, and a Breakthrough in Agent RL Stability
Sources
- The Design Space of Tri-Modal Masked Diffusion Models
- ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning
- SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model
- GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL
- Solaris: Building a Multiplayer Video World Model in Minecraft