UniVA: Universal Video Agents towards Next-Generation Video Intelligence

Anonymous Submission

「UniVA」Universal Video Agents

We introduce UniVA, an agentic framework that achieves Breadth by unifying a broad suite of video tools on a single platform, and Depth through its core innovation, "Agentic Synergy." This synergy is enabled by a dual-agent, memory-augmented architecture that dynamically manages information flow, allowing tools such as Understanding to actively guide Editing and Segmentation. By turning isolated functions into a collaborative workflow, UniVA addresses complex planning and consistency problems that single models cannot readily handle.

An unparalleled creative experience


Technical Details

Teaser Figure

Revolutionary User Experience

UniVA delivers an unprecedented user experience and comprehensive, industrial-grade production capability through its agentic framework.

Pipeline Figure

Dual-Agent Architecture

The Plan Agent decomposes user input into subtasks using global and user memory, while the Act Agent executes these subtasks through the Model Context Protocol (MCP) to generate diverse multimodal outputs.
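To make the plan/act split concrete, here is a minimal Python sketch, assuming a keyword-based planner in place of whatever model UniVA actually uses. The class names, the tool names (`video_understanding`, `video_segmentation`, `video_editing`), and the memory keys (`preferred_style`, `tools`) are all hypothetical. The only part taken from a real specification is the request shape: MCP invokes a tool with a JSON-RPC 2.0 request whose method is `tools/call` and whose params carry the tool `name` and its `arguments`.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class Subtask:
    tool: str        # hypothetical MCP tool name
    arguments: dict  # arguments forwarded to that tool


@dataclass
class PlanAgent:
    # Hypothetical stand-ins for the global and user memory named above.
    global_memory: dict = field(default_factory=dict)
    user_memory: dict = field(default_factory=dict)

    def plan(self, request: str, video: str) -> list[Subtask]:
        # Keyword rules stand in for an LLM-based planner (assumption).
        style = self.user_memory.get("preferred_style", "default")
        steps = [Subtask("video_understanding", {"video": video})]
        if "remove" in request or "segment" in request:
            steps.append(Subtask("video_segmentation", {"video": video}))
        if "remove" in request or "edit" in request:
            steps.append(Subtask("video_editing", {"video": video, "style": style}))
        # Global memory may restrict the plan to tools that are registered.
        available = self.global_memory.get("tools")
        if available is not None:
            steps = [s for s in steps if s.tool in available]
        return steps


class ActAgent:
    def __init__(self) -> None:
        self._next_id = 1

    def to_mcp_request(self, task: Subtask) -> dict:
        # MCP tool invocations are JSON-RPC 2.0 requests with method "tools/call".
        request = {
            "jsonrpc": "2.0",
            "id": self._next_id,
            "method": "tools/call",
            "params": {"name": task.tool, "arguments": task.arguments},
        }
        self._next_id += 1
        return request


if __name__ == "__main__":
    planner = PlanAgent(user_memory={"preferred_style": "cinematic"})
    actor = ActAgent()
    for task in planner.plan("remove the car from the clip", "clip.mp4"):
        print(json.dumps(actor.to_mcp_request(task)))
```

Keeping planning and execution in separate objects means the planner never touches transport details: swapping the keyword rules for a model call would leave the Act Agent and its MCP requests unchanged.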

Task Figure

Synergistic Components

UniVA's components work in synergy, showing both depth in handling complex autonomous tasks and breadth in supporting interactive multi-tool creation.
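One way to picture this synergy is a shared context that every tool reads from and writes to, so that an upstream result (what Understanding found, and when) steers downstream tools. The Python sketch below assumes exactly that design; it is not UniVA's implementation. The three tools are stubs with made-up output, and the names `understand`, `segment`, `edit`, `run_chain`, and the context keys are hypothetical.

```python
from __future__ import annotations

from typing import Callable

Context = dict
Tool = Callable[[Context], Context]


def understand(ctx: Context) -> Context:
    # Stub: a real tool would run a video-understanding model on ctx["video"].
    # The detection below is fixed, made-up output for illustration only.
    ctx["objects"] = {"car": (2.0, 5.5)}  # label -> (start_s, end_s)
    return ctx


def segment(ctx: Context) -> Context:
    # Understanding output decides which object, and which time span, to segment.
    label = ctx["target"]
    start, end = ctx["objects"][label]
    ctx["mask"] = {"label": label, "span": (start, end)}
    return ctx


def edit(ctx: Context) -> Context:
    # Editing touches only the object and span handed over by segmentation.
    label = ctx["mask"]["label"]
    start, end = ctx["mask"]["span"]
    ctx["edit_log"] = f"inpainted {label} from {start}s to {end}s"
    return ctx


# One registry for all tools (breadth); chaining them through ctx gives depth.
TOOLS: dict[str, Tool] = {"understand": understand, "segment": segment, "edit": edit}


def run_chain(names: list[str], ctx: Context) -> Context:
    for name in names:
        ctx = TOOLS[name](ctx)
    return ctx


result = run_chain(["understand", "segment", "edit"], {"video": "clip.mp4", "target": "car"})
print(result["edit_log"])  # inpainted car from 2.0s to 5.5s
```

Because segmentation and editing consume the span produced by understanding rather than re-detecting it, the edit stays consistent with what was understood, which is the kind of cross-tool consistency the section describes.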

Task Figure

Memory Mechanism

We build a three-level Memory Mechanism that dynamically manages information flow, allowing tools like Understanding to actively guide Editing and Segmentation.
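The page names global and user memory but not the third level, so the Python sketch below labels it a hypothetical "task memory" holding intermediate results of the current request. The lookup order (task, then user, then global, most specific first) and the class and key names (`ThreeLevelMemory`, `fps`, `style`, `objects`) are likewise assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ThreeLevelMemory:
    global_mem: dict[str, Any] = field(default_factory=dict)  # knowledge shared across users
    user_mem: dict[str, Any] = field(default_factory=dict)    # per-user preferences
    task_mem: dict[str, Any] = field(default_factory=dict)    # hypothetical third level: current task state

    def get(self, key: str, default: Any = None) -> Any:
        # Most specific level wins: task -> user -> global (assumed order).
        for level in (self.task_mem, self.user_mem, self.global_mem):
            if key in level:
                return level[key]
        return default

    def record(self, key: str, value: Any) -> None:
        # Tool outputs (e.g. what Understanding found) land in task memory,
        # where later tools such as Segmentation or Editing can read them.
        self.task_mem[key] = value

    def end_task(self) -> None:
        # Task-level state is discarded; user and global memory persist.
        self.task_mem.clear()


mem = ThreeLevelMemory(global_mem={"fps": 24}, user_mem={"fps": 30, "style": "anime"})
mem.record("objects", {"car": (2.0, 5.5)})
print(mem.get("fps"), mem.get("objects"))  # 30 {'car': (2.0, 5.5)}
```

Separating the levels by lifetime lets a task override a preference temporarily (say, a different style for one clip) without overwriting the user's stored default.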