Generative Video

24 papers
6.3 viability
-15% (30d)

State of the Field

The field of generative video is currently focused on enhancing realism and interactivity in video synthesis, with recent advancements addressing critical challenges in human-object interaction and embodied intelligence. New frameworks are being developed to generate talking avatars that can interact with their environments based on text prompts, significantly improving the quality of grounded human-object interactions. Concurrently, benchmarks like RBench are being established to evaluate robotic video generation, highlighting deficiencies in physical realism and driving the creation of extensive annotated datasets to support model training. Additionally, counterfactual video generation techniques are being explored to mitigate hallucinations in video-language models, while innovations in memory-augmented video editing aim to maintain consistency across iterative edits. These developments not only enhance the fidelity and usability of generative video systems but also have potential applications in fields such as education, entertainment, and robotics, where high-quality, context-aware video content is increasingly in demand.

Last updated Feb 26, 2026

Papers

1–10 of 24
Research Paper·Jan 8, 2026

CounterVid: Counterfactual Video Generation for Mitigating Action and Temporal Hallucinations in Video-Language Models

Video-language models (VLMs) achieve strong multimodal understanding but remain prone to hallucinations, especially when reasoning about actions and temporal order. Existing mitigation strategies, suc...

8.0 viability
Research Paper·Jan 22, 2026

Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory

Recent foundational video-to-video diffusion models have achieved impressive results in editing user-provided videos by modifying appearance, motion, or camera movement. However, real-world video edit...

8.0 viability
Research Paper·Feb 2, 2026

Making Avatars Interact: Towards Text-Driven Human-Object Interaction for Controllable Talking Avatars

Generating talking avatars is a fundamental task in video generation. Although existing methods can generate full-body talking avatars with simple human motion, extending this task to grounded human-o...

8.0 viability
Research Paper·Jan 21, 2026

Rethinking Video Generation Model for the Embodied World

Video generation models have significantly advanced embodied intelligence, unlocking new possibilities for generating diverse robot data that capture perception, reasoning, and action in the physical ...

8.0 viability
Research Paper·Mar 3, 2026

ShareVerse: Multi-Agent Consistent Video Generation for Shared World Modeling

This paper presents ShareVerse, a video generation framework enabling multi-agent shared world modeling, addressing the gap in existing works that lack support for unified shared world construction wi...

7.0 viability
Research Paper·Feb 3, 2026·Media & Entertainment

PrevizWhiz: Combining Rough 3D Scenes and 2D Video to Guide Generative Video Previsualization

In pre-production, filmmakers and 3D animation experts must rapidly prototype ideas to explore a film's possibilities before full-scale production, yet conventional approaches involve trade-offs in eff...

7.0 viability
Research Paper·Feb 3, 2026·Media & Entertainment

Morphe: High-Fidelity Generative Video Streaming with Vision Foundation Model

Video streaming is a fundamental Internet service, yet quality still cannot be guaranteed, especially under poor network conditions such as bandwidth-constrained or remote areas. Existing works mai...

7.0 viability
Research Paper·Jan 26, 2026

PaperTok: Exploring the Use of Generative AI for Creating Short-form Videos for Research Communication

The dissemination of scholarly research is critical, yet researchers often lack the time and skills to create engaging content for popular media such as short-form videos. To address this gap, we expl...

7.0 viability
Research Paper·Feb 4, 2026·Media & Entertainment·Consumer

SkeletonGaussian: Editable 4D Generation through Gaussian Skeletonization

4D generation has made remarkable progress in synthesizing dynamic 3D objects from input text, images, or videos. However, existing methods often represent motion as an implicit deformation field, whi...

7.0 viability
Research Paper·Feb 11, 2026

Flow caching for autoregressive video generation

Autoregressive models, often built on Transformer architectures, are a powerful paradigm for generating ultra-long videos by synthesizing content in sequential chunks. However, this sequential g...

7.0 viability
Page 1 of 3