DVD: Deterministic Video Depth Estimation with Generative Priors


MVP Investment

$9K - $12K, 6-10 weeks

Engineering: $8,000
Cloud Hosting: $240
SaaS Stack: $300
Domain & Legal: $100

6mo ROI: 2-4x

3yr ROI: 10-20x

Lightweight AI tools can reach profitability quickly. At an average contract of $500/month, 20 customers yield $10K MRR by month 6, growing to 200+ customers by year 3.
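The revenue arithmetic above can be checked with a few lines of Python. This is only a sketch: the $500/month contract and the customer counts come from the text, the function name is hypothetical, and reading "200+" as a customer count is an assumption.

```python
AVG_CONTRACT_USD_PER_MONTH = 500  # average contract quoted in the text


def monthly_recurring_revenue(customers, avg_contract=AVG_CONTRACT_USD_PER_MONTH):
    """Return MRR in USD for a given number of paying customers."""
    return customers * avg_contract


# Customer counts taken from the text; "200" assumes "200+" means customers.
mrr_month_6 = monthly_recurring_revenue(20)
mrr_year_3 = monthly_recurring_revenue(200)
print(mrr_month_6, mrr_year_3)
```

Under these assumptions, 20 customers give $10,000 MRR and 200 customers give $100,000 MRR.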

Founder's Pitch

"DVD is a state-of-the-art deterministic video depth estimation tool leveraging generative priors for 3D scene understanding."

Video Depth Estimation

Score: 8

Commercial Viability Breakdown (0-10 scale)

High Potential: 7.5 (3/4 signals)

Quick Build: 10 (4/4 signals)

Series A Potential: 10 (4/4 signals)

Sources used for this analysis

arXiv Paper

Full-text PDF analysis of the research paper

GitHub Repository

Code availability, stars, and contributor activity

Citation Network

Semantic Scholar citations and co-citation patterns

Community Predictions

Crowd-sourced unicorn probability assessments

Analysis model: GPT-4o · Last scored: 3/12/2026

Why It Matters

This research addresses a central problem in video depth estimation: balancing stability against detail without relying on large datasets. Solving it is essential for advancing applications such as autonomous vehicles and robotics.

Product Angle

To productize this, a robust API providing deterministic depth estimation from video input could be developed for integration into robotics and automotive systems.

Disruption

The technology could replace both stochastic generative depth estimation models and annotation-heavy discriminative models, offering a more balanced and less resource-intensive solution.

Product Opportunity

The market for video depth estimation technology spans industries such as autonomous vehicles and robotics and could potentially reach billions of dollars, with manufacturers and technology companies as the primary buyers.

Use Case Idea

One specific application is improving depth perception in autonomous vehicles, allowing them to better understand and navigate real-world environments with lower data requirements.

Science

The paper introduces DVD, which uses a pretrained video diffusion model as a deterministic depth regressor. It repurposes diffusion timesteps as anchors to balance stability with detail, and it employs latent manifold rectification to maintain temporal consistency across video frames.
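To make the contrast between generative sampling and deterministic regression concrete, here is a toy sketch in plain Python. It is not the paper's implementation: `toy_model` is a hypothetical stand-in for a pretrained video diffusion backbone, the latent is a short list of floats, and `FIXED_TIMESTEP` is an arbitrary value chosen for illustration. The assumption that a larger timestep yields a smoother output is built into the toy model and is not taken from the source.

```python
import random

FIXED_TIMESTEP = 0.5  # hypothetical anchor value; the source does not give one


def toy_model(latent, timestep):
    """Stand-in for a pretrained backbone (purely illustrative).

    Blends each value toward the mean, so a larger timestep smooths more.
    """
    mean = sum(latent) / len(latent)
    return [(1.0 - timestep) * v + timestep * mean for v in latent]


def stochastic_sample(latent, steps=4, seed=None):
    """Generative-style inference: start from noise and refine over several steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in latent]
    for i in range(steps, 0, -1):
        t = i / steps
        # Mix the conditioning latent with the current noisy sample.
        mixed = [(1.0 - t) * c + t * n for c, n in zip(latent, x)]
        x = toy_model(mixed, t * 0.5)
    return x


def deterministic_regress(latent, timestep=FIXED_TIMESTEP):
    """Regression-style inference: one forward pass, no noise, fixed timestep."""
    return toy_model(latent, timestep)
```

In this toy setting, `deterministic_regress` returns the same output for the same input on every call, while `stochastic_sample` varies with the seed. Changing the `timestep` argument trades smoothness against retained detail, which is the kind of balance the text describes.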

Method & Eval

The approach was validated through extensive experiments across multiple benchmarks, where it achieved superior zero-shot performance while using significantly less training data.
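The text does not say which metrics the benchmarks report. As an assumption, the sketch below uses two metrics that are common in depth estimation evaluation: absolute relative error (AbsRel) and the δ1 accuracy (fraction of pixels whose prediction-to-ground-truth ratio is within 1.25). Depth maps are flattened to plain lists, and the function names are illustrative.

```python
def abs_rel(pred, gt):
    """Mean absolute relative error over pixels with positive ground truth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def delta1(pred, gt, threshold=1.25):
    """Fraction of valid pixels where max(p/g, g/p) is below the threshold."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
    return hits / len(pairs)


# Tiny made-up example; these values are not from the paper.
predicted = [1.0, 2.0, 4.0]
ground_truth = [1.0, 2.5, 2.0]
print(abs_rel(predicted, ground_truth), delta1(predicted, ground_truth))
```

Lower AbsRel and higher δ1 indicate better predictions. Both functions assume at least one valid pixel; a production version would also need to handle empty inputs and any scale alignment a benchmark requires.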

Caveats

Potential limitations include highly complex scenes, where a deterministic method might miss nuanced details, and dependence on the quality of the pretrained diffusion model.

Author Intelligence

Hongfei Zhang

HKUST(GZ)

Harold Haodong Chen

HKUST

Chenfei Liao

HKUST(GZ)

Jing He

HKUST(GZ)

Zixin Zhang

HKUST(GZ)

Haodong Li

UCSD

Yihao Liang

Princeton University

Kanghao Chen

HKUST(GZ)

Bin Ren

MBZUAI

Xu Zheng

HKUST(GZ)

Shuai Yang

HKUST(GZ)

Kun Zhou

SZU

Yinchuan Li

Knowin

Nicu Sebe

UniTrento

Ying-Cong Chen

HKUST, HKUST(GZ)
