
BUILDER'S SANDBOX

Build This Paper

Use an AI coding agent to implement this research.

OpenAI Codex (AI Agent)

Lightweight coding agent in your terminal.

Claude Code (AI Agent)

Agentic coding tool for terminal workflows.

AntiGravity IDE (Scaffolding)

AI agent mindset installer and workflow scaffolder.

Cursor (IDE)

AI-first code editor built on VS Code.

VS Code (IDE)

Free, open-source editor by Microsoft.

MVP Investment

$9K - $12K · 6-10 weeks

Engineering: $8,000
Cloud Hosting: $240
SaaS Stack: $300
Domain & Legal: $100

6mo ROI: 2-4x
3yr ROI: 10-20x

Lightweight AI tools can reach profitability quickly. At a $500/mo average contract, 20 customers puts MRR at $10K by month 6, with 200+ customers plausible by year 3.
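A quick back-of-envelope check of those figures. The contract size, customer counts, and MVP cost are the assumptions stated above; the linear customer ramp is an added assumption for illustration:

```python
# Sanity-check the MRR and ROI claims above. All inputs are this
# section's stated assumptions, not measured data.

MVP_COST_HIGH = 12_000   # upper MVP cost estimate ($)
AVG_CONTRACT = 500       # $/customer/month (assumed)

def mrr(customers: int) -> int:
    """Monthly recurring revenue at the assumed contract size."""
    return customers * AVG_CONTRACT

print(mrr(20))  # 10000 -> the "$10K MRR by month 6" claim

# Rough 6-month cumulative revenue if customers ramp linearly from
# 0 to 20, then ROI as revenue per dollar of MVP spend:
six_mo_revenue = sum(mrr(20 * m // 6) for m in range(1, 7))
print(six_mo_revenue, six_mo_revenue / MVP_COST_HIGH)  # 34000, ~2.8x
```

At the high cost estimate this lands near the low end of the quoted 2-4x band; a faster ramp or the $9K build cost pushes it toward the top.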

Talent Scout

Jiahui Fu

Junyu Nan
Carnegie Mellon University

Lingfeng Sun
Carnegie Mellon University

Hongyu Li
Brown University

Founder's Pitch

"NovaPlan enables robots to perform zero-shot, long-horizon manipulations using video language planning, achieving state-of-the-art results without prior demonstrations."

Robotic Manipulation · Score: 8

Commercial Viability Breakdown (0-10 scale)

High Potential: 5 (2/4 signals)
Quick Build: 5 (2/4 signals)
Series A Potential: 10 (4/4 signals)

Sources used for this analysis

arXiv Paper

Full-text PDF analysis of the research paper

GitHub Repository

Code availability, stars, and contributor activity

Citation Network

Semantic Scholar citations and co-citation patterns

Community Predictions

Crowd-sourced unicorn probability assessments

Analysis model: GPT-4o · Last scored: 2/23/2026

Why It Matters

This research enables robots to perform complex tasks without prior training or demonstrations, significantly reducing the costs and time associated with preparing robots for real-world applications. It enhances robot autonomy, which is crucial for deployments in dynamic and unstructured environments.

Product Angle

The research could be packaged as a robotics software product or API that businesses integrate with their existing automation systems to increase flexibility and cut setup costs. It could also be bundled with a robot offering as a smart upgrade package.
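As a sketch only: such an API might expose a single plan-from-instruction endpoint that an existing automation stack calls over HTTP. Every name below (NovaPlanClient, the /v1/plan path, the response fields) is a hypothetical illustration, not part of the paper or any shipped product:

```python
# Hypothetical client for a "manipulation planning as a service" API.
# Class, method, endpoint, and response-field names are illustrative
# assumptions, not a published interface.
import requests

class NovaPlanClient:
    def __init__(self, base_url: str, api_key: str):
        self.base_url = base_url.rstrip("/")
        self.headers = {"Authorization": f"Bearer {api_key}"}

    def plan_task(self, instruction: str, scene_image_path: str) -> dict:
        """Send a language instruction plus a scene image; receive
        sub-goals and candidate end-effector trajectories."""
        with open(scene_image_path, "rb") as f:
            resp = requests.post(
                f"{self.base_url}/v1/plan",
                headers=self.headers,
                data={"instruction": instruction},
                files={"scene": f},
                timeout=60,
            )
        resp.raise_for_status()
        return resp.json()  # e.g. {"subgoals": [...], "trajectories": [...]}

# Usage (hypothetical endpoint and key):
# client = NovaPlanClient("https://api.example.com", "API_KEY")
# plan = client.plan_task("insert the gear onto the shaft", "scene.png")
```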

Disruption

This technology could replace traditional robotics systems that require extensive programming and setup for new tasks, offering a more adaptable and efficient alternative.

Product Opportunity

The addressable market spans manufacturing, logistics, and service robotics, wherever flexibility in robot tasking is needed. Companies looking to minimize training time and cost are the likely customers, paying for licenses or subscriptions to the technology.

Use Case Idea

A commercial application could be an advanced robotics platform for automated assembly lines in custom, low-volume manufacturing, where training the robot in advance for every configuration is infeasible.

Science

NovaPlan combines vision-language models with video generation to plan and execute robot tasks. It breaks down tasks into sub-goals and uses a hybrid tracking mechanism to determine robot actions from generated videos. The framework continuously monitors and adjusts actions in response to execution failures.
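Read as control flow, the described pipeline is a closed loop: decompose the instruction into sub-goals, generate a video for the next sub-goal, extract actions by tracking, execute, and replan on failure. A minimal structural sketch follows; every helper is a stub standing in for a component the paper describes, and none of these names are a real API:

```python
# Structural sketch of the described plan -> generate -> track -> execute
# loop. All helpers are stubs; in the real system each would be a learned
# component, not the trivial placeholders below.
from dataclasses import dataclass

@dataclass
class Scene:
    step: int = 0  # stand-in for real camera/robot state

def decompose_with_vlm(instruction: str) -> list[str]:
    # Stub: a VLM would split the instruction into ordered sub-goals.
    return [f"{instruction} / subgoal {i}" for i in range(2)]

def generate_subgoal_video(scene: Scene, goal: str) -> list[str]:
    # Stub: a video model would render frames of the sub-goal being achieved.
    return [f"frame-{i}" for i in range(3)]

def track_actions(video: list[str]) -> list[str]:
    # Stub: hybrid tracking would convert generated frames into robot actions.
    return [f"action-for-{frame}" for frame in video]

def execute(scene: Scene, action: str) -> Scene:
    # Stub: step the real robot; here we just advance a counter.
    return Scene(step=scene.step + 1)

def subgoal_succeeded(scene: Scene, goal: str) -> bool:
    # Stub: the monitor checks execution and triggers replanning on failure.
    return True

def run_task(instruction: str, scene: Scene) -> Scene:
    for goal in decompose_with_vlm(instruction):
        while True:
            video = generate_subgoal_video(scene, goal)
            for action in track_actions(video):
                scene = execute(scene, action)
            if subgoal_succeeded(scene, goal):
                break  # otherwise: regenerate the video and retry
    return scene

print(run_task("insert the gear onto the shaft", Scene()).step)  # 6
```

The replanning branch is where the paper's "continuously monitors and adjusts actions in response to execution failures" claim lives: a failed sub-goal check sends control back to video generation rather than aborting the task.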

Method & Eval

NovaPlan was evaluated on three long-horizon tasks and the Functional Manipulation Benchmark (FMB), outperforming existing zero-shot models on complex assembly tasks without any prior demonstrations.

Caveats

Because the system relies on video models, it may struggle in environments with poor lighting or camera angles that obscure task details. Continuous updates and calibration would be needed to maintain performance across different settings.

Author Intelligence

Jiahui Fu

Junyu Nan

Carnegie Mellon University

Lingfeng Sun

Carnegie Mellon University

Hongyu Li

Brown University

Jianing Qian

University of Pennsylvania

Jennifer L. Barry

Carnegie Mellon University

Kris Kitani

Carnegie Mellon University

George Konidaris

Brown University