$Ψ_0$: An Open Foundation Model Towards Universal Humanoid Loco-Manipulation

MVP Investment

$9K - $12K over 6-10 weeks

Engineering: $8,000
Cloud Hosting: $240
SaaS Stack: $300
Domain & Legal: $100

6mo ROI: 2-4x
3yr ROI: 10-20x

Lightweight AI tools can reach profitability quickly. At an average contract of $500/month, 20 customers yield $10K MRR by month 6, growing to 200+ customers by year 3.
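The MRR figure quoted above is simple arithmetic and can be checked with a short sketch. The function name `monthly_recurring_revenue` is illustrative, and reading "200+" as a customer count is an assumption rather than something the text states outright.

```python
def monthly_recurring_revenue(avg_contract_per_month: float, customers: int) -> float:
    """Return monthly recurring revenue for a flat average contract price."""
    return avg_contract_per_month * customers


# Figures quoted in the text: $500/month average contract, 20 customers at 6 months.
mrr_6mo = monthly_recurring_revenue(500, 20)

# Assumption: "200+" refers to customers, so this is a lower bound for year 3.
mrr_3yr_floor = monthly_recurring_revenue(500, 200)

print(f"6-month MRR: ${mrr_6mo:,.0f}")
print(f"3-year MRR (at 200 customers): ${mrr_3yr_floor:,.0f}")
```

Under that reading, the year-3 figure is a floor, since the text gives the customer count only as "200+".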

Founder's Pitch

"Psi-Zero open-sources a foundation model for humanoid loco-manipulation that reaches state-of-the-art performance with data-efficient training."

Humanoid Robotics · Score: 8

Commercial Viability Breakdown

0-10 scale

High Potential: 2/4 signals, score 5
Quick Build: 1/4 signals, score 2.5
Series A Potential: 4/4 signals, score 10
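The three scores listed are consistent with a linear mapping from signals met onto the 0-10 scale. The sketch below checks that reading. The formula and the name `signal_score` are inferred from the numbers shown and are not documented by the source.

```python
def signal_score(signals_met: int, signals_total: int = 4, scale_max: float = 10.0) -> float:
    """Map a count of satisfied signals onto a 0..scale_max score (assumed linear)."""
    if not 0 <= signals_met <= signals_total:
        raise ValueError("signals_met must be between 0 and signals_total")
    return scale_max * signals_met / signals_total


# Values taken from the breakdown: (signals met, listed score).
listed = {
    "High Potential": (2, 5.0),
    "Quick Build": (1, 2.5),
    "Series A Potential": (4, 10.0),
}

for name, (met, score) in listed.items():
    print(name, signal_score(met), "matches listed score:", signal_score(met) == score)
```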

Sources used for this analysis

arXiv Paper

Full-text PDF analysis of the research paper

GitHub Repository

Code availability, stars, and contributor activity

Citation Network

Semantic Scholar citations and co-citation patterns

Community Predictions

Crowd-sourced unicorn probability assessments

Analysis model: GPT-4o · Last scored: 3/12/2026

Why It Matters

Humanoid robots can only be integrated into complex real-world environments if they can manipulate objects reliably while moving. This research significantly improves those manipulation capabilities, including on tasks that are currently difficult or impossible for robots.

Product Angle

To productize this, a team would build a robust software platform for customizing humanoid robots to industry-specific tasks, offered as a ready-made solution for automation in complex environments.

Disruption

This approach could displace robotics methods that depend on very large training datasets: it uses significantly less data while reporting better performance on tasks that require dexterity and complex navigation.

Product Opportunity

The humanoid robotics market is growing, with applications in manufacturing, healthcare, and hospitality. Companies in these sectors are likely to pay for solutions that automate complex, multi-step tasks requiring human-like dexterity and interaction with the environment.

Use Case Idea

Deploy humanoid robots in high-tech facilities to perform complex tasks such as assembly, surveillance, or personalized concierge services, extending automation into human-centric environments.

Science

The paper proposes a two-stage training approach for humanoid robots. First, a vision-language model is pre-trained on large-scale egocentric human video to learn generalizable motion representations. Second, a post-training phase specializes the model on humanoid-specific data for precise joint control, so that strong performance is reached with significantly less data than competing models use.

Method & Eval

In extensive real-world experiments across multiple tasks, Psi-Zero outperforms models trained on much larger datasets while using only 800 hours of human video and 30 hours of robot data.
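The two-stage recipe and its data budget can be written down as a small plan structure. This is a minimal sketch, not the authors' code: the `Stage` class and its field names are hypothetical, and only the stage order and the hour counts (800 hours of human video, 30 hours of robot data) come from the summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One training stage; field names are illustrative, not from the paper's code."""
    name: str
    data_source: str
    hours: float
    goal: str


# Stage order and hour counts follow the summary; everything else is a placeholder.
PLAN = [
    Stage("pre-training", "egocentric human video", 800.0,
          "learn generalizable motion representations"),
    Stage("post-training", "humanoid robot data", 30.0,
          "specialize for precise joint control"),
]


def robot_data_fraction(plan: list[Stage]) -> float:
    """Share of total training hours that comes from robot data."""
    total = sum(stage.hours for stage in plan)
    robot = sum(stage.hours for stage in plan if "robot" in stage.data_source)
    return robot / total


for stage in PLAN:
    print(f"{stage.name}: {stage.hours:.0f} h of {stage.data_source} -> {stage.goal}")
print(f"robot data share: {robot_data_fraction(PLAN):.1%}")
```

With these figures, robot data is 30 of 830 total hours, which is the sense in which the approach is data-efficient on the robot side.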

Caveats

The main limitations are the cost and complexity of deploying advanced humanoid systems at scale in real-world environments, and the task-specific tuning needed for each new domain.

Author Intelligence

Songlin Wei

USC Physical Superintelligence (PSI) Lab

Hongyi Jing

USC Physical Superintelligence (PSI) Lab

Boqian Li

USC Physical Superintelligence (PSI) Lab

Zhenyu Zhao

USC Physical Superintelligence (PSI) Lab

Jiageng Mao

USC Physical Superintelligence (PSI) Lab

Zhenhao Ni

USC Physical Superintelligence (PSI) Lab

Sicheng He

USC Physical Superintelligence (PSI) Lab

Jie Liu

USC Physical Superintelligence (PSI) Lab

Xiawei Liu

USC Physical Superintelligence (PSI) Lab

Kaidi Kang

USC Physical Superintelligence (PSI) Lab

Sheng Zang

USC Physical Superintelligence (PSI) Lab

Weiduo Yuan

USC Physical Superintelligence (PSI) Lab

Marco Pavone

NVIDIA

Di Huang

WorldEngine

Yue Wang

USC Physical Superintelligence (PSI) Lab
