$Ψ_0$: An Open Foundation Model Towards Universal Humanoid Loco-Manipulation

MVP Investment

$9K - $12K over 6-10 weeks

Engineering: $8,000
Cloud Hosting: $240
SaaS Stack: $300
Domain & Legal: $100

6mo ROI: 2-4x
3yr ROI: 10-20x

Lightweight AI tools can reach profitability quickly. At a $500/month average contract, 20 customers yield $10K MRR by month 6; 200+ customers by year 3 would yield $100K+ MRR.

Founder's Pitch

"Psi-Zero is an open-source foundation model for humanoid loco-manipulation that reaches state-of-the-art performance with comparatively little training data."

Humanoid Robotics • Score: 8

Commercial Viability Breakdown (0-10 scale)

High Potential: 2/4 signals
Quick Build: 1/4 signals (2.5)
Series A Potential: 4/4 signals

Sources used for this analysis

arXiv Paper: Full-text PDF analysis of the research paper
GitHub Repository: Code availability, stars, and contributor activity
Citation Network: Semantic Scholar citations and co-citation patterns
Community Predictions: Crowd-sourced unicorn probability assessments

Analysis model: GPT-4o · Last scored: 3/12/2026

Why It Matters

Psi-Zero improves the manipulation capabilities of humanoid robots, a prerequisite for deploying them in complex real-world environments where many tasks remain difficult or impossible for current robots.

Product Angle

To productize this work, a team would build a robust software platform that lets customers adapt humanoid robots to industry-specific tasks, offering a ready-made automation solution for complex environments.

Disruption

This approach could displace robotics methods that depend on very large training datasets: it uses significantly less data while delivering better performance on tasks that require dexterity and complex navigation.

Product Opportunity

The humanoid robotics market is growing, with applications in manufacturing, healthcare, and hospitality. Companies in these sectors are likely to pay for solutions that automate complex, multi-step tasks requiring human-like dexterity and interaction with the environment.

Use Case Idea

Humanoid robots deployed in high-tech facilities could perform complex tasks such as assembly, surveillance, or personalized concierge services, extending automation into human-centric environments.

Science

The paper proposes a two-stage training approach for humanoid robots. First, a vision-language model is pre-trained on large-scale egocentric human video to learn generalizable motion representations. Second, a post-training phase specializes the model on humanoid-specific data for precise joint control, so that strong performance is reached with significantly less data.
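The two-stage schedule can be illustrated with a deliberately tiny toy. This is a sketch of the general pretrain-then-post-train pattern only, not the paper's architecture, data, or training code: the one-parameter linear model, the slopes 2.0 and 2.3, the learning rate, and the names `make_data` and `sgd_fit` are all hypothetical choices made for illustration. The 800:30 sample counts merely echo the data ratio reported on this page.

```python
import random


def make_data(slope, n, rng):
    """Noise-free samples (x, y) from the 1-D linear map y = slope * x."""
    return [(x, slope * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(n))]


def sgd_fit(w, data, lr, epochs):
    """Run plain SGD on squared error for the model y_hat = w * x."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2.0 * (w * x - y) * x  # gradient of (w*x - y)^2 w.r.t. w
    return w


rng = random.Random(0)
# Hypothetical stand-ins: a large generic source set and a small target set.
source_data = make_data(slope=2.0, n=800, rng=rng)
target_data = make_data(slope=2.3, n=30, rng=rng)

w_init = 0.0
w_pre = sgd_fit(w_init, source_data, lr=0.05, epochs=5)    # stage 1: pre-training
w_post = sgd_fit(w_pre, target_data, lr=0.05, epochs=20)   # stage 2: post-training

print(f"after stage 1: w = {w_pre:.4f}")
print(f"after stage 2: w = {w_post:.4f}")
```

In this toy, stage 1 brings the parameter close to the source slope, and stage 2 only has to close the small remaining gap to the target slope using the 30 target samples. That is the intuition behind post-training on a small amount of embodiment-specific data.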

Method & Eval

In extensive real-world experiments, Psi-Zero outperformed models trained on much larger datasets across multiple tasks while using only 800 hours of human video and 30 hours of robot data.
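To put the reported data budget in proportion, the short calculation below works out how small the robot-data share is. It uses only the two figures stated above (800 hours and 30 hours); the variable names are illustrative.

```python
human_video_hours = 800   # figure reported above
robot_data_hours = 30     # figure reported above

total_hours = human_video_hours + robot_data_hours
robot_share = robot_data_hours / total_hours             # fraction of all training hours
human_to_robot = human_video_hours / robot_data_hours    # hours of video per robot hour

print(f"total training data: {total_hours} hours")
print(f"robot data share:    {robot_share:.1%}")
print(f"human:robot ratio:   {human_to_robot:.1f} to 1")
```

Under these figures, robot data makes up under 4% of the total training hours, with roughly 27 hours of human video for every hour of robot data.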

Caveats

The main limitations are the cost and complexity of deploying advanced humanoid systems at scale in real-world environments, and the task-specific tuning needed for each new domain.

Author Intelligence

Songlin Wei

USC Physical Superintelligence (PSI) Lab

Hongyi Jing

USC Physical Superintelligence (PSI) Lab

Boqian Li

USC Physical Superintelligence (PSI) Lab

Zhenyu Zhao

USC Physical Superintelligence (PSI) Lab

Jiageng Mao

USC Physical Superintelligence (PSI) Lab

Zhenhao Ni

USC Physical Superintelligence (PSI) Lab

Sicheng He

USC Physical Superintelligence (PSI) Lab

Jie Liu

USC Physical Superintelligence (PSI) Lab

Xiawei Liu

USC Physical Superintelligence (PSI) Lab

Kaidi Kang

USC Physical Superintelligence (PSI) Lab

Sheng Zang

USC Physical Superintelligence (PSI) Lab

Weiduo Yuan

USC Physical Superintelligence (PSI) Lab

Marco Pavone

NVIDIA

Di Huang

WorldEngine

Yue Wang

USC Physical Superintelligence (PSI) Lab
