3D Perception

Trending
5 papers · 6.2 viability · +100% (30d)

State of the Field

Current research in 3D perception increasingly targets robust, efficient spatial understanding in dynamic environments. Recent work addresses the limits of traditional methods by integrating omnidirectional inputs and exploiting temporal coherence, both crucial for autonomous navigation and robotics. Frameworks such as O3N and FrameVGGT advance occupancy prediction and streaming memory management, helping systems interpret complex scenes and sustain performance over long sequences. Meanwhile, OWL and EventVGGT rethink how visual motion cues and event-camera data are processed, opening new paths for real-time decision-making and depth estimation under challenging conditions. Together these advances improve 3D-mapping accuracy and broaden the reach of safety-critical applications such as autonomous vehicles and robots, where reliable perception is essential to operation.

Last updated Mar 13, 2026

Papers

Research Paper · Mar 12, 2026

O3N: Omnidirectional Open-Vocabulary Occupancy Prediction

Understanding and reconstructing the 3D world through omnidirectional perception is an inevitable trend in the development of autonomous agents and embodied intelligence. However, existing 3D occupanc...

8.0 viability
Research Paper · Mar 8, 2026

FrameVGGT: Frame Evidence Rolling Memory for streaming VGGT

Streaming Visual Geometry Transformers such as StreamVGGT enable strong online 3D perception but suffer from unbounded KV-cache growth, which limits deployment over long streams. We revisit bounded-me...

7.0 viability
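The FrameVGGT abstract contrasts unbounded KV-cache growth with bounded-memory streaming. As a rough illustration only, the sketch below shows the simplest bounded scheme, a FIFO sliding window over per-frame key/value entries; the class name and FIFO eviction policy are assumptions for illustration, not FrameVGGT's actual frame-evidence mechanism, which the truncated abstract does not describe.

```python
from collections import deque


class BoundedKVCache:
    """Hypothetical FIFO sliding-window KV cache for streaming inference.

    Evicts the oldest frame's keys/values once the window is full, so
    memory stays constant no matter how long the input stream runs.
    (FrameVGGT's frame-evidence selection is NOT reproduced here.)
    """

    def __init__(self, max_frames: int):
        self.max_frames = max_frames
        self.frames = deque()  # each entry: (keys, values) for one frame

    def append(self, keys, values):
        # Evict the oldest frame before inserting when the window is full.
        if len(self.frames) == self.max_frames:
            self.frames.popleft()
        self.frames.append((keys, values))

    def __len__(self):
        return len(self.frames)


# Usage: stream 100 frames through a window of 8; memory stays bounded
# and only the 8 most recent frames (92..99) remain cached.
cache = BoundedKVCache(max_frames=8)
for t in range(100):
    cache.append(f"K{t}", f"V{t}")
assert len(cache) == 8
assert cache.frames[0] == ("K92", "V92")
```

In practice `deque(maxlen=...)` gives the same FIFO behavior in one line; the explicit `popleft` is kept here to make the eviction step visible, since smarter policies (e.g. evidence-weighted retention) would replace exactly that line.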
Research Paper · Mar 5, 2026

OWL: A Novel Approach to Machine Perception During Motion

We introduce a perception-related function, OWL, designed to address the complex challenges of 3D perception during motion. It derives its values directly from two fundamental visual motion cues, with...

7.0 viability
Research Paper · Mar 10, 2026

EventVGGT: Exploring Cross-Modal Distillation for Consistent Event-based Depth Estimation

Event cameras offer superior sensitivity to high-speed motion and extreme lighting, making event-based monocular depth estimation a promising approach for robust 3D perception in challenging condition...

7.0 viability
Research Paper · Mar 11, 2026

Splat2Real: Novel-view Scaling for Physical AI with 3D Gaussian Splatting

Physical AI faces viewpoint shift between training and deployment, and novel-view robustness is essential for monocular RGB-to-3D perception. We cast Real2Render2Real monocular depth pretraining as im...

2.0 viability