State of the Field
Current research in 3D perception increasingly targets robust, efficient spatial understanding in dynamic environments. Recent work addresses the limitations of traditional methods by integrating omnidirectional inputs and exploiting temporal coherence, both crucial for autonomous navigation and robotics. Frameworks such as O3N and FrameVGGT advance occupancy prediction and streaming memory management, helping systems interpret complex scenes and sustain performance over long horizons. OWL and EventVGGT rethink how visual motion cues and event-based data are processed, opening new paths for real-time decision-making and depth estimation under challenging conditions, while Splat2Real tackles viewpoint robustness for monocular RGB-to-3D perception. Together, these advances improve 3D mapping accuracy and broaden applicability to safety-critical domains such as autonomous vehicles and robotic systems, where reliable perception is essential.
Papers
O3N: Omnidirectional Open-Vocabulary Occupancy Prediction
Understanding and reconstructing the 3D world through omnidirectional perception is an inevitable trend in the development of autonomous agents and embodied intelligence. However, existing 3D occupancy...
FrameVGGT: Frame Evidence Rolling Memory for streaming VGGT
Streaming Visual Geometry Transformers such as StreamVGGT enable strong online 3D perception but suffer from unbounded KV-cache growth, which limits deployment over long streams. We revisit bounded-memory...
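The FrameVGGT abstract above points at the core systems problem: an unbounded KV-cache grows with stream length. As a minimal sketch of the general bounded-memory idea only (not the paper's actual frame-evidence rolling policy; the class and names here are hypothetical), a fixed-capacity cache that evicts the oldest entries keeps memory constant regardless of how long the stream runs:

```python
from collections import deque

class RollingKVCache:
    """Toy bounded KV-cache: retains the most recent `capacity`
    key/value pairs and silently evicts the oldest ones."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        # deque with maxlen drops the oldest entry automatically on append,
        # so storage stays O(capacity) no matter how many frames arrive
        self.keys = deque(maxlen=capacity)
        self.values = deque(maxlen=capacity)

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)

# Stream 10 frames through a capacity-4 cache: only frames 6..9 survive.
cache = RollingKVCache(capacity=4)
for frame_id in range(10):
    cache.append(f"k{frame_id}", f"v{frame_id}")
print(len(cache), list(cache.keys))  # → 4 ['k6', 'k7', 'k8', 'k9']
```

A first-in-first-out policy like this is the simplest baseline; smarter schemes score entries by evidence or attention weight before evicting, which is the kind of refinement the paper's title suggests.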
OWL: A Novel Approach to Machine Perception During Motion
We introduce a perception-related function, OWL, designed to address the complex challenges of 3D perception during motion. It derives its values directly from two fundamental visual motion cues, with...
EventVGGT: Exploring Cross-Modal Distillation for Consistent Event-based Depth Estimation
Event cameras offer superior sensitivity to high-speed motion and extreme lighting, making event-based monocular depth estimation a promising approach for robust 3D perception in challenging conditions...
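Cross-modal distillation, as named in the EventVGGT title, generally means regressing a student network's predictions (here, depth from events) toward a teacher's predictions from another modality (frames). A toy version of that generic objective, assuming per-pixel depths flattened to lists (illustrative only; this is not the paper's loss):

```python
def distillation_loss(student_depth, teacher_depth):
    """Mean absolute error between the event-based student's per-pixel
    depths and the frame-based teacher's depths (both flat lists)."""
    assert len(student_depth) == len(teacher_depth)
    n = len(student_depth)
    return sum(abs(s - t) for s, t in zip(student_depth, teacher_depth)) / n

# Mean of |1.0-1.5|, |2.0-2.0|, |3.0-2.5| = (0.5 + 0.0 + 0.5) / 3 = 1/3
loss = distillation_loss([1.0, 2.0, 3.0], [1.5, 2.0, 2.5])
```

In practice this term would be minimized over the student's weights while the teacher stays frozen; the "consistent" in the paper's title hints at additional temporal constraints beyond a per-frame loss like this one.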
Splat2Real: Novel-view Scaling for Physical AI with 3D Gaussian Splatting
Physical AI faces viewpoint shift between training and deployment, and novel-view robustness is essential for monocular RGB-to-3D perception. We cast Real2Render2Real monocular depth pretraining as im...