SoPE: Spherical Coordinate-Based Positional Embedding for Enhancing Spatial Perception of 3D LVLMs

Estimated build cost: $9K - $13K over 6-10 weeks.

See exactly what it costs to build this -- with 3 comparable funded startups.

7-day free trial. Cancel anytime.

Discover the researchers behind this paper and find similar experts.

7-day free trial. Cancel anytime.


Founder's Pitch

"Enhance 3D spatial perception in LVLMs with spherical coordinate-based positional embeddings."
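The page gives only this one-line pitch, not the method itself. As a rough illustration of the idea named in the title, the Python sketch below converts a 3D token position into spherical coordinates (r, θ, φ) around an assumed origin and applies a standard rotary (RoPE-style) rotation to one third of the feature dimensions per coordinate. The equal three-way split, the 10000 frequency base, the choice of origin, and the names `to_spherical`, `rotate_pairs` and `spherical_rope` are hypothetical assumptions for illustration, not SoPE's published design.

```python
import math
from typing import List, Sequence, Tuple


def to_spherical(
    point: Sequence[float], origin: Sequence[float] = (0.0, 0.0, 0.0)
) -> Tuple[float, float, float]:
    """Convert a 3D point to (r, theta, phi) relative to `origin`.

    theta is the polar angle from +z in [0, pi]; phi is the azimuth in (-pi, pi].
    """
    dx, dy, dz = (p - o for p, o in zip(point, origin))
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    if r == 0.0:
        return 0.0, 0.0, 0.0  # the origin itself: define both angles as zero
    cos_theta = max(-1.0, min(1.0, dz / r))  # clamp against rounding error
    return r, math.acos(cos_theta), math.atan2(dy, dx)


def rotate_pairs(vec: Sequence[float], pos: float, base: float = 10000.0) -> List[float]:
    """Standard 1D rotary embedding: rotate each consecutive pair by pos * freq_i."""
    d = len(vec)
    if d % 2:
        raise ValueError("rotary sub-vector length must be even")
    out = list(vec)
    for i in range(d // 2):
        angle = pos * base ** (-2.0 * i / d)  # geometric frequency schedule
        c, s = math.cos(angle), math.sin(angle)
        a, b = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def spherical_rope(
    vec: Sequence[float],
    point: Sequence[float],
    origin: Sequence[float] = (0.0, 0.0, 0.0),
    base: float = 10000.0,
) -> List[float]:
    """Hypothetical spherical RoPE: rotate one third of the dims by r, theta, phi."""
    if len(vec) % 6:
        raise ValueError("feature length must be divisible by 6")
    r, theta, phi = to_spherical(point, origin)
    k = len(vec) // 3
    return (
        rotate_pairs(vec[:k], r, base)
        + rotate_pairs(vec[k:2 * k], theta, base)
        + rotate_pairs(vec[2 * k:], phi, base)
    )


q = [1.0, 0.0, 0.5, -0.5, 0.2, 0.3]
rotated_q = spherical_rope(q, point=(1.0, 2.0, 2.0))  # r = 3 for this point
```

Because each chunk is an ordinary rotary rotation, the query-key dot product within a chunk depends only on the difference of the two coordinates, and the rotation preserves vector norms. The azimuth φ wraps around at ±π, which a real design would have to account for.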

3D Vision · Score: 5

Commercial Viability Breakdown (0-10 scale)

High Potential: 2.5 (1/4 signals)
Quick Build: 7.5 (3/4 signals)
Series A Potential: 0 (0/4 signals)

Sources used for this analysis:

arXiv Paper: full-text PDF analysis of the research paper
GitHub Repository: code availability, stars, and contributor activity
Citation Network: Semantic Scholar citations and co-citation patterns
Community Predictions: crowd-sourced unicorn probability assessments

Analysis model: GPT-4o · Last scored: 2/26/2026
