Papers
Trajectory-Diversity-Driven Robust Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) requires agents to navigate photo-realistic environments following natural language instructions. Current methods predominantly rely on imitation learning, which s...
Implicit Geometry Representations for Vision-and-Language Navigation from Web Videos
Vision-and-Language Navigation (VLN) has long been constrained by the limited diversity and scalability of simulator-curated datasets, which fail to capture the complexity of real-world environments. ...
DecoVLN: Decoupling Observation, Reasoning, and Correction for Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) requires agents to follow long-horizon instructions and navigate complex 3D environments. However, existing approaches face two major challenges: constructing an e...
CMMR-VLN: Vision-and-Language Navigation via Continual Multimodal Memory Retrieval
Although large language models (LLMs) are introduced into vision-and-language navigation (VLN) to improve instruction comprehension and generalization, existing LLM-based VLN lacks the ability to sel...