Vision-Language-Action Comparison Hub

DepthCache: Depth-Guided Training-Free Visual Token Merging for Vision-Language-Action Model Inference (8.0)

DepthCache is a training-free framework that uses depth guidance to merge visual tokens, speeding up Vision-Language-Action model inference for robotic manipulation without degrading task performance.

AR-VLA: True Autoregressive Action Expert for Vision-Language-Action Models (7.0)

AR-VLA is a context-aware autoregressive action generator for robotic manipulation that improves action trajectory smoothness and task success rates.

Reference Surfaces

Top Papers