Vision-Language-Action Models Comparison Hub
4 papers - avg viability 5.5
Top Papers
- AugVLA-3D: Depth-Driven Feature Augmentation for Vision-Language-Action Models (7.0)
Integrates depth estimation into Vision-Language-Action models to improve robotic 3D perception and action accuracy (an illustrative depth-fusion sketch follows this list).
- ActionCodec: What Makes for Good Action Tokenizers (7.0)
ActionCodec is a high-performance action tokenizer that improves VLA models' training efficiency and task performance, setting new benchmark results on robotics tasks without pre-training (an illustrative tokenizer sketch follows this list).
- Chain of World: World Model Thinking in Latent Motion (5.0)
CoWVLA unifies world-model temporal reasoning with latent motion representation for efficient visuomotor learning in robotics.
- Pri4R: Learning World Dynamics for Vision-Language-Action Models with Privileged 4D Representation (3.0)
Pri4R enhances VLA models with privileged 4D world-dynamics understanding for improved performance on manipulation tasks.
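For the AugVLA-3D entry, a minimal sketch of what depth-driven feature augmentation can look like in practice, assuming a generic patchify-and-fuse design; the module names, dimensions, and fusion scheme are illustrative placeholders, not the paper's architecture.

```python
# Illustrative only: a generic RGB + depth feature fusion, NOT AugVLA-3D's actual method.
import torch
import torch.nn as nn

class DepthAugmentedEncoder(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        # Hypothetical lightweight patch encoders; real systems would use
        # pretrained vision and monocular depth-estimation backbones.
        self.rgb_encoder = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        self.depth_encoder = nn.Conv2d(1, dim, kernel_size=16, stride=16)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # Patchify both modalities into token grids, then fuse per token
        # before the tokens are handed to the VLA backbone.
        rgb_tok = self.rgb_encoder(rgb).flatten(2).transpose(1, 2)      # (B, N, dim)
        depth_tok = self.depth_encoder(depth).flatten(2).transpose(1, 2)
        return self.fuse(torch.cat([rgb_tok, depth_tok], dim=-1))       # (B, N, dim)

tokens = DepthAugmentedEncoder()(torch.randn(1, 3, 224, 224), torch.randn(1, 1, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 256])
```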
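For the ActionCodec entry, a minimal sketch of a binning-based action tokenizer of the kind commonly used in VLA pipelines, assuming per-dimension uniform binning; the class name, bin count, and action ranges are hypothetical and do not reflect ActionCodec's actual design.

```python
# Illustrative only: per-dimension uniform binning, NOT ActionCodec's actual design.
import numpy as np

class BinActionTokenizer:
    def __init__(self, low: np.ndarray, high: np.ndarray, n_bins: int = 256):
        self.low, self.high, self.n_bins = low, high, n_bins

    def encode(self, action: np.ndarray) -> np.ndarray:
        # Map each continuous action dimension to an integer bin id in [0, n_bins - 1].
        scaled = (action - self.low) / (self.high - self.low)
        return np.clip((scaled * self.n_bins).astype(int), 0, self.n_bins - 1)

    def decode(self, tokens: np.ndarray) -> np.ndarray:
        # Reconstruct each dimension as the center of its bin.
        return self.low + (tokens + 0.5) / self.n_bins * (self.high - self.low)

# Hypothetical 7-DoF arm action: 6 pose deltas plus a gripper command.
tok = BinActionTokenizer(low=np.full(7, -1.0), high=np.full(7, 1.0))
action = np.array([0.1, -0.3, 0.05, 0.0, 0.2, -0.1, 1.0])
ids = tok.encode(action)
print(ids, np.max(np.abs(tok.decode(ids) - action)))  # reconstruction error <= half a bin width
```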