State of the Field
Recent advances in robotics increasingly focus on enhancing human-robot collaboration and improving task efficiency across diverse applications. Researchers are developing frameworks that leverage human demonstrations to teach robots complex interactions, as seen in methods using physics-aware retargeting and hierarchical action reasoning. In emergency scenarios, lightweight trajectory planners enable drones to navigate dynamically while assisting humans, with practical applications in search and rescue. Innovations in reinforcement learning are also streamlining cloth manipulation and path planning for unmanned ground vehicles, reducing computational cost while improving adaptability. The integration of visual perception with advanced planning techniques is likewise being explored to improve agility in quadruped robots. Overall, the field is shifting toward more data-efficient, interpretable systems capable of operating in real-world environments, addressing commercial needs and operational challenges in sectors such as agriculture, logistics, and emergency response.
Papers
Learning Whole-Body Human-Humanoid Interaction from Human-Human Demonstrations
Enabling humanoid robots to physically interact with humans is a critical frontier, but progress is hindered by the scarcity of high-quality Human-Humanoid Interaction (HHoI) data. While leveraging ab...
Differentiable Inverse Graphics for Zero-shot Scene Reconstruction and Robot Grasping
Operating effectively in novel real-world environments requires robotic systems to estimate and interact with previously unseen objects. Current state-of-the-art models address this challenge by using...
HumanDiffusion: A Vision-Based Diffusion Trajectory Planner with Human-Conditioned Goals for Search and Rescue UAV
Reliable human-robot collaboration in emergency scenarios requires autonomous systems that can detect humans, infer navigation goals, and operate safely in dynamic environments. This paper presents H...
Disentangling perception and reasoning for improving data efficiency in learning cloth manipulation without demonstrations
Cloth manipulation is a ubiquitous task in everyday life, but it remains an open challenge for robotics. The difficulties in developing cloth manipulation policies are attributed to the high-dimension...
Scaling Tasks, Not Samples: Mastering Humanoid Control through Multi-Task Model-Based Reinforcement Learning
Developing generalist robots capable of mastering diverse skills remains a central challenge in embodied AI. While recent progress emphasizes scaling model parameters and offline datasets, such approa...
IROSA: Interactive Robot Skill Adaptation using Natural Language
Foundation models have demonstrated impressive capabilities across diverse domains, while imitation learning provides principled methods for robot skill adaptation from limited data. Combining these a...
TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics
While Vision-Language-Action (VLA) models have seen rapid progress in pretraining, their advancement in Reinforcement Learning (RL) remains hampered by low sample efficiency and sparse rewards in real...
Learning Object-Centric Spatial Reasoning for Sequential Manipulation in Cluttered Environments
Robotic manipulation in cluttered environments presents a critical challenge for automation. Recent large-scale, end-to-end models demonstrate impressive capabilities but often lack the data efficienc...
Non-Markovian Long-Horizon Robot Manipulation via Keyframe Chaining
Existing Vision-Language-Action (VLA) models often struggle to generalize to long-horizon tasks due to their heavy reliance on immediate observations. While recent studies incorporate retrieval mechan...
VISTA: Enhancing Visual Conditioning via Track-Following Preference Optimization in Vision-Language-Action Models
Vision-Language-Action (VLA) models have demonstrated strong performance across a wide range of robotic manipulation tasks. Despite the success, extending large pretrained Vision-Language Models (VLMs...