State of the Field
Current research in computer vision is increasingly focused on robustness and adaptability across diverse environments and tasks. Recent work on road surface classification shows that fusing camera and IMU data improves performance under variable conditions, which matters for predictive maintenance systems in transportation. Advances in vision-as-inverse-graphics are enabling more sophisticated scene reconstruction and editing, with applications in design and entertainment. Frameworks like Sea² illustrate a shift toward deploying existing models in novel environments without extensive retraining. In anomaly detection, new simulation tools give researchers customizable datasets for developing robust models. Work on face swapping and cloth dynamics learning reflects the push toward real-time applications and unsupervised learning, respectively. Together, these trends point to computer vision systems that are more versatile, efficient, and usable on real-world problems.
Papers
A New Dataset and Framework for Robust Road Surface Classification via Camera-IMU Fusion
Road surface classification (RSC) is a key enabler for environment-aware predictive maintenance systems. However, existing RSC techniques often fail to generalize beyond narrow operational conditions ...
AlphaFace: High Fidelity and Real-time Face Swapper Robust to Facial Pose
Existing face-swapping methods often deliver competitive results in constrained settings but exhibit substantial quality degradation when handling extreme facial poses. To improve facial pose robustne...
See, Act, Adapt: Active Perception for Unsupervised Cross-Domain Visual Adaptation via Personalized VLM-Guided Agent
Pre-trained perception models excel in generic image domains but degrade significantly in novel environments like indoor scenes. The conventional remedy is fine-tuning on downstream data, which incurs ...
Vision-as-Inverse-Graphics Agent via Interleaved Multimodal Reasoning
Vision-as-inverse-graphics, the concept of reconstructing an image as an editable graphics program, is a long-standing goal of computer vision. Yet even strong VLMs aren't able to achieve this in one-s...
MixerCSeg: An Efficient Mixer Architecture for Crack Segmentation via Decoupled Mamba Attention
Feature encoders play a key role in pixel-level crack segmentation by shaping the representation of fine textures and thin structures. Existing CNN-, Transformer-, and Mamba-based models each capture ...
Tracing Copied Pixels and Regularizing Patch Affinity in Copy Detection
Image Copy Detection (ICD) aims to identify manipulated content between image pairs through robust feature representation learning. While self-supervised learning (SSL) has advanced ICD systems, exist...
Discriminative Perception via Anchored Description for Reasoning Segmentation
Reasoning segmentation increasingly employs reinforcement learning to generate explanatory reasoning chains that guide Multimodal Large Language Models. While these geometric rewards are primarily con...
Decoupling Perception and Calibration: Label-Efficient Image Quality Assessment Framework
Recent multimodal large language models (MLLMs) have demonstrated strong capabilities in image quality assessment (IQA) tasks. However, adapting such large-scale models is computationally expensive an...
CloDS: Visual-Only Unsupervised Cloth Dynamics Learning in Unknown Conditions
Deep learning has demonstrated remarkable capabilities in simulating complex dynamic systems. However, existing methods require known physical properties as supervision or inputs, limiting their appli...
Real-Time Loop Closure Detection in Visual SLAM via NetVLAD and Faiss
Loop closure detection (LCD) is a core component of simultaneous localization and mapping (SLAM): it identifies revisited places and enables pose-graph constraints that correct accumulated drift. Clas...