Computer Vision

21 papers
5.9 viability
+10% 30d

State of the Field

Current research in computer vision focuses increasingly on robustness and adaptability across diverse environments and tasks. Recent work on road surface classification shows that fusing camera and IMU data improves performance under variable conditions, which matters for predictive maintenance systems in transportation. Advances in vision-as-inverse-graphics are enabling more sophisticated scene reconstruction and editing, with applications in design and entertainment. Frameworks like Sea² illustrate a shift towards deploying existing models intelligently in novel environments without extensive retraining. In anomaly detection, new simulation tools give researchers customizable datasets for developing robust models. Work on face swapping pushes towards real-time applications, while cloth dynamics learning moves towards unsupervised methods. Together, these trends point to computer vision systems that are more versatile, efficient, and usable on real-world problems.

Last updated Mar 2, 2026

Papers

1–10 of 21
Research Paper·Jan 28, 2026

A New Dataset and Framework for Robust Road Surface Classification via Camera-IMU Fusion

Road surface classification (RSC) is a key enabler for environment-aware predictive maintenance systems. However, existing RSC techniques often fail to generalize beyond narrow operational conditions ...

9.0 viability
Research Paper·Jan 23, 2026

AlphaFace: High Fidelity and Real-time Face Swapper Robust to Facial Pose

Existing face-swapping methods often deliver competitive results in constrained settings but exhibit substantial quality degradation when handling extreme facial poses. To improve facial pose robustne...

8.0 viability
Research Paper·Feb 27, 2026

See, Act, Adapt: Active Perception for Unsupervised Cross-Domain Visual Adaptation via Personalized VLM-Guided Agent

Pre-trained perception models excel in generic image domains but degrade significantly in novel environments like indoor scenes. The conventional remedy is fine-tuning on downstream data, which incurs ...

8.0 viability
Research Paper·Jan 16, 2026

Vision-as-Inverse-Graphics Agent via Interleaved Multimodal Reasoning

Vision-as-inverse-graphics, the concept of reconstructing an image as an editable graphics program, is a long-standing goal of computer vision. Yet even strong VLMs aren't able to achieve this in one-s...

8.0 viability
Research Paper·Mar 2, 2026

MixerCSeg: An Efficient Mixer Architecture for Crack Segmentation via Decoupled Mamba Attention

Feature encoders play a key role in pixel-level crack segmentation by shaping the representation of fine textures and thin structures. Existing CNN-, Transformer-, and Mamba-based models each capture ...

7.0 viability
Research Paper·Feb 19, 2026

Tracing Copied Pixels and Regularizing Patch Affinity in Copy Detection

Image Copy Detection (ICD) aims to identify manipulated content between image pairs through robust feature representation learning. While self-supervised learning (SSL) has advanced ICD systems, exist...

7.0 viability
Research Paper·Mar 4, 2026

Discriminative Perception via Anchored Description for Reasoning Segmentation

Reasoning segmentation increasingly employs reinforcement learning to generate explanatory reasoning chains that guide Multimodal Large Language Models. While these geometric rewards are primarily con...

6.0 viability
Research Paper·Jan 28, 2026

Decoupling Perception and Calibration: Label-Efficient Image Quality Assessment Framework

Recent multimodal large language models (MLLMs) have demonstrated strong capabilities in image quality assessment (IQA) tasks. However, adapting such large-scale models is computationally expensive an...

6.0 viability
Research Paper·Feb 2, 2026

CloDS: Visual-Only Unsupervised Cloth Dynamics Learning in Unknown Conditions

Deep learning has demonstrated remarkable capabilities in simulating complex dynamic systems. However, existing methods require known physical properties as supervision or inputs, limiting their appli...

6.0 viability
Research Paper·Feb 2, 2026

Real-Time Loop Closure Detection in Visual SLAM via NetVLAD and Faiss

Loop closure detection (LCD) is a core component of simultaneous localization and mapping (SLAM): it identifies revisited places and enables pose-graph constraints that correct accumulated drift. Clas...

6.0 viability