Papers
1–5 of 5
Visual Words Meet BM25: Sparse Auto-Encoder Visual Word Scoring for Image Retrieval
Dense image retrieval is accurate but offers limited interpretability and attribution, and it can be compute-intensive at scale. We present BM25-V, which applies Okapi BM25 scoring to sparse ...
NaiLIA: Multimodal Nail Design Retrieval Based on Dense Intent Descriptions and Palette Queries
We focus on the task of retrieving nail design images based on dense intent descriptions, which represent multi-layered user intent for nail designs. This is challenging because such descriptions spec...
OSCAR: Optimization-Steered Agentic Planning for Composed Image Retrieval
Composed image retrieval (CIR) requires complex reasoning over heterogeneous visual and textual constraints. Existing approaches largely fall into two paradigms: unified embedding retrieval, which suf...
FBCIR: Balancing Cross-Modal Focuses in Composed Image Retrieval
Composed image retrieval (CIR) requires multi-modal models to jointly reason over visual content and semantic modifications presented in text-image input pairs. While current CIR models achieve strong...
Visual Model Checking: Graph-Based Inference of Visual Routines for Image Retrieval
Information retrieval lies at the foundation of the modern digital industry. While natural language search has seen dramatic progress in recent years largely driven by embedding-based models and large...