CLIP
CLIP (Contrastive Language-Image Pre-training) is currently an uncategorized topic in our research taxonomy. CLIP learns a joint image-text embedding space with a contrastive objective, which enables zero-shot transfer: an image is classified by comparing its embedding against the embeddings of natural-language prompts.
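As a minimal sketch of that zero-shot mechanism, the snippet below runs CLIP via the Hugging Face transformers API; the checkpoint name, sample image URL, and prompt set are illustrative choices, not taken from the papers listed here.

```python
# Minimal sketch: CLIP zero-shot classification with Hugging Face transformers.
# The checkpoint, image URL, and candidate labels are illustrative assumptions.
from PIL import Image
import requests
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Any RGB image works; here we fetch a sample photo by URL.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Candidate classes are phrased as natural-language prompts.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores;
# softmax over the text axis yields per-label probabilities.
probs = outputs.logits_per_image.softmax(dim=1)
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```

Many of the papers below build on exactly this image-text similarity signal, e.g. for retrieval, prompt tuning, or zero-shot anomaly detection.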
Related papers
- ViCLIP-OT: The First Foundation Vision-Language Model for Vietnamese Image-Text Retrieval with Optimal Transport
- Spanning the Visual Analogy Space with a Weight Basis of LoRAs
- CLIP-Guided Unsupervised Semantic-Aware Exposure Correction
- Improving MLLMs in Embodied Exploration and Question Answering with Human-Inspired Memory Modeling
- BrandFusion: A Multi-Agent Framework for Seamless Brand Integration in Text-to-Video Generation
- SSVP: Synergistic Semantic-Visual Prompting for Industrial Zero-Shot Anomaly Detection
- Reclaiming Lost Text Layers for Source-Free Cross-Domain Few-Shot Learning
- Let's Roll a BiFTA: Bi-refinement for Fine-grained Text-visual Alignment in Vision-Language Models
- CAPT: Confusion-Aware Prompt Tuning for Reducing Vision-Language Misalignment
- pFedNavi: Structure-Aware Personalized Federated Vision-Language Navigation for Embodied AI
- Guiding Diffusion-based Reconstruction with Contrastive Signals for Balanced Visual Representation
- MoECLIP: Patch-Specialized Experts for Zero-shot Anomaly Detection
- AlphaFace: High Fidelity and Real-time Face Swapper Robust to Facial Pose
- Dynamic Training-Free Fusion of Subject and Style LoRAs
- DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset
- SemVideo: Reconstructs What You Watch from Brain Activity via Hierarchical Semantic Guidance
- Xray-Visual Models: Scaling Vision models on Industry Scale Data
- A Contrastive Learning Framework Empowered by Attention-based Feature Adaptation for Street-View Image Classification
- CT-Bench: A Benchmark for Multimodal Lesion Understanding in Computed Tomography
- Pseudo Contrastive Learning for Diagram Comprehension in Multimodal Models