ViT
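ViT (Vision Transformer) is a model in our research taxonomy: a Transformer architecture that processes an image as a sequence of patch tokens.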
Related papers
- Tracing Copied Pixels and Regularizing Patch Affinity in Copy Detection
- Experience-Guided Self-Adaptive Cascaded Agents for Breast Cancer Screening and Diagnosis with Reduced Biopsy Referrals
- Leveraging Model Soups to Classify Intangible Cultural Heritage Images from the Mekong Delta
- MetricAnything: Scaling Metric Depth Pretraining with Noisy Heterogeneous Sources
- AgriPINN: A Process-Informed Neural Network for Interpretable and Scalable Crop Biomass Prediction Under Water Stress
- PEAR: Pixel-aligned Expressive humAn mesh Recovery
- RGB-Event HyperGraph Prompt for Kilometer Marker Recognition based on Pre-trained Foundation Models
- Let's Roll a BiFTA: Bi-refinement for Fine-grained Text-visual Alignment in Vision-Language Models
- Reasoning is a Modality
- Krause Synchronization Transformers
- Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting
- Robustness Is a Function, Not a Number: A Factorized Comprehensive Study of OOD Robustness in Vision-Based Driving
- When Visual Evidence is Ambiguous: Pareidolia as a Diagnostic Probe for Vision Models
- Learn from A Rationalist: Distilling Intermediate Interpretable Rationales