Vision Transformer

Vision Transformer (ViT) is a model in our research taxonomy.

Related papers

Diffusion-Driven Deceptive Patches: Adversarial Manipulation and Forensic Detection in Facial Identity Verification
MetricAnything: Scaling Metric Depth Pretraining with Noisy Heterogeneous Sources
Do Transformers Understand Ancient Roman Coin Motifs Better than CNNs?
PEAR: Pixel-aligned Expressive humAn mesh Recovery
RGB-Event HyperGraph Prompt for Kilometer Marker Recognition based on Pre-trained Foundation Models
MapViT: A Two-Stage ViT-Based Framework for Real-Time Radio Quality Map Prediction in Dynamic Environments
FedBCD: Communication-Efficient Accelerated Block Coordinate Gradient Descent for Federated Learning
Self-learned representation-guided latent diffusion model for breast cancer classification in deep ultraviolet whole surface images
Xray-Visual Models: Scaling Vision models on Industry Scale Data
A Framework for Cross-Domain Generalization in Coronary Artery Calcium Scoring Across Gated and Non-Gated Computed Tomography
A Contrastive Learning Framework Empowered by Attention-based Feature Adaptation for Street-View Image Classification
ECHOSAT: Estimating Canopy Height Over Space And Time
Learn from A Rationalist: Distilling Intermediate Interpretable Rationales