Papers
CAViT -- Channel-Aware Vision Transformer for Dynamic Feature Fusion
Vision Transformers (ViTs) have demonstrated strong performance across a range of computer vision tasks by modeling long-range spatial interactions via self-attention. However, channel-wise mixing in ...
Adaptive MLP Pruning for Large Vision Transformers
Large vision transformers present impressive scalability, as their performance improves substantially with increased model capacity. Nevertheless, their cumbersome parameters result in exorbitant compu...
Semi-Supervised Masked Autoencoders: Unlocking Vision Transformer Potential with Limited Data
We address the challenge of training Vision Transformers (ViTs) when labeled data is scarce but unlabeled data is abundant. We propose Semi-Supervised Masked Autoencoder (SSMAE), a framework that join...
HiAP: A Multi-Granular Stochastic Auto-Pruning Framework for Vision Transformers
Vision Transformers require significant computational resources and memory bandwidth, severely limiting their deployment on edge devices. While recent structured pruning methods successfully reduce th...