3 papers - avg viability 6.0
A network for precise video segment retrieval that overcomes information-density mismatches and attention limitations by refining knowledge and context.
CAST is a plug-and-play adapter that improves the temporal coherence of video retrieval and generation by modeling visual state transitions.
ShotFinder provides a benchmark and retrieval pipeline for open-domain video shot retrieval based on keyframe-oriented descriptions.