CAST: Modeling Visual State Transitions for Consistent Video Retrieval | ScienceToStartup