Build Loop | ScienceToStartup

Papers

250

With code

199

Suggested Build

150

Suggested Watch

Preview from your Build/Watch decisions. Set up Scout for daily delivery.

Memory-Augmented Vision-Language Agents for Persistent and Semantically Consistent Object Captioning

Morning brief

High-conviction build candidate

OneSearch-V2: The Latent Reasoning Enhanced Self-distillation Generative Search Framework

Morning brief

High-conviction build candidate

AVO: Agentic Variation Operators for Autonomous Evolutionary Search

48h review

Needs sharper wedge before committing

Saved thesis

Find deployable AI papers with public code, a proof pass, and a wedge that can ship within 6 weeks.

Novelty / saturation by cluster

Uses the current paper cohort to show whether a lane looks crowded or sparse, with named comparable papers from the same slice.

Medical AI · 18 · Crowded
CliPPER: Contextual Video-Language Pretraining on Long-form Intraoperative Surgical Procedures for Event Recognition · EndoVGGT: GNN-Enhanced Depth Estimation for Surgical 3D Reconstruction

Computer Vision · 9 · Balanced
Vision-Language Models vs Human: Perceptual Image Quality Assessment · Language-Guided Structure-Aware Network for Camouflaged Object Detection

Robotics · 7 · Balanced
TAG: Target-Agnostic Guidance for Stable Object-Centric Inference in Vision-Language-Action Models · Knowledge-Guided Manipulation Using Multi-Task Reinforcement Learning

Robotics AI · 4 · Rarer lane
SOMA: Strategic Orchestration and Memory-Augmented System for Vision-Language-Action Model Robustness via In-Context Adaptation · Chameleon: Episodic Memory for Long-Horizon Robotic Manipulation

Generative Image · 4 · Rarer lane
ScrollScape: Unlocking 32K Image Generation With Video Diffusion Priors · RefReward-SR: LR-Conditioned Reward Modeling for Preference-Aligned Super-Resolution

LLM Training · 4 · Rarer lane
Sparse Growing Transformer: Training-Time Sparse Depth Allocation via Progressive Attention Looping · A Deep Dive into Scaling RL for Code Generation with Synthetic Data and Curricula

Vision-Language Models · 3 · Rarer lane
VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models · Revealing Multi-View Hallucination in Large Vision-Language Models

Multimodal AI · 3 · Rarer lane
Thinking with Tables: Enhancing Multi-Modal Tabular Understanding via Neuro-Symbolic Reasoning · Video-Only ToM: Enhancing Theory of Mind in Multimodal Large Language Models

Agents · 3 · Rarer lane
UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience · Language-Grounded Multi-Agent Planning for Personalized and Fair Participatory Urban Sensing

Graph Neural Networks · 3 · Rarer lane
Reservoir-Based Graph Convolutional Networks · CGRL: Causal-Guided Representation Learning for Graph Out-of-Distribution Generalization

Educational AI · 3 · Rarer lane
Robust Multilingual Text-to-Pictogram Mapping for Scalable Reading Rehabilitation · Representation Learning to Study Temporal Dynamics in Tutorial Scaffolding

Autonomous Driving Simulation · 2 · Rarer lane
Toward Physically Consistent Driving Video World Models under Challenging Trajectories · MonoSIM: An open source SIL framework for Ackermann Vehicular Systems with Monocular Vision
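
The page does not say how a cluster is labeled Crowded, Balanced, or Rarer lane. One plausible reading is a simple threshold on the number shown for each cluster, treated here as a paper count. The sketch below is an illustration only: the function name `saturation_label` and the cutoffs (10 and 5) are assumptions chosen to agree with the figures listed above, not the site's actual rule.

```python
def saturation_label(paper_count: int,
                     crowded_min: int = 10,
                     balanced_min: int = 5) -> str:
    """Map a cluster's paper count to a saturation label.

    The cutoffs are hypothetical; they merely reproduce the labels
    shown on the page (18 -> Crowded, 9 and 7 -> Balanced, 4 or fewer -> Rarer lane).
    """
    if paper_count >= crowded_min:
        return "Crowded"
    if paper_count >= balanced_min:
        return "Balanced"
    return "Rarer lane"


# A few of the counts listed on the page.
cluster_counts = {
    "Medical AI": 18,
    "Computer Vision": 9,
    "Robotics": 7,
    "Robotics AI": 4,
    "Autonomous Driving Simulation": 2,
}

labels = {name: saturation_label(count) for name, count in cluster_counts.items()}
for name, label in labels.items():
    print(f"{name}: {cluster_counts[name]} ({label})")
```

Other rules, such as each cluster's share of the whole cohort, would fit the same figures, so the cutoffs should be read as placeholders.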

Memory-Augmented Vision-Language Agents for Persistent and Semantically Consistent Object Captioning

Vision-Language Systems · 2026-03-25 · Build Now · Pending

Commercial: 100

Deployability: (no score)

Reproducibility: 40

Novelty: 100

No dossier data.