ScienceToStartup

113 Cherry St #92768

Seattle, WA 98104-2205

Backed by Research Labs


Copyright © 2026 ScienceToStartup. All rights reserved.


Papers: 250
With code: 199
Suggested Build: 150
Suggested Watch: 38


Preview based on your Build/Watch decisions. Set up Scout for daily delivery.

  • Memory-Augmented Vision-Language Agents for Persistent and Semantically Consistent Object Captioning · Morning brief · High conviction build candidate
  • OneSearch-V2: The Latent Reasoning Enhanced Self-distillation Generative Search Framework · Morning brief · High conviction build candidate
  • AVO: Agentic Variation Operators for Autonomous Evolutionary Search · 48h review · Needs sharper wedge before committing

Saved thesis

Find deployable AI papers with public code, a proof pass, and a wedge that can ship within 6 weeks.


Novelty / saturation by cluster

Uses the current paper cohort to show whether a lane looks crowded or sparse, with named comparable papers from the same slice.

  • Medical AI · 18 · Crowded

    CliPPER: Contextual Video-Language Pretraining on Long-form Intraoperative Surgical Procedures for Event Recognition · EndoVGGT: GNN-Enhanced Depth Estimation for Surgical 3D Reconstruction

  • Computer Vision · 9 · Balanced

    Vision-Language Models vs Human: Perceptual Image Quality Assessment · Language-Guided Structure-Aware Network for Camouflaged Object Detection

  • Robotics · 7 · Balanced

    TAG: Target-Agnostic Guidance for Stable Object-Centric Inference in Vision-Language-Action Models · Knowledge-Guided Manipulation Using Multi-Task Reinforcement Learning

  • Robotics AI · 4 · Rarer lane

    SOMA: Strategic Orchestration and Memory-Augmented System for Vision-Language-Action Model Robustness via In-Context Adaptation · Chameleon: Episodic Memory for Long-Horizon Robotic Manipulation

  • Generative Image · 4 · Rarer lane

    ScrollScape: Unlocking 32K Image Generation With Video Diffusion Priors · RefReward-SR: LR-Conditioned Reward Modeling for Preference-Aligned Super-Resolution

  • LLM Training · 4 · Rarer lane

    Sparse Growing Transformer: Training-Time Sparse Depth Allocation via Progressive Attention Looping · A Deep Dive into Scaling RL for Code Generation with Synthetic Data and Curricula

  • Vision-Language Models · 3 · Rarer lane

    VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models · Revealing Multi-View Hallucination in Large Vision-Language Models

  • Multimodal AI · 3 · Rarer lane

    Thinking with Tables: Enhancing Multi-Modal Tabular Understanding via Neuro-Symbolic Reasoning · Video-Only ToM: Enhancing Theory of Mind in Multimodal Large Language Models

  • Agents · 3 · Rarer lane

    UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience · Language-Grounded Multi-Agent Planning for Personalized and Fair Participatory Urban Sensing

  • Graph Neural Networks · 3 · Rarer lane

    Reservoir-Based Graph Convolutional Networks · CGRL: Causal-Guided Representation Learning for Graph Out-of-Distribution Generalization

  • Educational AI · 3 · Rarer lane

    Robust Multilingual Text-to-Pictogram Mapping for Scalable Reading Rehabilitation · Representation Learning to Study Temporal Dynamics in Tutorial Scaffolding

  • Autonomous Driving Simulation · 2 · Rarer lane

    Toward Physically Consistent Driving Video World Models under Challenging Trajectories · MonoSIM: An open source SIL framework for Ackermann Vehicular Systems with Monocular Vision
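The saturation labels above can be approximated by counting papers per cluster in the cohort and bucketing each count. The sketch below is an illustration only: the cut-offs `CROWDED_MIN = 10` and `BALANCED_MIN = 5`, the function names, and the input shape (one cluster name per paper) are all hypothetical, picked solely so that the counts shown (18 as Crowded, 7 to 9 as Balanced, 2 to 4 as Rarer lane) reproduce their labels. The page does not state how the real thresholds are set.

```python
from collections import Counter

# Hypothetical cut-offs: chosen only so the counts in the cluster list
# reproduce their labels (18 -> Crowded, 7-9 -> Balanced, 2-4 -> Rarer lane).
# The product's actual thresholds are not stated.
CROWDED_MIN = 10
BALANCED_MIN = 5


def saturation_label(count: int) -> str:
    """Map a cluster's paper count to a saturation label."""
    if count >= CROWDED_MIN:
        return "Crowded"
    if count >= BALANCED_MIN:
        return "Balanced"
    return "Rarer lane"


def label_clusters(paper_clusters: list[str]) -> dict[str, tuple[int, str]]:
    """Count papers per cluster and attach a saturation label.

    paper_clusters holds one cluster name per paper (hypothetical input shape).
    Results are ordered from most to least crowded.
    """
    counts = Counter(paper_clusters)
    return {
        cluster: (n, saturation_label(n))
        for cluster, n in counts.most_common()
    }


if __name__ == "__main__":
    # Toy cohort using a few of the cluster sizes shown above.
    cohort = (
        ["Medical AI"] * 18
        + ["Computer Vision"] * 9
        + ["Robotics"] * 7
        + ["Agents"] * 3
    )
    for cluster, (n, label) in label_clusters(cohort).items():
        print(f"{cluster} · {n} · {label}")
```

Fixed thresholds keep the sketch simple; a real system might instead derive cut-offs from the cohort itself (for example, percentiles of cluster size), so labels stay meaningful as the cohort grows.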

Memory-Augmented Vision-Language Agents for Persistent and Semantically Consistent Object Captioning

Vision-Language Systems · 2026-03-25 · Build Now · Pending
Commercial: 100
Deployability: n/a
Reproducibility: 40
Novelty: 100

No dossier data.