BUILDER'S SANDBOX

Build This Paper

Use an AI coding agent to implement this research.

OpenAI Codex (AI Agent): Lightweight coding agent in your terminal.

Claude Code (AI Agent): Agentic coding tool for terminal workflows.

AntiGravity IDE (Scaffolding): AI agent mindset installer and workflow scaffolder.

Cursor (IDE): AI-first code editor built on VS Code.

VS Code (IDE): Free, open-source editor by Microsoft.

MVP Investment

Total: $9K-$12K over 6-10 weeks
Engineering: $8,000
Cloud Hosting: $240
SaaS Stack: $300
Domain & Legal: $100
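As a quick check on the budget above: the four itemized costs sum to $8,640. That is below the stated $9K lower bound, so between $360 and $3,360 of the range is not broken out. The snippet only restates the listed figures.

```python
# Itemized MVP costs as listed on this page (USD).
line_items = {
    "Engineering": 8_000,
    "Cloud Hosting": 240,
    "SaaS Stack": 300,
    "Domain & Legal": 100,
}
stated_low, stated_high = 9_000, 12_000  # stated total range

itemized_total = sum(line_items.values())
# Portion of the stated range not covered by the listed items.
unitemized_low = stated_low - itemized_total
unitemized_high = stated_high - itemized_total

print(f"Itemized total: ${itemized_total:,}")
print(f"Not itemized: ${unitemized_low:,} to ${unitemized_high:,}")
```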

6-month ROI: 2-4x

3-year ROI: 10-20x

Lightweight AI tools can reach profitability quickly. At an average contract of $500/month, 20 customers would bring in $10K MRR by month 6, and 200+ customers by year 3 would bring in $100K+ MRR.
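The revenue arithmetic in the paragraph above works out as follows. `mrr` is a hypothetical helper, and 200 is used as the lower bound of "200+" customers.

```python
AVG_CONTRACT_USD_PER_MONTH = 500  # average contract value stated above


def mrr(customers: int, price: int = AVG_CONTRACT_USD_PER_MONTH) -> int:
    """Monthly recurring revenue: customer count times monthly contract value."""
    return customers * price


mrr_6_months = mrr(20)  # 20 customers by month 6
mrr_3_years = mrr(200)  # at least 200 customers by year 3

print(f"MRR at 6 months: ${mrr_6_months:,}")
print(f"MRR at 3 years: at least ${mrr_3_years:,}")
```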

Talent Scout

Daniel Oliveira

INESC-ID Lisboa

David Martins de Matos

Instituto Superior Técnico, Universidade de Lisboa

Founder's Pitch

"A tool for aligning and enhancing visual storytelling with movie script-grounded narrative to reduce hallucination errors."

Dataset Creation · Score: 5

Commercial Viability Breakdown (0-10 scale)

High Potential: 7.5 (3/4 signals)

Quick Build: 10 (4/4 signals)

Series A Potential: 7.5 (3/4 signals)
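The three scores are consistent with a linear mapping of signals met onto the 0-10 scale, score = 10 × signals / 4. That formula is inferred from the numbers shown here, not documented by the page, and `viability_score` is a hypothetical name.

```python
def viability_score(signals_met: int, signals_total: int = 4) -> float:
    """Hypothetical mapping: scale the fraction of signals met onto 0-10."""
    if not 0 <= signals_met <= signals_total:
        raise ValueError("signals_met must be between 0 and signals_total")
    return 10 * signals_met / signals_total


scores = {
    "High Potential": viability_score(3),
    "Quick Build": viability_score(4),
    "Series A Potential": viability_score(3),
}
print(scores)
```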

Sources used for this analysis

arXiv Paper

Full-text PDF analysis of the research paper

GitHub Repository

Code availability, stars, and contributor activity

Citation Network

Semantic Scholar citations and co-citation patterns

Community Predictions

Crowd-sourced unicorn probability assessments

Analysis model: GPT-4o · Last scored: 2/25/2026

Why It Matters

This research tackles two common problems in visual storytelling, semantic inconsistency and hallucination, by grounding generation in precise narrative context drawn from movie scripts and subtitles. This makes the generated narratives more accurate and more faithful to the source material.

Product Angle

The solution could be packaged as an API that film and media production companies integrate into their pre- and post-production workflows to check script consistency and reduce narrative errors.

Disruption

It could replace manual script editing and continuity management by automating the semantic synchronization of visual and narrative content, reducing human error.

Product Opportunity

The media and entertainment industry, valued at over $100 billion annually, often struggles with script continuity and narrative consistency. Production companies could use this tool to verify accuracy and cut the post-production editing costs caused by narrative errors.

Use Case Idea

Develop a script-writing assistant for filmmakers that checks whether character interactions and dialogue are portrayed accurately, making it faster to align visual scenes with the script.

Science

This study introduces the StoryMovie dataset, which aligns visual storytelling data with movie scripts and subtitles to improve semantic accuracy. The authors synchronize script dialogue with subtitle timing, using Longest Common Subsequence (LCS) token matching, so that each line of dialogue can be attributed to the correct speaker. The aligned data is then used to improve a storytelling model by grounding its stories in detailed context taken directly from the scripts, reducing semantic errors by drawing on information beyond visual cues.
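The LCS step can be sketched as follows, assuming the script has already been parsed into speaker-labelled lines and the subtitles into timed cues. The `ScriptLine` and `SubtitleCue` types, the tokenizer, and the majority-vote rule in `attribute_speakers` are hypothetical illustrations, not the authors' implementation. Tokens from both sources are aligned with a standard dynamic-programming LCS, and each subtitle cue inherits the speaker whose script tokens it matched most often.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class ScriptLine:
    speaker: str  # speaker name from the script, e.g. "ANNA"
    text: str


@dataclass
class SubtitleCue:
    start: float  # cue start time in seconds
    end: float  # cue end time in seconds
    text: str


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def lcs_pairs(a: list[str], b: list[str]) -> list[tuple[int, int]]:
    """Index pairs (i, j) with a[i] == b[j] forming one longest common subsequence."""
    n, m = len(a), len(b)
    # dp[i][j] = length of the LCS of the suffixes a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk forward through the table to recover the matched positions.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def flatten(texts: list[str]) -> tuple[list[str], list[int]]:
    """Concatenate token lists, remembering which item each token came from."""
    tokens, owner = [], []
    for k, text in enumerate(texts):
        toks = tokenize(text)
        tokens.extend(toks)
        owner.extend([k] * len(toks))
    return tokens, owner


def attribute_speakers(script: list[ScriptLine], cues: list[SubtitleCue]):
    """Give each timed subtitle cue the speaker whose script tokens it matched most."""
    s_tokens, s_owner = flatten([line.text for line in script])
    c_tokens, c_owner = flatten([cue.text for cue in cues])
    votes = [Counter() for _ in cues]
    for i, j in lcs_pairs(s_tokens, c_tokens):
        votes[c_owner[j]][script[s_owner[i]].speaker] += 1
    # Cues with no matched tokens get speaker None.
    return [
        (cue.start, cue.end, v.most_common(1)[0][0] if v else None, cue.text)
        for cue, v in zip(cues, votes)
    ]


script = [
    ScriptLine("ANNA", "Where were you last night?"),
    ScriptLine("BEN", "I was at the office, working late."),
    ScriptLine("ANNA", "You're lying to me."),
]
cues = [
    SubtitleCue(12.0, 13.5, "Where were you last night?"),
    SubtitleCue(14.0, 16.2, "At the office. Working late."),
    SubtitleCue(17.0, 18.1, "You're lying."),
]
for start, end, speaker, text in attribute_speakers(script, cues):
    print(f"{start:5.1f}-{end:5.1f}  {speaker}: {text}")
```

The example attributes the three cues to ANNA, BEN and ANNA even though the subtitles shorten and paraphrase the script. The full LCS table costs O(n·m) time and memory, so a practical pipeline would probably align scene by scene rather than over a whole film at once.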

Method & Eval

The model was evaluated on the StoryMovie dataset for semantic alignment. It showed improved dialogue attribution and entity re-identification, achieving a 48.5% win rate against models without script grounding.
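A 48.5% win rate sits uneasily next to "improved" unless ties are counted separately. In pairwise evaluation the win rate is wins divided by all comparisons, so when some judgments are ties, a model can win more often than it loses while staying under 50%. The page does not say how ties were handled; the counts below are invented purely to illustrate the arithmetic and are not the paper's results.

```python
from collections import Counter


def pairwise_rates(outcomes: list[str]) -> dict[str, float]:
    """Fraction of pairwise judgments that were wins, ties, and losses."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {label: counts[label] / total for label in ("win", "tie", "loss")}


# Invented counts for illustration only; the page reports just the 48.5% win rate.
outcomes = ["win"] * 97 + ["tie"] * 70 + ["loss"] * 33
rates = pairwise_rates(outcomes)
print(rates)
print("wins exceed losses:", rates["win"] > rates["loss"])
```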

Caveats

The alignment process depends heavily on the quality of the available scripts and subtitles, which are not accessible for every movie. It is also prone to misalignment when scripts or subtitles are poorly transcribed.

Author Intelligence

Daniel Oliveira (Lead)

INESC-ID Lisboa
daniel.oliveira@inesc-id.pt

David Martins de Matos

Instituto Superior Técnico, Universidade de Lisboa
david.matos@inesc-id.pt