Multimodal Models

4 papers

6.0 viability

Papers

Research Paper·Jan 23, 2026

Cognitively-Inspired Tokens Overcome Egocentric Bias in Multimodal Models

Multimodal language models (MLMs) perform well on semantic vision-language tasks but fail at spatial reasoning that requires adopting another agent's visual perspective. These errors reflect a persist...

8.0 viability

Research Paper·Feb 17, 2026

Understanding vs. Generation: Navigating Optimization Dilemma in Multimodal Models

Current research in multimodal models faces a key challenge where enhancing generative capabilities often comes at the expense of understanding, and vice versa. We analyzed this trade-off and identify...

6.0 viability

Research Paper·Jan 27, 2026

Innovator-VL: A Multimodal Large Language Model for Scientific Discovery

We present Innovator-VL, a scientific multimodal large language model designed to advance understanding and reasoning across diverse scientific domains while maintaining excellent performance on gener...

5.0 viability

Research Paper·Feb 27, 2026

Pseudo Contrastive Learning for Diagram Comprehension in Multimodal Models

Recent multimodal models such as Contrastive Language-Image Pre-training (CLIP) have shown remarkable ability to align visual and linguistic representations. However, domains where small visual differ...