Multimodal LLMs

Trending

4 papers

6.5 viability

+100% (30d)

Papers

Research Paper·Mar 19, 2026

Multimodal Task Interference: A Benchmark and Analysis of History-Target Mismatch in Multimodal LLMs

Task interference, the performance degradation caused by task switches within a single conversation, has been studied exclusively in text-only settings despite the growing prevalence of multimodal dia...

7.0 viability

Research Paper·Mar 22, 2026

Uncertainty-Aware Knowledge Distillation for Multimodal Large Language Models

Knowledge distillation establishes a learning paradigm that leverages both data supervision and teacher guidance. However, determining the optimal balance between learning from data and learning from ...

7.0 viability · Has code

Research Paper·Mar 10, 2026

Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs

Multimodal large language models (MLLMs) can process text presented as images, yet they often perform worse than when the same content is provided as textual tokens. We systematically diagnose this "m...

7.0 viability

Research Paper·Feb 26, 2026

Modality Collapse as Mismatched Decoding: Information-Theoretic Limits of Multimodal LLMs

Multimodal LLMs can process speech and images, but they cannot hear a speaker's voice or see an object's texture. We show this is not a failure of encoding: speaker identity, emotion, and visual attri...

5.0 viability