Multimodal Reasoning Comparison Hub
5 papers - avg viability 4.4
Top Papers
- Think While Watching: Online Streaming Segment-Level Memory for Multi-Turn Video Reasoning in Multimodal Large Language Models (9.0)
A memory-anchored framework for real-time multi-turn video reasoning in multimodal large language models.
- M$^3$-ACE: Rectifying Visual Perception in Multimodal Math Reasoning via Multi-Agentic Context Engineering (8.0)
M3-ACE is a multi-agent system that improves visual math reasoning by rectifying visual perception, achieving state-of-the-art results and offering a clear path to commercial applications.
- BRIDGE: Benchmark for multi-hop Reasoning In long multimodal Documents with Grounded Evidence (7.0)
BRIDGE is a benchmark dataset for evaluating multi-hop reasoning in long multimodal documents, enabling targeted diagnosis of reasoning failures in LLMs and RAG systems.
- ClueTracer: Question-to-Vision Clue Tracing for Training-Free Hallucination Suppression in Multimodal Reasoning (7.0)
ClueTracer enhances multimodal reasoning models by suppressing hallucinations without additional training.
- Fuel Gauge: Estimating Chain-of-Thought Length Ahead of Time in Large Multimodal Models (7.0)
Fuel Gauge predicts Chain-of-Thought length in multimodal models to optimize resource use and accuracy.
- Concise Geometric Description as a Bridge: Unleashing the Potential of LLM for Plane Geometry Problem Solving (6.0)
Converts visual diagrams into concise textual descriptions so that LLMs can solve plane geometry problems.
- GeoSense: Internalizing Geometric Necessity Perception for Multimodal Reasoning (6.0)
GeoSense enhances multimodal reasoning by autonomously engaging geometric features based on perceptual necessity.
- Evolving from Tool User to Creator via Training-Free Experience Reuse in Multimodal Reasoning (5.0)
Creates adaptive, self-updating multimodal reasoning tools without training, using an experience reuse framework.
- Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning (2.0)
Omni-R1 proposes a unified generative paradigm for diverse multimodal reasoning tasks.
- Deconstructing Multimodal Mathematical Reasoning: Towards a Unified Perception-Alignment-Reasoning Paradigm (2.0)
Develops a framework for integrating perception, alignment, and reasoning to improve multimodal mathematical reasoning models.