Visual Reasoning Comparison Hub

VisDoT enhances visual reasoning in charts through human-like interpretation and decomposition of thought.

CodePercept enhances visual reasoning in STEM for MLLMs by leveraging executable code as a perceptual medium.

A proposed visual reasoning enhancement tool for VLMs that uses contrastive learning to improve accuracy.

MM-CondChain is a benchmark for evaluating visually grounded deep compositional reasoning in multimodal large language models.

Reference Surfaces