Audio AI Comparison Hub

PhaseCoder: Microphone Geometry-Agnostic Spatial Audio Understanding for Multimodal LLMs (8.0)

PhaseCoder uses a transformer-based encoder that is agnostic to microphone geometry, enabling spatial audio reasoning and transcription on devices with different microphone layouts.

Are Audio-Language Models Listening? Audio-Specialist Heads for Adaptive Audio Steering (7.0)

Identifies audio-specialist attention heads in large audio-language models and steers them to strengthen audio understanding, improving accuracy without retraining.

AudioCapBench: Quick Evaluation on Audio Captioning across Sound, Music, and Speech (5.0)

AudioCapBench is a benchmark for quickly evaluating audio captioning models across sound, music, and speech.

Towards Explicit Acoustic Evidence Perception in Audio LLMs for Speech Deepfake Detection (3.0)

Proposes an acoustically enhanced framework that improves speech deepfake detection in audio LLMs by exposing fine-grained time-frequency evidence.

Spatial Audio Question Answering and Reasoning on Dynamic Source Movements (3.0)

Addresses spatial audio question answering and reasoning, with a focus on dynamic source movements.

Reference Surfaces

Top Papers