Audio AI Comparison Hub
5 papers - avg viability 5.2
Top Papers
- PhaseCoder: Microphone Geometry-Agnostic Spatial Audio Understanding for Multimodal LLMs (8.0)
PhaseCoder allows any device to perform spatial audio reasoning and transcription using a microphone-agnostic transformer-based encoder.
- Are Audio-Language Models Listening? Audio-Specialist Heads for Adaptive Audio Steering (7.0)
Amplify audio understanding in large audio-language models by identifying and steering audio-specialist attention heads, improving accuracy without retraining.
- AudioCapBench: Quick Evaluation on Audio Captioning across Sound, Music, and Speech (5.0)
AudioCapBench provides a comprehensive benchmark for rapidly evaluating audio captioning models across different audio domains.
- Towards Explicit Acoustic Evidence Perception in Audio LLMs for Speech Deepfake Detection (3.0)
Develop an acoustically enhanced framework to improve speech deepfake detection by exposing fine-grained time-frequency evidence.
- Spatial Audio Question Answering and Reasoning on Dynamic Source Movements (3.0)
Develop a spatial audio question answering system focusing on dynamic source movements.