Papers 1–4 of 4

PhaseCoder: Microphone Geometry-Agnostic Spatial Audio Understanding for Multimodal LLMs
Current multimodal LLMs process audio as a mono stream, ignoring the rich spatial information essential for embodied AI. Existing spatial audio models, conversely, are constrained to fixed microphone ...
AudioCapBench: Quick Evaluation on Audio Captioning across Sound, Music, and Speech
We introduce AudioCapBench, a benchmark for evaluating the audio captioning capabilities of large multimodal models. AudioCapBench covers three distinct audio domains, including environmental sound, music, and ...
Towards Explicit Acoustic Evidence Perception in Audio LLMs for Speech Deepfake Detection
Speech deepfake detection (SDD) focuses on identifying whether a given speech signal is genuine or has been synthetically generated. Existing audio large language model (LLM)-based methods excel in co...
Spatial Audio Question Answering and Reasoning on Dynamic Source Movements
Spatial audio understanding aims to enable machines to interpret complex auditory scenes, particularly when sound sources move over time. In this work, we study Spatial Audio Question Answering (Spati...