Audio Processing

5 papers
4.0 average viability
-33% (30d)

State of the Field

Recent work in audio processing aims to make speech technologies more efficient and more accurate, addressing both deployment and modeling challenges. One trend is dynamic tokenization: rather than emitting tokens at a fixed frame rate, a codec emits tokens at a variable rate aligned to linguistic units such as characters, which improves speech resynthesis quality while reducing the number of tokens. In sound source localization, analytic incremental learning with imbalance rectification targets real-world deployment, where imbalanced data distributions degrade localization accuracy. For speech bandwidth extension, new frameworks restore high-frequency content in the latent space of a neural codec rather than operating on spectrograms or waveforms, yielding clearer speech from low-bandwidth input. Shape-gain decomposition, which encodes a signal's energy (gain) separately from its normalized structure (shape), improves the bitrate-distortion performance and robustness of neural audio codecs. Work on speech editing detection, meanwhile, unifies detecting edited utterances with localizing the manipulated content using prior-enhanced audio LLMs. Together, these efforts address practical needs in telecommunications and media, where higher audio quality at lower computational cost directly affects the user experience.

Last updated Mar 3, 2026

Papers

Research Paper · Jan 29, 2026

Unifying Speech Editing Detection and Content Localization via Prior-Enhanced Audio LLMs

Speech editing achieves semantic inversion by performing fine-grained segment-level manipulation on original utterances, while preserving global perceptual naturalness. Existing detection studies main...

7.0 viability
Research Paper · Jan 26, 2026

Analytic Incremental Learning For Sound Source Localization With Imbalance Rectification

Sound source localization (SSL) demonstrates remarkable results in controlled settings but struggles in real-world deployment due to dual imbalance challenges: intra-task imbalance arising from long-t...

5.0 viability
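The entry above describes its method only by title, so the following is a generic sketch of what "analytic" incremental learning usually means: a classifier head on fixed features, solved in closed form by regularized (ridge) least squares and updated recursively so that past training data never needs to be stored. It assumes one-hot targets, a ridge term `gamma`, and one-sample-at-a-time Sherman-Morrison updates (recursive least squares). The class name `AnalyticClassifier` is hypothetical, and the paper's imbalance rectification is not modeled.

```python
from typing import List

Matrix = List[List[float]]


class AnalyticClassifier:
    """Hypothetical sketch of a closed-form classifier head updated by recursive least squares.

    Invariant: W equals (X^T X + gamma*I)^-1 X^T Y over every sample seen so far,
    yet no past sample is stored.
    """

    def __init__(self, dim: int, gamma: float = 1.0) -> None:
        self.dim = dim
        # R = (X^T X + gamma*I)^-1; before any data arrives it is I / gamma.
        self.R: Matrix = [[1.0 / gamma if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.W: Matrix = [[] for _ in range(dim)]  # dim x num_classes, starts with no classes
        self.num_classes = 0

    def add_classes(self, n: int) -> None:
        # Earlier samples implicitly had target 0 for the new classes,
        # so zero-initialized columns keep W equal to the joint solution.
        for row in self.W:
            row.extend([0.0] * n)
        self.num_classes += n

    def _scores(self, x: List[float]) -> List[float]:
        return [sum(x[i] * self.W[i][c] for i in range(self.dim)) for c in range(self.num_classes)]

    def update(self, x: List[float], label: int) -> None:
        """Absorb one sample with a one-hot target via a rank-1 Sherman-Morrison update."""
        d = self.dim
        Rx = [sum(self.R[i][j] * x[j] for j in range(d)) for i in range(d)]
        denom = 1.0 + sum(x[i] * Rx[i] for i in range(d))
        k = [v / denom for v in Rx]  # RLS gain vector
        # R <- R - k (R x)^T, valid because R is symmetric.
        for i in range(d):
            for j in range(d):
                self.R[i][j] -= k[i] * Rx[j]
        # Residual uses the old W: e = y - W^T x.
        residual = [(1.0 if c == label else 0.0) - s for c, s in enumerate(self._scores(x))]
        for i in range(d):
            for c in range(self.num_classes):
                self.W[i][c] += k[i] * residual[c]

    def predict(self, x: List[float]) -> int:
        scores = self._scores(x)
        return max(range(self.num_classes), key=scores.__getitem__)
```

Because the recursive update reproduces the ridge solution over all data seen so far, learning new classes in a later phase gives the same head as training on every phase jointly. This equivalence is the property analytic incremental learning relies on to avoid catastrophic forgetting without replaying stored exemplars.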
Research Paper · Jan 30, 2026

Beyond Fixed Frames: Dynamic Character-Aligned Speech Tokenization

Neural audio codecs are at the core of modern conversational speech technologies, converting continuous speech into sequences of discrete tokens that can be processed by LLMs. However, existing codecs...

3.0 viability
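A minimal sketch of the idea behind character-aligned tokenization follows. It assumes a per-frame character index is already available, for example from a forced aligner. That is an assumption: the abstract does not say how the paper derives its segmentation. Consecutive frames that share a character are mean-pooled into one token, so the token rate follows the speaking rate instead of a fixed frame rate. The function names `pool_by_alignment` and `token_rate` and the 50 Hz frame rate are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def pool_by_alignment(frames: Sequence[Frame], char_ids: Sequence[int]) -> List[Tuple[int, List[float]]]:
    """Mean-pool each run of consecutive frames sharing a character index into one token."""
    if len(frames) != len(char_ids):
        raise ValueError("need exactly one character index per frame")
    tokens: List[Tuple[int, List[float]]] = []
    start = 0
    for cid, run in groupby(char_ids):
        n = sum(1 for _ in run)  # number of frames in this character's run
        segment = frames[start:start + n]
        dim = len(segment[0])
        mean = [sum(f[k] for f in segment) / n for k in range(dim)]
        tokens.append((cid, mean))
        start += n
    return tokens


def token_rate(num_tokens: int, num_frames: int, frame_rate_hz: float) -> float:
    """Tokens per second for a clip of num_frames frames sampled at frame_rate_hz."""
    return num_tokens * frame_rate_hz / num_frames
```

Consider 10 frames at 50 Hz aligned to 4 characters (for example, indices `[0, 0, 0, 1, 1, 2, 2, 2, 2, 3]`). Pooling yields 4 tokens, or 20 tokens/s instead of 50 frames/s. Slow or drawn-out speech would produce proportionally fewer tokens per second.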
Research Paper · Mar 2, 2026

CodecFlow: Efficient Bandwidth Extension via Conditional Flow Matching in Neural Codec Latent Space

Speech Bandwidth Extension improves clarity and intelligibility by restoring/inferring appropriate high-frequency content for low-bandwidth speech. Existing methods often rely on spectrogram or wavefo...

3.0 viability
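CodecFlow's abstract is truncated, so the following is only a generic sketch of conditional flow matching with a straight-line (rectified-flow style) probability path. The straight-line path is an assumption, not the paper's confirmed choice. During training, a point x_t = (1 - t) x0 + t x1 between a source sample x0 (often Gaussian noise) and a target latent x1 is regressed onto the velocity x1 - x0. At inference, the learned velocity field is integrated from t = 0 to t = 1 with Euler steps. All names are hypothetical. In a bandwidth-extension setting, the velocity network would also receive the low-bandwidth codec latent as conditioning, which is omitted here.

```python
from typing import Callable, List, Tuple

Vec = List[float]
# velocity(x, t) -> dx/dt; a trained network would also take conditioning inputs.
VelocityField = Callable[[Vec, float], Vec]


def cfm_training_pair(x0: Vec, x1: Vec, t: float) -> Tuple[Vec, Vec]:
    """Return the point x_t on the straight path from x0 to x1 and its target velocity."""
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    ut = [b - a for a, b in zip(x0, x1)]  # constant along a straight path
    return xt, ut


def cfm_loss(v_pred: Vec, u_target: Vec) -> float:
    """Mean squared error between predicted and target velocity."""
    return sum((p - u) ** 2 for p, u in zip(v_pred, u_target)) / len(u_target)


def euler_sample(velocity: VelocityField, x0: Vec, steps: int = 8) -> Vec:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with fixed-size Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for n in range(steps):
        v = velocity(x, n * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

As a sanity check, the oracle field v(x, t) = (x1 - x) / (1 - t) is the exact conditional velocity for this path. Under that field, the loss at any x_t is zero and Euler integration from x0 lands on x1. In practice, a network trained with `cfm_loss` replaces the oracle.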
Research Paper · Feb 17, 2026

The Equalizer: Introducing Shape-Gain Decomposition in Neural Audio Codecs

Neural audio codecs (NACs) typically encode the short-term energy (gain) and normalized structure (shape) of speech/audio signals jointly within the same latent space. As a result, they are poorly rob...

2.0 viability
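A minimal sketch of shape-gain decomposition as used in classic shape-gain vector quantization follows. The gain is a frame's L2 norm (its energy scale), and the shape is its unit-norm direction, so a change in input level alters only the gain. Two details are illustrative assumptions: quantizing the gain uniformly in decibels, and the 1.5 dB step size. All function names are likewise assumptions. The abstract does not describe the paper's neural architecture, and this sketch does not attempt to reproduce it.

```python
import math
from typing import List, Tuple


def decompose(frame: List[float], eps: float = 1e-12) -> Tuple[float, List[float]]:
    """Split a frame into gain (L2 norm) and shape (unit-norm direction)."""
    gain = math.sqrt(sum(v * v for v in frame))
    shape = [v / max(gain, eps) for v in frame]  # eps guards silent (all-zero) frames
    return gain, shape


def quantize_gain_db(gain: float, step_db: float = 1.5, floor_db: float = -120.0) -> float:
    """Uniform scalar quantization of the gain in the log (dB) domain."""
    db = 20.0 * math.log10(gain) if gain > 0.0 else floor_db
    db = max(db, floor_db)
    return 10.0 ** (round(db / step_db) * step_db / 20.0)


def reconstruct(gain: float, shape: List[float]) -> List[float]:
    """Rescale the unit-norm shape by the (possibly quantized) gain."""
    return [gain * s for s in shape]
```

Scaling a frame by any positive factor leaves `shape` unchanged and multiplies `gain` by that factor. This is the kind of input-level robustness that motivates coding the two separately. A log-domain gain quantizer also keeps the relative error bounded by half a step (here 0.75 dB) regardless of level.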