Speech Processing Comparison Hub

Learning Multiple Utterance-Level Attribute Representations with a Unified Speech Encoder (7.0)

A unified post-training framework that lets a single speech foundation model generate multiple types of utterance-level representations, supporting effective multimodal and multilingual applications.

StyleStream: Real-Time Zero-Shot Voice Style Conversion (6.0)

StyleStream enables real-time zero-shot voice style conversion across timbre, accent, and emotion.

The Cascade Equivalence Hypothesis: When Do Speech LLMs Behave Like ASR→LLM Pipelines? (3.0)

An analysis of when speech LLMs behave like ASR-to-LLM pipelines, highlighting how this depends on architecture.

AlphaFlowTSE: One-Step Generative Target Speaker Extraction via Conditional AlphaFlow (3.0)

AlphaFlowTSE is a one-step generative model for target speaker extraction that improves the fidelity of speech recovered from multi-talker mixtures.
