Speech Processing Comparison Hub
4 papers - avg viability 4.8
Top Papers
- Learning Multiple Utterance-Level Attribute Representations with a Unified Speech Encoder (7.0)
A unified post-training framework that lets a single speech foundation model produce multiple types of utterance-level representations, supporting multimodal and multilingual applications.
- StyleStream: Real-Time Zero-Shot Voice Style Conversion (6.0)
StyleStream enables real-time zero-shot voice style conversion across timbre, accent, and emotion.
- The Cascade Equivalence Hypothesis: When Do Speech LLMs Behave Like ASR→LLM Pipelines? (3.0)
An analysis of when speech LLMs behave like ASR-to-LLM cascade pipelines, highlighting how that behavior depends on model architecture.
- AlphaFlowTSE: One-Step Generative Target Speaker Extraction via Conditional AlphaFlow (3.0)
AlphaFlowTSE is a one-step generative model for target speaker extraction that recovers the target speaker's speech from multi-talker mixtures with improved fidelity.