Text-to-Speech Comparison Hub
7 papers - avg viability 6.9
Top Papers
- Fish Audio S2 Technical Report (9.0)
Fish Audio S2 is an open-source text-to-speech system that supports multi-speaker, instruction-following audio generation.
- DeepASMR: LLM-Based Zero-Shot ASMR Speech Generation for Anyone of Any Voice (8.0)
DeepASMR enables anyone to synthesize zero-shot ASMR speech from ordinary samples, leveraging a new dataset and advanced LLM techniques.
- Causal Prosody Mediation for Text-to-Speech: Counterfactual Training of Duration, Pitch, and Energy in FastSpeech2 (7.0)
A novel framework for expressive text-to-speech synthesis that enhances emotional prosody control using causal learning principles.
- Bolbosh: Script-Aware Flow Matching for Kashmiri Text-to-Speech (7.0)
Bolbosh is an open-source neural TTS system for Kashmiri, addressing the lack of speech technology for this underserved language.
- Learning-free L2-Accented Speech Generation using Phonological Rules (7.0)
Generates accented speech without accented training data by combining phonological rules with a multilingual TTS model, enabling fine-grained accent control.
- ZeSTA: Zero-Shot TTS Augmentation with Domain-Conditioned Training for Data-Efficient Personalized Speech Synthesis (6.0)
ZeSTA improves personalized speech synthesis by enhancing speaker similarity through zero-shot TTS augmentation and domain-conditioned training.
- When Fine-Tuning Fails and when it Generalises: Role of Data Diversity and Mixed Training in LLM-based TTS (4.0)
Examines how data diversity and mixed training affect LoRA fine-tuning of LLM-based TTS systems for voice cloning.