ZeSTA: Zero-Shot TTS Augmentation with Domain-Conditioned Training for Data-Efficient Personalized Speech Synthesis

Estimated build cost: $9K-$13K over 6-10 weeks.

Founder's Pitch

"ZeSTA improves personalized speech synthesis by enhancing speaker similarity through zero-shot TTS augmentation and domain-conditioned training."

Text-to-Speech · Score: 6

Commercial Viability Breakdown (0-10 scale)

High Potential: 2.5 (1/4 signals)

Quick Build: 7.5 (3/4 signals)

Series A Potential: 5 (2/4 signals)

Sources used for this analysis

arXiv Paper

Full-text PDF analysis of the research paper

GitHub Repository

Code availability, stars, and contributor activity

Citation Network

Semantic Scholar citations and co-citation patterns

Community Predictions

Crowd-sourced unicorn probability assessments

Analysis model: GPT-4o · Last scored: 3/4/2026
