Synthetic Data Generation

Trending · 7 papers · 7.1 viability · +100% over 30 days

State of the Field

Recent work in synthetic data generation targets sectors where real data is scarce or restricted by privacy concerns. For anti-money laundering (AML) research, new tools generate customisable datasets that capture both the structural and temporal characteristics of illicit transactions, enabling more effective model training. In remote sensing, frameworks use vision and language models to improve the interpretability and utility of synthetic data, and show that augmented datasets can outperform those built solely from real images. For utility companies' power line inspections, multimodal large language models are used to generate synthetic defect images, significantly improving classification accuracy where real defect examples are scarce. In finance, fairness-aware frameworks aim to remove biases from synthetic data that could otherwise skew automated decision-making. Together, these efforts point toward practical, scalable methods that improve model performance while addressing ethical concerns about data use.

Last updated Mar 11, 2026
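The summary notes that AML dataset generators capture both the structural and temporal characteristics of illicit transactions. As a rough illustration of what combining the two can mean (this is not Tide's design or API, which the truncated abstract does not describe), the sketch below injects a hypothetical fan-out/fan-in laundering motif with tight time gaps into random background transfers. The `Transaction` type, function names, amounts, and time windows are all illustrative assumptions.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Transaction:
    src: str
    dst: str
    amount: float
    timestamp: datetime
    is_illicit: bool  # ground-truth label for supervised AML models


def generate_background(accounts: List[str], n: int, start: datetime,
                        rng: random.Random) -> List[Transaction]:
    """Benign transfers between random account pairs, spread over 30 days."""
    txs = []
    for _ in range(n):
        src, dst = rng.sample(accounts, 2)
        ts = start + timedelta(minutes=rng.randint(0, 30 * 24 * 60))
        amount = round(rng.uniform(10, 5000), 2)
        txs.append(Transaction(src, dst, amount, ts, False))
    return txs


def inject_fan_out_fan_in(accounts: List[str], total: float, n_mules: int,
                          start: datetime, rng: random.Random,
                          max_gap_hours: float = 6.0) -> List[Transaction]:
    """Hypothetical laundering motif (illustrative, not from any paper).

    Structure: one source splits `total` across `n_mules` intermediaries
    (fan-out), which each forward their share to a single sink (fan-in).
    Time: each hop follows the previous one by at most `max_gap_hours`,
    so the motif spans at most 2 * n_mules * max_gap_hours.
    """
    source, sink, *mules = rng.sample(accounts, n_mules + 2)
    share = round(total / n_mules, 2)
    txs = []
    t = start
    for mule in mules:  # fan-out hops
        t += timedelta(hours=rng.uniform(0.1, max_gap_hours))
        txs.append(Transaction(source, mule, share, t, True))
    for mule in mules:  # fan-in hops, all later than every fan-out hop
        t += timedelta(hours=rng.uniform(0.1, max_gap_hours))
        txs.append(Transaction(mule, sink, share, t, True))
    return txs


rng = random.Random(0)
accounts = [f"acct_{i}" for i in range(50)]
start = datetime(2026, 1, 1)
data = generate_background(accounts, 200, start, rng)
data += inject_fan_out_fan_in(accounts, 9000.0, 3, start + timedelta(days=5), rng)
data.sort(key=lambda tx: tx.timestamp)
print(sum(tx.is_illicit for tx in data))  # → 6
```

A detector trained on such data has to use both the graph shape (one source, several intermediaries, one sink) and the timing (hops bunched together) to separate the labelled motif from the background; a real generator would offer many more motifs and configurable parameters.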

Papers


Research Paper·Mar 10, 2026

Grounding Synthetic Data Generation With Vision and Language Models

Deep learning models benefit from increasing data diversity and volume, motivating synthetic data augmentation to improve existing datasets. However, existing evaluation metrics for synthetic data typ...

8.0 viability

Research Paper·Mar 2, 2026

Tide: A Customisable Dataset Generator for Anti-Money Laundering Research

The lack of accessible transactional data significantly hinders machine learning research for Anti-Money Laundering (AML). Privacy and legal concerns prevent the sharing of real financial data, while ...

8.0 viability

Research Paper·Mar 12, 2026

PersonaTrace: Synthesizing Realistic Digital Footprints with LLM Agents

Digital footprints (records of individuals' interactions with digital systems) are essential for studying behavior, developing personalized applications, and training machine learning models. However,...

7.0 viability

Research Paper·Mar 8, 2026

Evaluating Synthetic Data for Baggage Trolley Detection in Airport Logistics

Efficient luggage trolley management is critical for reducing congestion and ensuring asset availability in modern airports. Automated detection systems face two main challenges. First, strict securit...

7.0 viability

Research Paper·Mar 5, 2026

FairFinGAN: Fairness-aware Synthetic Financial Data Generation

Financial datasets often suffer from bias that can lead to unfair decision-making in automated systems. In this work, we propose FairFinGAN, a WGAN-based framework designed to generate synthetic finan...

7.0 viability

Research Paper·Mar 9, 2026

Synthetic Defect Image Generation for Power Line Insulator Inspection Using Multimodal Large Language Models

Utility companies increasingly rely on drone imagery for post-event and routine inspection, but training accurate defect-type classifiers remains difficult because defect examples are rare and inspect...

7.0 viability

Research Paper·Mar 10, 2026

Improving TabPFN's Synthetic Data Generation by Integrating Causal Structure

Synthetic tabular data generation addresses data scarcity and privacy constraints in a variety of domains. Tabular Prior-Data Fitted Network (TabPFN), a recent foundation model for tabular data, has b...

6.0 viability
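The last entry concerns integrating causal structure into synthetic tabular data generation. The truncated abstract does not say how, so the following is only a generic sketch of the underlying idea: ancestral sampling from a hand-specified structural causal model (SCM), where each column is computed from its causal parents plus independent noise. It does not use TabPFN or reproduce the paper's method; the DAG, the mechanisms, and the function names `topological_order` and `sample_scm` are hypothetical.

```python
import random
from typing import Callable, Dict, List

# A mechanism maps (values of the node's parents, exogenous noise) -> node value.
Mechanism = Callable[[Dict[str, float], float], float]


def topological_order(parents: Dict[str, List[str]]) -> List[str]:
    """Order nodes so every parent precedes its children; reject cycles."""
    order: List[str] = []
    done = set()

    def visit(node: str, in_progress: set) -> None:
        if node in done:
            return
        if node in in_progress:
            raise ValueError(f"cycle detected at {node!r}")
        in_progress.add(node)
        for parent in parents[node]:
            visit(parent, in_progress)
        in_progress.discard(node)
        done.add(node)
        order.append(node)

    for node in parents:
        visit(node, set())
    return order


def sample_scm(parents: Dict[str, List[str]], mechanisms: Dict[str, Mechanism],
               n_rows: int, seed: int = 0) -> List[Dict[str, float]]:
    """Ancestral sampling: fill each row's columns in causal order,
    drawing fresh standard Gaussian noise for every column."""
    rng = random.Random(seed)
    order = topological_order(parents)
    rows = []
    for _ in range(n_rows):
        row: Dict[str, float] = {}
        for node in order:
            noise = rng.gauss(0.0, 1.0)
            parent_values = {p: row[p] for p in parents[node]}
            row[node] = mechanisms[node](parent_values, noise)
        rows.append(row)
    return rows


# Hypothetical DAG: age -> income -> credit_limit.
parents = {"age": [], "income": ["age"], "credit_limit": ["income"]}
mechanisms = {
    "age": lambda pa, e: 40 + 10 * e,
    "income": lambda pa, e: 1000 * pa["age"] + 5000 * e,
    "credit_limit": lambda pa, e: 0.3 * pa["income"] + 2000 * e,
}
for row in sample_scm(parents, mechanisms, n_rows=3, seed=42):
    print({k: round(v, 1) for k, v in row.items()})
```

Because each column depends only on its parents and its own noise term, the dependencies in the generated table follow the DAG by construction; a learned generator could replace the hand-written linear mechanisms while keeping the same sampling order.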
