Synthetic Data Generation Comparison Hub
7 papers - avg viability 7.1
Synthetic data generation is being applied in sectors where real data is scarce or restricted by privacy concerns. In anti-money laundering research, customizable dataset generators capture both the structural and temporal characteristics of illicit transactions, which supports more effective model training. In remote sensing, frameworks that combine vision and language models aim to make synthetic data more interpretable and useful, and report that datasets augmented with synthetic images can outperform those built from real images alone. For power line inspection, multimodal large language models are used to generate synthetic defect images, improving classification accuracy where defect data is scarce. In finance, fairness-aware generation frameworks address biases that can skew automated decision-making. Together, this work points toward practical, scalable approaches that improve model performance while reducing ethical risks in data use.
Top Papers
- Grounding Synthetic Data Generation With Vision and Language Models (8.0)
A vision-language framework for interpretable synthetic data generation and evaluation in remote sensing.
- Tide: A Customisable Dataset Generator for Anti-Money Laundering Research (8.0)
Tide provides customizable synthetic datasets for machine learning in anti-money laundering research.
- PersonaTrace: Synthesizing Realistic Digital Footprints with LLM Agents (7.0)
PersonaTrace uses LLM agents to generate realistic digital footprints for personalized applications and machine learning models.
- Evaluating Synthetic Data for Baggage Trolley Detection in Airport Logistics (7.0)
Uses NVIDIA Omniverse to generate synthetic data for training a YOLO-OBB model for baggage trolley detection, reducing annotation effort and improving accuracy.
- FairFinGAN: Fairness-aware Synthetic Financial Data Generation (7.0)
FairFinGAN generates synthetic financial data under fairness constraints, mitigating bias in automated financial systems.
- Synthetic Defect Image Generation for Power Line Insulator Inspection Using Multimodal Large Language Models (7.0)
Uses MLLMs to generate synthetic defect images that improve defect recognition in low-data regimes, a practical option for industries with limited defect data.
- Improving TabPFN's Synthetic Data Generation by Integrating Causal Structure (6.0)
Enhances synthetic tabular data generation by integrating causal structure into TabPFN.
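As a quick check on the header figure, the sketch below recomputes the average from the seven scores listed above. It assumes the "avg viability" value is a plain arithmetic mean rounded to one decimal place; the page does not state how it is derived, so treat this as an illustration and not the hub's actual method.

```python
from statistics import mean

# Viability scores copied from the "Top Papers" list, in listed order.
scores = [8.0, 8.0, 7.0, 7.0, 7.0, 7.0, 6.0]

# Assumption: the header value is an unweighted mean, rounded to one decimal.
average = mean(scores)

print(f"{len(scores)} papers - avg viability {average:.1f}")
```

Under that assumption the scores sum to 50 across 7 papers, which rounds to the 7.1 shown in the header.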