Papers
1–4 of 4AccelAes: Accelerating Diffusion Transformers for Training-Free Aesthetic-Enhanced Image Generation
Diffusion Transformers (DiTs) are a dominant backbone for high-fidelity text-to-image generation due to strong scalability and alignment at high resolutions. However, quadratic self-attention over den...
PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss
Pixel diffusion generates images directly in pixel space in an end-to-end manner, avoiding the artifacts and bottlenecks introduced by VAEs in two-stage latent diffusion. However, it is challenging to...
PHAC: Promptable Human Amodal Completion
Conditional image generation methods are increasingly used in human-centric applications, yet existing human amodal completion (HAC) models offer users limited control over the completed content. Give...
SNCE: Geometry-Aware Supervision for Scalable Discrete Image Generation
Recent advancements in discrete image generation showed that scaling the VQ codebook size significantly improves reconstruction fidelity. However, training generative models with a large VQ codebook r...