State of the Field
Recent advances in generative AI increasingly focus on improving the coherence and quality of outputs across modalities, particularly in text-to-image generation and image editing. A notable trend is the integration of reasoning frameworks that unify generation and editing tasks, enabling more sophisticated visual synthesis informed by world knowledge. Techniques such as dynamic, training-free fusion of subject and style representations are emerging, enabling more flexible and contextually relevant outputs without retraining. Methods for concept erasure are also being refined to mitigate the risks of misuse, helping ensure that generative models produce safe and appropriate content. Sparsely supervised learning strategies are likewise gaining traction as a way to address spatial inconsistency in generated images. Collectively, these developments signal a shift toward more robust, interpretable, and ethically responsible generative AI systems, with significant implications for industries ranging from entertainment to advertising.
Papers
UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing
Unified multimodal models often struggle with complex synthesis tasks that demand deep reasoning, and typically treat text-to-image generation and image editing as isolated capabilities rather than in...
Training-Free Representation Guidance for Diffusion Models with a Representation Alignment Projector
Recent progress in generative modeling has enabled high-quality visual synthesis with diffusion-based frameworks, supporting controllable sampling and large-scale training. Inference-time guidance met...
CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation
Recent video generation models have revealed the emergence of Chain-of-Frame (CoF) reasoning, enabling frame-by-frame visual inference. With this capability, video models have been successfully applie...
Localized Concept Erasure in Text-to-Image Diffusion Models via High-Level Representation Misdirection
Recent advances in text-to-image (T2I) diffusion models have seen rapid and widespread adoption. However, their powerful generative capabilities raise concerns about potential misuse for synthesizing ...
Dynamic Training-Free Fusion of Subject and Style LoRAs
Recent studies have explored the combination of multiple LoRAs to simultaneously generate user-specified subjects and styles. However, most existing approaches fuse LoRA weights using static statistic...
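The abstract above contrasts static weight fusion of LoRAs with a dynamic alternative. A minimal sketch of that distinction, assuming the standard low-rank parameterization ΔW = B·A; the input-dependent weighting heuristic here is purely illustrative and not the paper's method:

```python
import numpy as np

def lora_delta(A, B):
    """Low-rank LoRA update: Delta_W = B @ A, with rank r = A.shape[0]."""
    return B @ A

def fuse_loras_static(W0, deltas, weights):
    """Static fusion: one fixed scalar per LoRA, applied uniformly."""
    W = W0.copy()
    for dW, lam in zip(deltas, weights):
        W = W + lam * dW
    return W

def fuse_loras_dynamic(W0, deltas, x):
    """Toy 'dynamic' fusion: weight each LoRA by how strongly its update
    responds to the current input x (normalized response norms).
    Hypothetical heuristic for illustration only."""
    responses = np.array([np.linalg.norm(dW @ x) for dW in deltas])
    lams = responses / (responses.sum() + 1e-8)
    W = W0.copy()
    for dW, lam in zip(deltas, lams):
        W = W + lam * dW
    return W, lams
```

The point of the contrast: static statistics pick the coefficients once, whereas a dynamic scheme can re-weight subject versus style contributions per input or per layer.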
Sparsely Supervised Diffusion
Diffusion models have shown remarkable success across a wide range of generative tasks. However, they often suffer from spatially inconsistent generation, arguably due to the inherent locality of thei...
PromptSplit: Revealing Prompt-Level Disagreement in Generative Models
Prompt-guided generative AI models have rapidly expanded across vision and language domains, producing realistic and diverse outputs from textual inputs. The growing variety of such models, trained wi...
Generative AI collective behavior needs an interactionist paradigm
In this article, we argue that understanding the collective behavior of agents based on large language models (LLMs) is an essential area of inquiry, with important implications in terms of risks and ...
MoVE: Mixture of Value Embeddings -- A New Axis for Scaling Parametric Memory in Autoregressive Models
Autoregressive sequence modeling stands as the cornerstone of modern Generative AI, powering results across diverse modalities ranging from text generation to image generation. However, a fundamental ...
Multi-Task Learning with Additive U-Net for Image Denoising and Classification
We investigate additive skip fusion in U-Net architectures for image denoising and denoising-centric multi-task learning (MTL). By replacing concatenative skips with gated additive fusion, the propose...
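The abstract above replaces concatenative U-Net skips with gated additive fusion. A minimal sketch of the two connection types on a single feature map; the scalar gate used here is an assumption for illustration (the paper's gating may be per-channel or learned differently):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def concat_skip(decoder_feat, skip_feat):
    """Standard U-Net skip: channel-wise concatenation doubles the channel
    count (C -> 2C), so the following convolution must be wider."""
    return np.concatenate([decoder_feat, skip_feat], axis=0)  # (2C, H, W)

def gated_additive_skip(decoder_feat, skip_feat, gate_logit):
    """Gated additive fusion: a learned gate (here a single logit passed
    through a sigmoid) scales the encoder skip before summation, keeping
    the channel count at C."""
    g = sigmoid(gate_logit)
    return decoder_feat + g * skip_feat  # (C, H, W)
```

Keeping the channel count fixed is the practical draw of the additive form: the decoder convolutions stay narrower, and the gate lets the network learn how much encoder detail to pass through.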