State of the Field
Recent research in model optimization focuses on improving the efficiency and performance of large neural networks, addressing critical challenges in deployment and resource management. Stage-aware strategies such as Prefill-Only Pruning reduce computational cost during inference without sacrificing accuracy, while methods like FlashOptim cut memory usage during training, making larger models feasible on limited hardware. Adaptive frameworks such as Routing the Lottery discover specialized subnetworks tailored to heterogeneous inputs, improving performance while reducing parameter counts. Innovations in quantization, exemplified by Quant Experts, refine how models manage memory and computational overhead, keeping large vision-language models effective under tight constraints. Collectively, these advances point toward more modular, efficient, and context-aware deep learning architectures suited to real-world deployment.
Papers
POP: Prefill-Only Pruning for Efficient Large Model Inference
Large Language Models (LLMs) and Vision-Language Models (VLMs) have demonstrated remarkable capabilities. However, their deployment is hindered by significant computational costs. Existing structured ...
Tuning the Implicit Regularizer of Masked Diffusion Language Models: Enhancing Generalization via Insights from $k$-Parity
Masked Diffusion Language Models have recently emerged as a powerful generative paradigm, yet their generalization properties remain understudied compared to their auto-regressive counterparts. In thi...
When Shared Knowledge Hurts: Spectral Over-Accumulation in Model Merging
Model merging combines multiple fine-tuned models into a single model by adding their weight updates, providing a lightweight alternative to retraining. Existing methods primarily target resolving con...
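The additive merging scheme this abstract describes can be sketched in a few lines. This is a minimal illustration of merging by summing weight updates (often called task arithmetic), not the paper's specific method; the `alpha` scaling knob is an assumption added for illustration.

```python
import numpy as np

def merge_by_task_arithmetic(base, finetuned, alpha=1.0):
    """Merge fine-tuned models by adding their weight updates (the
    deltas from the shared base) back onto the base weights.
    `alpha` is an assumed knob scaling the summed updates."""
    updates = [ft - base for ft in finetuned]
    return base + alpha * sum(updates)

# Toy example: two fine-tuned variants of a 4-weight "model".
base = np.zeros(4)
ft_a = np.array([1.0, 0.0, 0.0, 0.0])  # update from task A
ft_b = np.array([0.0, 2.0, 0.0, 0.0])  # update from task B
merged = merge_by_task_arithmetic(base, [ft_a, ft_b])
print(merged)  # both updates accumulate in the merged weights
```

When the updates touch overlapping directions, they interfere rather than accumulate cleanly, which is the failure mode the paper's title alludes to.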
Routing the Lottery: Adaptive Subnetworks for Heterogeneous Data
In pruning, the Lottery Ticket Hypothesis posits that large networks contain sparse subnetworks, or winning tickets, that can be trained in isolation to match the performance of their dense counterpar...
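The simplest way to extract a candidate "winning ticket" mask is one-shot magnitude pruning over a trained layer. The sketch below shows that baseline criterion only; it is not the adaptive routing scheme this paper proposes.

```python
import numpy as np

def magnitude_mask(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Boolean mask keeping the largest-magnitude (1 - sparsity)
    fraction of weights; the classic lottery-ticket baseline."""
    k = int(weights.size * (1.0 - sparsity))  # weights to keep
    if k == 0:
        return np.zeros(weights.shape, dtype=bool)
    # k-th largest absolute value becomes the keep threshold
    threshold = np.partition(np.abs(weights).ravel(), -k)[-k]
    return np.abs(weights) >= threshold

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
mask = magnitude_mask(w, sparsity=0.9)
print(mask.mean())  # roughly 0.1 of the weights survive
```

A sparse subnetwork is then trained with `w * mask`, freezing the pruned entries at zero.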
FlashOptim: Optimizers for Memory Efficient Training
Standard mixed-precision training of neural networks requires many bytes of accelerator memory for each model parameter. These bytes reflect not just the parameter itself, but also its gradient and on...
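The per-parameter byte cost the abstract alludes to can be made concrete with the standard accounting for Adam under fp16/fp32 mixed precision. The figures below are the textbook breakdown, not FlashOptim's specific scheme.

```python
def adam_mixed_precision_bytes_per_param() -> int:
    """Standard accounting for mixed-precision Adam state; illustrative
    of the baseline cost, not FlashOptim's optimized layout."""
    fp16_param = 2    # half-precision weight used in forward/backward
    fp16_grad = 2     # half-precision gradient
    fp32_master = 4   # full-precision master copy of the weight
    fp32_moment1 = 4  # Adam first moment (m)
    fp32_moment2 = 4  # Adam second moment (v)
    return fp16_param + fp16_grad + fp32_master + fp32_moment1 + fp32_moment2

params = 7e9  # e.g. a 7B-parameter model
gib = adam_mixed_precision_bytes_per_param() * params / 2**30
print(f"{gib:.0f} GiB of parameter-associated state")
```

At 16 bytes per parameter, optimizer state alone dwarfs the model weights, which is why memory-efficient optimizers target exactly these buffers.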
Performance and Complexity Trade-off Optimization of Speech Models During Training
In speech machine learning, neural network models are typically designed by choosing an architecture with fixed layer sizes and structure. These models are then trained to maximize performance on metr...
Quant Experts: Token-aware Adaptive Error Reconstruction with Mixture of Experts for Large Vision-Language Models Quantization
Post-Training Quantization (PTQ) has emerged as an effective technique for alleviating the substantial computational and memory overheads of Vision-Language Models (VLMs) by compressing both weights a...
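The baseline operation that PTQ error-reconstruction methods improve on is round-to-nearest quantization. Below is a minimal symmetric per-tensor int8 sketch of that baseline; it is not the token-aware mixture-of-experts scheme this paper proposes.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 round-to-nearest PTQ.
    Returns int8 codes and the float scale needed to decode them."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()  # worst-case per-weight error
print(err <= s / 2 + 1e-6)  # rounding error is bounded by half a step
```

Each weight now costs 1 byte instead of 4, at the price of a per-weight error of at most half a quantization step; adaptive reconstruction methods aim to shrink the error this simple scheme leaves behind.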
SageBwd: A Trainable Low-bit Attention
Low-bit attention, such as SageAttention, has emerged as an effective approach for accelerating model inference, but its applicability to training remains poorly understood. In prior work, we introduc...
Sink-Aware Pruning for Diffusion Language Models
Diffusion Language Models (DLMs) incur high inference cost due to iterative denoising, motivating efficient pruning. Existing pruning heuristics largely inherited from autoregressive (AR) LLMs, typica...
EUGens: Efficient, Unified, and General Dense Layers
Efficient neural networks are essential for scaling machine learning models to real-time applications and resource-constrained environments. Fully-connected feedforward layers (FFLs) introduce computa...
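The cost a fully-connected feedforward layer introduces can be counted directly. The sketch below is textbook parameter/multiply-add accounting to make that cost concrete; it is not a claim about EUGens itself.

```python
def ffl_cost(d_in: int, d_out: int, bias: bool = True):
    """Parameter count and multiply-adds per token for one
    fully-connected layer y = W x + b: both grow as d_in * d_out."""
    params = d_in * d_out + (d_out if bias else 0)
    macs_per_token = d_in * d_out  # one multiply-add per weight
    return params, macs_per_token

print(ffl_cost(4096, 4096))  # (16781312, 16777216)
```

A single 4096-to-4096 layer already carries ~16.8M parameters and as many multiply-adds per token, which is the quadratic growth that efficient dense-layer designs try to break.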