Papers
Predictive Batch Scheduling: Accelerating Language Model Training Through Loss-Aware Sample Prioritization
We introduce Predictive Batch Scheduling (PBS), a novel training optimization technique that accelerates language model convergence by dynamically prioritizing high-loss samples during batch construct...
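The teaser above only sketches the idea of loss-aware sample prioritization; the following is a minimal, hypothetical sketch of what prioritizing high-loss samples during batch construction could look like. All names (`prioritized_batch`, `loss_cache`, `explore_frac`) are illustrative assumptions, not the paper's actual algorithm or interface.

```python
import heapq
import random

def prioritized_batch(sample_ids, loss_cache, batch_size, explore_frac=0.1):
    """Build a batch that favors samples with the highest cached loss,
    keeping a small random slice so low-loss samples are not starved.
    A hypothetical sketch, not the method described in the paper."""
    n_explore = max(1, int(batch_size * explore_frac))
    n_exploit = batch_size - n_explore
    # Take the samples with the largest last-seen loss first.
    exploit = heapq.nlargest(
        n_exploit, sample_ids, key=lambda i: loss_cache.get(i, float("inf"))
    )
    remaining = [i for i in sample_ids if i not in set(exploit)]
    explore = random.sample(remaining, min(n_explore, len(remaining)))
    return exploit + explore
```

In a training loop, `loss_cache` would be refreshed with each sample's most recent per-example loss after the forward pass, so the next batch reflects current difficulty rather than a stale estimate.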
Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models
Large language models exhibit surprising sensitivity to the structure of the prompt, but the mechanisms underlying this sensitivity remain poorly understood. In this work, we conduct an in-depth inves...
Breaking the Overscaling Curse: Thinking Parallelism Before Parallel Thinking
Parallel thinking enhances LLM reasoning by multi-path sampling and aggregation. In system-level evaluations, a global parallelism level N is allocated to all samples, typically set large to maximize ...
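For context on the baseline the abstract refers to, here is a minimal sketch of parallel thinking with a fixed global parallelism level N: sample N reasoning paths per question and aggregate final answers by majority vote. `generate` is a hypothetical callable returning a (reasoning, answer) pair; this is not the paper's evaluation harness.

```python
from collections import Counter

def parallel_think(generate, question, n_paths=8):
    """Baseline parallel thinking: draw n_paths independent reasoning
    paths for one question and return the majority-vote answer.
    A hypothetical sketch of the setup the abstract critiques, where the
    same parallelism level is allocated to every sample."""
    answers = [generate(question)[1] for _ in range(n_paths)]
    best_answer, _ = Counter(answers).most_common(1)[0]
    return best_answer
```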
Can I Have Your Order? Monte-Carlo Tree Search for Slot Filling Ordering in Diffusion Language Models
While plan-and-infill decoding in Masked Diffusion Models (MDMs) shows promise for mathematical and code reasoning, performance remains highly sensitive to slot infilling order, often yielding substan...
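To make the search problem concrete, below is a compact, hypothetical UCT-style Monte-Carlo tree search over slot-infilling orders. `score_order` is an assumed callable that maps a complete order to a scalar reward (e.g., model confidence on the infilled text); it stands in for whatever evaluation the paper uses and is not its interface.

```python
import math
import random

class Node:
    def __init__(self, order, remaining, parent=None):
        self.order = order          # slots placed so far, in order
        self.remaining = remaining  # slots not yet placed
        self.parent = parent
        self.children = {}          # slot -> child Node
        self.visits = 0
        self.value = 0.0

def mcts_slot_order(slots, score_order, n_iters=200, c=1.4):
    """Search over slot-infilling orders with vanilla UCT.
    A hypothetical sketch, not the paper's algorithm."""
    root = Node([], list(slots))
    for _ in range(n_iters):
        node = root
        # Selection: descend while the node is fully expanded.
        while node.remaining and len(node.children) == len(node.remaining):
            node = max(
                node.children.values(),
                key=lambda ch: ch.value / ch.visits
                + c * math.sqrt(math.log(node.visits) / ch.visits),
            )
        # Expansion: add one untried slot as a new child.
        if node.remaining:
            slot = random.choice(
                [s for s in node.remaining if s not in node.children]
            )
            child = Node(node.order + [slot],
                         [s for s in node.remaining if s != slot], parent=node)
            node.children[slot] = child
            node = child
        # Rollout: complete the order randomly and score it.
        tail = node.remaining[:]
        random.shuffle(tail)
        reward = score_order(node.order + tail)
        # Backpropagation.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Read out the most-visited order by greedy descent.
    order, node = [], root
    while node.children:
        slot, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        order.append(slot)
    return order + node.remaining
```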