Recent work on large language model (LLM) training centers on improving efficiency and interpretability while grappling with the demands of reasoning. Techniques such as knowledge distillation frameworks and language-specific model merging streamline training, cutting computational cost and lifting performance under constrained annotation budgets. Researchers are also exploring new reinforcement learning paradigms, including self-feedback-driven approaches and explicit credit assignment mechanisms, to sharpen LLM reasoning (a minimal sketch of the group-relative policy update many of these methods build on appears after the paper list). Related work aims to align post-training more closely with human cognitive processes for more reliable, generalizable outputs, while conflict-aware data selection and analyses of hidden dataset effects reflect a broader effort to understand and optimize training dynamics. Together, these developments point toward commercial applications that demand efficient, accurate language processing, such as customer service automation and content generation.
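As a concrete illustration of the distillation theme, the sketch below shows the standard temperature-scaled soft-label distillation objective that work in this space typically builds on. It is a minimal generic example under common assumptions (logit-level distillation, shared vocabulary), not the API or method of any paper listed here; the function name and temperature default are illustrative.

```python
import torch.nn.functional as F

def soft_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Illustrative sketch, not any listed paper's implementation.
    # Soften both output distributions with a shared temperature.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # KL divergence pulls the student's distribution toward the teacher's;
    # the T^2 factor keeps gradient magnitudes comparable across temperatures
    # (Hinton et al., 2015).
    kl = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    return kl * temperature ** 2
```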
Top papers
- KDFlow: A User-Friendly and Efficient Knowledge Distillation Framework for Large Language Models (8.0)
- Distilling LLM Reasoning into Graph of Concept Predictors (8.0)
- Improving Training Efficiency and Reducing Maintenance Costs via Language Specific Model Merging (7.0)
- InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning (7.0)
- Rethinking the Trust Region in LLM Reinforcement Learning (7.0)
- From Meta-Thought to Execution: Cognitively Aligned Post-Training for Generalizable and Reliable LLM Reasoning (7.0)
- A Step Back: Prefix Importance Ratio Stabilizes Policy Optimization (6.0)
- Learning from Synthetic Data Improves Multi-hop Reasoning (6.0)
- Subliminal Effects in Your Data: A General Mechanism via Log-Linearity (6.0)
- iGRPO: Self-Feedback-Driven LLM Reasoning (6.0)
- SPICE: Submodular Penalized Information-Conflict Selection for Efficient Large Language Model Training (6.0)
- Mano: Restriking Manifold Optimization for LLM Training (5.0)
- Semantic-aware Wasserstein Policy Regularization for Large Language Model Alignment (5.0)
- Next Concept Prediction in Discrete Latent Space Leads to Stronger Language Models (5.0)
- SpanNorm: Reconciling Training Stability and Performance in Deep Transformers (5.0)
- P-EAGLE: Parallel-Drafting EAGLE with Scalable Training (5.0)
- Rethinking Reinforcement fine-tuning of LLMs: A Multi-armed Bandit Learning Perspective (5.0)
- ArXiv-to-Model: A Practical Study of Scientific LM Training (5.0)
- Self-Improving Pretraining: using post-trained models to pretrain better models (5.0)
- Preference Packing: Efficient Preference Optimization for Large Language Models (5.0)
- Decoder-based Sense Knowledge Distillation (5.0)
- VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training (5.0)
- Towards On-Policy SFT: Distribution Discriminant Theory and its Applications in LLM Training (5.0)
- Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning (5.0)
- Reinforcement-aware Knowledge Distillation for LLM Reasoning (5.0)
- Overconfident Errors Need Stronger Correction: Asymmetric Confidence Penalties for Reinforcement Learning (5.0)
- PRL: Process Reward Learning Improves LLMs' Reasoning Ability and Broadens the Reasoning Boundary (5.0)
- YuriiFormer: A Suite of Nesterov-Accelerated Transformers (5.0)
- Entropy-Gated Selective Policy Optimization: Token-Level Gradient Allocation for Hybrid Training of Large Language Models (5.0)
- STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens (5.0)
- Beyond Normalization: Rethinking the Partition Function as a Difficulty Scheduler for RLVR (5.0)
- Anytime Pretraining: Horizon-Free Learning-Rate Schedules with Weight Averaging (4.0)
- A Scalable Measure of Loss Landscape Curvature for Analyzing the Training Dynamics of LLMs (4.0)
- Weight Decay Improves Language Model Plasticity (4.0)
- Advancing General-Purpose Reasoning Models with Modular Gradient Surgery (4.0)
- Multi-Task GRPO: Reliable LLM Reasoning Across Tasks (4.0)
- JPmHC Dynamical Isometry via Orthogonal Hyper-Connections (4.0)
- ARO: A New Lens On Matrix Optimization For Large Models (4.0)
- Provable Last-Iterate Convergence for Multi-Objective Safe LLM Alignment via Optimistic Primal-Dual (3.0)
- ECO: Quantized Training without Full-Precision Master Weights (3.0)
- Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning (3.0)
- Scaling Embeddings Outperforms Scaling Experts in Language Models (3.0)
- Curriculum Learning for LLM Pretraining: An Analysis of Learning Dynamics (3.0)
- State Rank Dynamics in Linear Attention LLMs (3.0)
- TEON: Tensorized Orthonormalization Beyond Layer-Wise Muon for Large Language Model Pre-Training (3.0)
- CoSA: Compressed Sensing-Based Adaptation of Large Language Models (3.0)
- Distributed Hybrid Parallelism for Large Language Models: Comparative Study and System Design Guide (3.0)
- On Surprising Effectiveness of Masking Updates in Adaptive Optimizers (3.0)
- From Growing to Looping: A Unified View of Iterative Computation in LLMs (3.0)
- Test-Time Meta-Adaptation with Self-Synthesis (3.0)
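Several of the reinforcement learning papers above (e.g., iGRPO, Multi-Task GRPO, STAPO) build on group-relative policy optimization, the kind of credit assignment scheme referenced in the overview. The sketch below shows only the baseline group-normalized advantage computation such variants start from; it is a simplified illustration of standard GRPO-style reward standardization, and the function name is our own, not taken from any listed paper.

```python
import torch

def group_relative_advantages(rewards, eps=1e-6):
    # rewards: [num_prompts, group_size] -- one scalar reward per sampled
    # completion, with group_size completions drawn for each prompt.
    # Standardize each completion's reward against its own group, so a
    # completion is credited only for outperforming its siblings.
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: one prompt, four sampled answers, binary correctness rewards.
rewards = torch.tensor([[1.0, 0.0, 1.0, 0.0]])
print(group_relative_advantages(rewards))  # correct answers get positive advantage
```

Because the advantage is relative within each group, no learned value model is needed; the listed papers then modify pieces of this recipe (feedback loops, task weighting, token filtering) rather than the normalization itself.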