LLM Training

64 papers
4.2 viability

State of the Field

Recent advances in large language model (LLM) training focus on improving efficiency and interpretability while addressing the complexities of reasoning. Techniques such as active distillation frameworks and language-specific model merging streamline training, cutting computational costs and improving performance under constrained annotation budgets. Researchers are also exploring new reinforcement learning paradigms, such as self-feedback-driven training and finer-grained credit assignment, to sharpen the reasoning capabilities of LLMs. These methods aim to align model training more closely with human cognitive processes, yielding more reliable and generalizable outputs. A shift toward conflict-aware data selection, together with growing attention to hidden dataset effects, further underscores the field's commitment to optimizing training dynamics and outcomes. Collectively, these developments hold promise for commercial applications that demand efficient, accurate natural language processing, such as customer service automation and content generation.

Last updated Feb 28, 2026

Papers

1–10 of 50
Research Paper·Feb 3, 2026

Distilling LLM Reasoning into Graph of Concept Predictors

Deploying Large Language Models (LLMs) for discriminative workloads is often limited by inference latency, compute, and API costs at scale. Active distillation reduces these costs by querying an LLM o...

8.0 viability
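The active-distillation setup this abstract alludes to can be pictured as a budgeted query loop: train a cheap student, find the pool examples it is least certain about, spend the LLM budget labeling only those, and retrain. A toy sketch under those assumptions (the oracle, the nearest-centroid student, and all names here are illustrative stand-ins, not the paper's graph-of-concept-predictors method):

```python
import random

random.seed(0)

# Toy pool of 2-D points; teacher_label stands in for an expensive LLM call.
def teacher_label(x):
    return int(x[0] + x[1] > 1.0)

pool = [(random.random() * 2, random.random() * 2) for _ in range(500)]

class CentroidStudent:
    """Tiny discriminative student: predict the nearest class centroid."""
    def fit(self, xs, ys):
        self.cent = {}
        for c in (0, 1):
            pts = [x for x, y in zip(xs, ys) if y == c]
            self.cent[c] = tuple(sum(v) / len(pts) for v in zip(*pts))

    def _dist(self, x, c):
        return sum((a - b) ** 2 for a, b in zip(x, self.cent[c])) ** 0.5

    def margin(self, x):
        # Small distance gap between the two centroids = uncertain point.
        return abs(self._dist(x, 0) - self._dist(x, 1))

    def predict(self, x):
        return min((0, 1), key=lambda c: self._dist(x, c))

# Seed with random teacher queries until both classes are represented.
labeled_x, labeled_y = [], []
seed_iter = iter(random.sample(pool, len(pool)))
while len(labeled_x) < 10 or len(set(labeled_y)) < 2:
    x = next(seed_iter)
    labeled_x.append(x)
    labeled_y.append(teacher_label(x))

# Active loop: spend the remaining budget on the least-certain pool points.
budget = 40
student = CentroidStudent()
for _ in range(budget):
    student.fit(labeled_x, labeled_y)
    x = min((p for p in pool if p not in labeled_x), key=student.margin)
    labeled_x.append(x)
    labeled_y.append(teacher_label(x))

student.fit(labeled_x, labeled_y)
acc = sum(student.predict(x) == teacher_label(x) for x in pool) / len(pool)
print(f"accuracy after {len(labeled_x)} teacher queries: {acc:.2f}")
```

Because the uncertain points cluster near the decision boundary, the student refines exactly where its predictions are weakest, which is how active distillation keeps teacher-query costs low.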
Research Paper·Mar 2, 2026

KDFlow: A User-Friendly and Efficient Knowledge Distillation Framework for Large Language Models

Knowledge distillation (KD) is an essential technique to compress large language models (LLMs) into smaller ones. However, despite the distinct roles of the student model and the teacher model in KD, ...

8.0 viability
Research Paper·Jan 22, 2026

Improving Training Efficiency and Reducing Maintenance Costs via Language Specific Model Merging

Fine-tuning a task-specific multilingual large language model (LLM) involves training the model on a multilingual dataset with examples in all the required languages. Updating one or more supported la...

7.0 viability
Research Paper·Feb 4, 2026·B2B·Education

Rethinking the Trust Region in LLM Reinforcement Learning

Reinforcement learning (RL) has become a cornerstone for fine-tuning Large Language Models (LLMs), with Proximal Policy Optimization (PPO) serving as the de facto standard algorithm. Despite its ubiqu...

7.0 viability
Research Paper·Jan 20, 2026

InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning

Outcome-reward reinforcement learning (RL) has proven effective at improving the reasoning capabilities of large language models (LLMs). However, standard RL assigns credit only at the level of the fi...

7.0 viability
Research Paper·Jan 29, 2026

From Meta-Thought to Execution: Cognitively Aligned Post-Training for Generalizable and Reliable LLM Reasoning

Current LLM post-training methods optimize complete reasoning trajectories through Supervised Fine-Tuning (SFT) followed by outcome-based Reinforcement Learning (RL). While effective, a closer examina...

7.0 viability
Research Paper·Feb 9, 2026

iGRPO: Self-Feedback-Driven LLM Reasoning

Large Language Models (LLMs) have shown promise in solving complex mathematical problems, yet they still fall short of producing accurate and consistent solutions. Reinforcement Learning (RL) is a fra...

6.0 viability
Research Paper·Jan 30, 2026

SPICE: Submodular Penalized Information-Conflict Selection for Efficient Large Language Model Training

Information-based data selection for instruction tuning is compelling: maximizing the log-determinant of the Fisher information yields a monotone submodular objective, enabling greedy algorithms to ac...

6.0 viability
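The snippet above names the core mechanism: a log-determinant information objective is monotone submodular, so a simple greedy selector comes with near-optimality guarantees. A toy sketch of that greedy loop, using a Gram matrix over feature vectors as an illustrative surrogate for the Fisher information (SPICE's conflict penalty is omitted, and all names are hypothetical):

```python
import math
import random

random.seed(1)

def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n, d = len(m), 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        if abs(m[p][i]) < 1e-12:
            return 0.0
        if p != i:
            m[i], m[p] = m[p], m[i]
            d = -d
        d *= m[i][i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= f * m[i][c]
    return d

def logdet_objective(items, S):
    """log det(I + K_S), where K_S is the Gram matrix of the selected rows.
    This set function is monotone and submodular, so greedy selection is
    near-optimal (the classic (1 - 1/e) guarantee)."""
    if not S:
        return 0.0
    rows = [items[i] for i in S]
    k = len(rows)
    K = [[sum(a * b for a, b in zip(rows[r], rows[c])) + (r == c)
          for c in range(k)] for r in range(k)]
    return math.log(det(K))

# Toy "training examples" as feature vectors (e.g. gradient embeddings).
items = [[random.gauss(0, 1) for _ in range(4)] for _ in range(20)]

# Greedy: repeatedly add the item with the largest marginal gain.
S, budget = [], 5
for _ in range(budget):
    gains = {i: logdet_objective(items, S + [i]) - logdet_objective(items, S)
             for i in range(len(items)) if i not in S}
    S.append(max(gains, key=gains.get))

print("selected:", S, "objective:", round(logdet_objective(items, S), 3))
```

The log-det objective rewards diverse, informative picks: adding a vector nearly parallel to ones already chosen yields little marginal gain, which is the same diminishing-returns structure the abstract invokes.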
Research Paper·Jan 30, 2026

A Step Back: Prefix Importance Ratio Stabilizes Policy Optimization

Reinforcement learning (RL) post-training has increasingly demonstrated strong ability to elicit reasoning behaviors in large language models (LLMs). For training efficiency, rollouts are typically ge...

6.0 viability
Research Paper·Mar 2, 2026

Learning from Synthetic Data Improves Multi-hop Reasoning

Reinforcement Learning (RL) has been shown to significantly boost reasoning capabilities of large language models (LLMs) in math, coding, and multi-hop reasoning tasks. However, RL fine-tuning require...

6.0 viability
Page 1 of 5